Abstract
Objective
Successful prognosis is crucial for the management and treatment of osteosarcoma (OSC). This study aimed to predict the cancer-specific survival rate in patients with OSC using deep learning algorithms and classical Cox proportional hazard models to provide data to support individualized treatment of patients with OSC.
Methods
Data on patients diagnosed with OSC from 2004 to 2017 were obtained from the Surveillance, Epidemiology, and End Results database. The study sample was randomly divided into a training cohort and a validation cohort at a ratio of 7:3. The DeepSurv algorithm and the Cox proportional hazard model were used to construct prognostic models for patients with OSC. The predictive efficacy of the models was evaluated using the concordance index (C-index), the integrated Brier score (IBS), the root mean square error (RMSE), and the mean absolute error (MAE).
Results
A total of 3218 patients were randomized into training and validation cohorts (n = 2252 and 966, respectively). Both the DeepSurv and Cox models showed good efficacy in predicting cancer-specific survival (CSS) in patients with OSC (C-index >0.74). On the other evaluation metrics, DeepSurv was not superior to the Cox model in predicting survival in patients with OSC.
Citation: Liu Y, Xie L, Wang D, Xia K (2023) A deep learning algorithm with good prediction efficacy for cancer-specific survival in osteosarcoma: A retrospective study. PLoS ONE 18(9): e0286841. https://doi.org/10.1371/journal.pone.0286841
Editor: Filomena de Nigris, Universita degli Studi della Campania Luigi Vanvitelli, ITALY
Received: April 13, 2023; Accepted: May 24, 2023; Published: September 28, 2023
Copyright: © 2023 Liu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data are available from the Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER 18 Regs Custom Data (with additional treatment fields). Any registered researcher can download the data free of charge from https://seer.cancer.gov/data/.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Osteosarcoma (OSC) is among the most common primary bone tumors in children and adolescents and, in adults, ranks third after chondrosarcoma and chordoma [1]. The tumor usually arises in the distal femur or proximal tibia, and the survival rate is 50% to 65%. However, 25% to 50% of patients with metastases at initial diagnosis die of pulmonary metastases [2]. OSC shows a bimodal age distribution, with the first peak at 15–19 years and the second at 75–79 years [3, 4]. OSCs are derived from primitive mesenchymal cells, usually in bone and rarely in soft tissue [5]. Although local and distant OSC metastases progress slowly, the presence or absence of metastases is an important prognostic factor [6]. The World Health Organization recommends the staging system of the American Joint Committee on Cancer (AJCC). However, this system has shortcomings in terms of focus and usefulness for predicting individual patient prognosis. Several previous studies have used nomograms to predict the survival of patients with cancer and have achieved good predictive efficacy [7–9]. The nomogram is based on the Cox proportional hazard (CPH) model, which rests on a restrictive assumption: the hazards are proportional, meaning that each covariate multiplies the hazard by a factor that is constant over time. In practice, however, it is difficult to establish the true relationship between the covariates and the outcome. In addition, a model that assumes only linear relationships between clinical characteristics and prognostic outcomes may not be sufficient for clinical decision-making [10]. Hence, a better model that can capture nonlinear relationships between these variables is needed.
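The proportional-hazards assumption can be made concrete. Under a CPH model, the hazard of a patient with covariates x is h(t | x) = h0(t)·exp(β·x), so the ratio of two patients' hazards does not depend on time. The minimal Python sketch below illustrates this using a made-up baseline hazard and made-up coefficients. Neither is an estimate from this study.

```python
import math


def cox_hazard(t, x, beta, baseline_hazard):
    """Hazard h(t | x) = h0(t) * exp(beta . x) under a Cox proportional hazards model."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)


def hypothetical_baseline_hazard(t):
    """Made-up, time-varying baseline hazard h0(t) (illustration only)."""
    return 0.02 + 0.01 * math.sin(t)


# Hypothetical coefficients for two binary covariates, e.g. [distant metastasis, age >= 35].
beta = [1.2, 0.5]
patient_a = [1, 1]  # metastatic, older
patient_b = [0, 1]  # non-metastatic, older

for t in [1.0, 3.0, 5.0, 10.0]:
    ratio = cox_hazard(t, patient_a, beta, hypothetical_baseline_hazard) / cox_hazard(
        t, patient_b, beta, hypothetical_baseline_hazard
    )
    # h0(t) cancels, so the ratio is exp(1.2) (about 3.32) at every time point.
    print(f"t = {t:>4}: hazard ratio = {ratio:.4f}")
```

Suppose the effect of a covariate changed during follow-up, or depended on x nonlinearly. A single constant ratio of this kind would then misrepresent it.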
Deep learning networks offer a new way to address the highly complex linear and nonlinear relationships between individuals' clinical features and their prognostic hazards [11, 12]. Fotso developed PySurvival, an open-source Python package for survival analysis [13]. The package includes DeepSurv, a deep neural network extension of the Cox model that is useful for predicting the impact of patient characteristics on prognosis. DeepSurv has also been reported to outperform other methods in handling survival data. The algorithm has since performed well in several cancer prognostic studies [14, 15].
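DeepSurv keeps the Cox partial likelihood but replaces the linear predictor β·x with the output of a neural network. This lets the log-hazard depend nonlinearly on the covariates. The pure-Python sketch below computes that loss, the negative log partial likelihood with the Breslow convention for tied times. It uses a tiny one-hidden-layer network whose weights are hypothetical and set by hand. The sketch illustrates the objective only and is not the PySurvival implementation.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def mlp_risk(x, w1, b1, w2, b2):
    """Risk score h(x) from a one-hidden-layer ReLU network (replaces beta . x)."""
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)])
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def neg_log_partial_likelihood(risks, times, events):
    """Negative Cox log partial likelihood (Breslow ties), averaged over observed events."""
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients contribute only through the risk sets
        risk_set = [risks[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk_set)  # log-sum-exp stabilised by the largest risk
        log_denominator = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += risks[i] - log_denominator
        n_events += 1
    return -total / max(n_events, 1)


# Hypothetical weights: 2 input features, 3 hidden units.
w1 = [[0.8, -0.4], [0.3, 0.9], [-0.5, 0.2]]
b1 = [0.0, -0.1, 0.1]
w2 = [1.0, 0.7, -0.6]
b2 = 0.0

# Hypothetical patients: features, follow-up in months, 1 = death from OSC.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
times = [12.0, 30.0, 8.0, 60.0]
events = [1, 0, 1, 0]

risks = [mlp_risk(x, w1, b1, w2, b2) for x in features]
print(f"loss = {neg_log_partial_likelihood(risks, times, events):.4f}")
```

During training, the network weights would be updated by gradient descent to minimize this loss. Setting the network to a single linear layer recovers the ordinary CPH objective.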
To date, we have found no reports on the application of DeepSurv to OSC prognosis. Therefore, the objective of this research was to develop a DeepSurv-based prognostic model for cancer-specific survival (CSS) for patients with OSC using patient data from the Surveillance, Epidemiology, and End Results (SEER) database, and to compare the efficacy of DeepSurv with that of the Cox proportional hazard model to provide physicians and patients with predictive tools to assess the risk stratification and individual prognosis of patients with OSC.
2. Materials and methods
2.1. Eligibility criteria and clinical information
Study data were extracted from the SEER database, Plus version (https://seer.cancer.gov/), which covers 18 registries and was released in April 2022 [15]. The primary tumor site and histology were selected according to the International Classification of Diseases for Oncology, 3rd edition (ICD-O-3). The inclusion criteria were as follows: (1) primary site coded as C40–C41; (2) histology codes 9180–9187 and 9192–9194; (3) year of diagnosis between 2004 and 2017; and (4) a malignant behavior code. The exclusion criteria were as follows: (1) missing data on survival months, race, surgery status, or year of diagnosis; (2) imprecise tumor size; (3) unclear T, N, or M stage; or (4) laterality recorded as “missing laterality information” or “bilateral tumors.”
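The inclusion and exclusion rules above can be expressed as a single filter. The sketch below assumes each patient is a Python dict with hypothetical field names such as `site`, `histology`, `behavior`, and `year`. It also assumes hypothetical codings, such as "TX" for an unknown T stage. Actual SEER*Stat exports use different column labels and codes, so the mapping would need to be adapted.

```python
# ICD-O-3 histology codes come from the inclusion criteria; field names and the
# strings for unknown stage or laterality are hypothetical placeholders.
OSTEOSARCOMA_HISTOLOGY = set(range(9180, 9188)) | set(range(9192, 9195))
UNCLEAR_STAGES = {"TX", "NX", "MX", None}
EXCLUDED_LATERALITY = {"missing laterality information", "bilateral tumors"}
REQUIRED_FIELDS = ("survival_months", "race", "surgery", "year")


def is_eligible(rec):
    """Apply the inclusion criteria, then the exclusion criteria, to one patient record."""
    year = rec.get("year")
    included = (
        (rec.get("site") or "")[:3] in {"C40", "C41"}  # primary site C40-C41
        and rec.get("histology") in OSTEOSARCOMA_HISTOLOGY
        and year is not None
        and 2004 <= year <= 2017
        and rec.get("behavior") == 3  # ICD-O-3 behavior code 3 = malignant
    )
    if not included:
        return False
    if any(rec.get(field) is None for field in REQUIRED_FIELDS):
        return False  # missing survival months, race, surgery status, or year
    if rec.get("tumor_size_cm") is None:
        return False  # imprecise or unknown tumor size
    if any(rec.get(stage) in UNCLEAR_STAGES for stage in ("t_stage", "n_stage", "m_stage")):
        return False  # unclear T, N, or M stage
    if rec.get("laterality") in EXCLUDED_LATERALITY:
        return False
    return True


example = {
    "site": "C40.2", "histology": 9180, "behavior": 3, "year": 2010,
    "survival_months": 48, "race": "White", "surgery": "radical surgery",
    "tumor_size_cm": 9.5, "t_stage": "T1", "n_stage": "N0", "m_stage": "M0",
    "laterality": "left",
}
print(is_eligible(example))                       # True
print(is_eligible({**example, "n_stage": "NX"}))  # False (unclear N stage)
```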
Because the SEER database withholds private patient information and the authors have signed an official data-use agreement with the database, no further ethical review by the authors’ institutions was required.
2.2. Selection and reconfiguration of variables
Fifteen characteristic variables were included: age at diagnosis, sex, race, marital status, number of tumors, tumor size, T stage, N stage, M stage, grade, SEER combined stage, primary site, radiation, surgery, and chemotherapy. These variables covered the demographic, clinical, and treatment information of patients with OSC. The continuous variables age and tumor size were converted to categories at the optimal cut-off values determined with the X-tile software (https://medicine.yale.edu/lab/rimm/research/software/). Marital status was classified as “married” or “other,” and the number of tumors as “single” or “multiple.” Primary sites were grouped according to their observed frequency: sites with higher frequencies were kept as separate categories, and those with a frequency <150 were combined as “other.” Surgical modalities were classified into three categories: non-operated, radical surgery, and other surgery. The endpoint of interest was CSS, defined as the time from diagnosis to death due to OSC.
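X-tile chooses cut-points by comparing survival between the resulting groups. This sketch assumes the common approach of picking the cut-off that maximizes the two-group log-rank chi-square statistic, implemented in pure Python. The minimum group size and the toy data are hypothetical, and the code is not X-tile's own implementation.

```python
def logrank_chi2(times, events, groups):
    """Two-group log-rank chi-square statistic (1 df); groups are coded 0/1."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [g for u, g in zip(times, groups) if u >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # patients at risk in group 1
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups) if u == t and e and g)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_cutoff(values, times, events, min_group=5):
    """Return (cut-off, chi-square) maximising the log-rank statistic; group 1 is value >= cut-off."""
    best = (None, -1.0)
    for c in sorted(set(values))[1:]:
        groups = [1 if v >= c else 0 for v in values]
        if min(sum(groups), len(groups) - sum(groups)) < min_group:
            continue  # skip splits that leave a very small group
        chi2 = logrank_chi2(times, events, groups)
        if chi2 > best[1]:
            best = (c, chi2)
    return best


# Toy data: age at diagnosis, follow-up in months, 1 = death from OSC.
ages = [10, 14, 16, 18, 21, 25, 29, 33, 40, 48, 55, 63, 70, 78]
times = [120, 110, 100, 95, 90, 85, 80, 75, 20, 18, 15, 12, 10, 8]
events = [0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1]
print(best_cutoff(ages, times, events))
```

On these toy data the search selects 48 rather than 40, even though every patient aged 40 or older died early. Because the cut-off is chosen to maximize the statistic, its p-value is optimistic and the selected value can be unstable. Data-driven cut-offs are therefore best checked in independent data.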
2.3. Model development and performance evaluation
In this study, two algorithms were trained: DeepSurv and Cox proportional hazard (CPH). The dataset was randomly split into training and validation sets at a ratio of 7:3. The predictive performance of the models was evaluated using the concordance index (C-index), the integrated Brier score (IBS), the root-mean-square error (RMSE), and the mean absolute error (MAE). The C-index ranges from 0 to 1. A value of 0.5 corresponds to random prediction, and values closer to 1 indicate better discriminatory ability. The IBS is obtained by integrating the time-dependent Brier score (the prediction-error curve) over the follow-up period. Values below 0.25, the score of a model that assigns every patient a 50% survival probability, indicate a useful model, and values closer to 0 indicate more precise predictions [16]. The RMSE and MAE describe the differences between the observed and predicted CSS values of each model, with smaller values indicating better model performance [17].
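The C-index and IBS can be computed directly. The pure-Python sketch below implements Harrell's C-index and an inverse-probability-of-censoring-weighted (IPCW) Brier score. The Brier score is integrated over a time grid with the trapezoidal rule. The sketch assumes each patient's predicted survival is available as a function of time (`surv_at`). The function names are hypothetical, and library implementations may differ in how they handle ties and weights.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: share of comparable pairs in which the earlier event has the higher risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


def censoring_survival(times, events):
    """Kaplan-Meier estimate G(t) = P(C > t) of the censoring distribution."""
    steps, g = [], 1.0
    for t in sorted(set(times)):
        n_risk = sum(1 for u in times if u >= t)
        n_censored = sum(1 for u, e in zip(times, events) if u == t and not e)
        g *= 1.0 - n_censored / n_risk
        steps.append((t, g))

    def G(t, left_limit=False):
        value = 1.0
        for s, v in steps:
            if s < t or (s == t and not left_limit):
                value = v
            else:
                break
        return value

    return G


def brier_score(t, times, events, surv_at, G):
    """IPCW Brier score at time t; surv_at[i](t) is patient i's predicted survival."""
    total = 0.0
    for i, (u, e) in enumerate(zip(times, events)):
        s = surv_at[i](t)
        if u <= t and e:
            total += s ** 2 / G(u, left_limit=True)  # died by t: true status 0
        elif u > t:
            total += (1.0 - s) ** 2 / G(t)  # alive at t: true status 1
        # censored before t: weight 0; the IPCW weights compensate
    return total / len(times)


def integrated_brier_score(grid, times, events, surv_at):
    """Trapezoidal integral of the Brier curve over grid, divided by the grid length."""
    G = censoring_survival(times, events)
    scores = [brier_score(t, times, events, surv_at, G) for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2 for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])


times = [6.0, 12.0, 18.0, 30.0]
events = [1, 0, 1, 1]
print(harrell_c_index(times, events, risks=[2.0, 0.5, 1.0, 0.2]))
print(integrated_brier_score([0.0, 12.0, 24.0, 36.0], times, events, [lambda t: 0.5] * 4))
```

When there is no censoring, a model that predicts a 50% survival probability for everyone scores exactly 0.25. This is why 0.25 serves as the reference value.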
2.4. Statistical analysis
Categorical variables are summarized as frequencies (n) and percentages (%). Differences in baseline characteristics between the two cohorts were analyzed using the chi-square test. Data cleaning and the generation of time-dependent ROC curves were performed in R software (version 4.2; https://www.r-project.org/). The survival algorithms were implemented in Python (version 3.7; https://www.python.org/) using PySurvival. Statistical significance was set at p < 0.05.
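The chi-square test used to compare baseline characteristics can be sketched as follows. This is a Pearson test without continuity correction. R's `chisq.test` applies Yates' correction to 2 × 2 tables by default, so its results can differ slightly. The p-value uses a series expansion of the regularized incomplete gamma function, which is adequate for moderate statistics. The counts in the example are toy values, not data from Table 1.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable (incomplete-gamma series)."""
    if x <= 0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s
    total = term
    n = 0
    while term > total * 1e-15:  # series for the lower incomplete gamma function
        n += 1
        term *= z / (s + n)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Pearson chi-square test (no continuity correction) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Toy 2 x 2 table: rows = cohorts, columns = levels of one characteristic.
stat, df, p = chi_square_independence([[10, 20], [20, 10]])
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.4f}")
```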
3. Results
3.1. Patient characteristics
A total of 3614 patients with OSC were identified in the SEER database as eligible for inclusion. After application of the exclusion criteria, 3218 of these were analyzed. Random splitting in a 7:3 ratio produced 2252 patients in the training cohort and 966 patients in the validation cohort (Fig 1).
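A 7:3 random split of the 3218 analyzed patients gives exactly the reported cohort sizes when the training share is truncated to a whole number. The sketch below uses Python's `random` module with an arbitrary fixed seed. The paper does not report the seed or whether the split was stratified, so the sketch is illustrative only.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2023):
    """Randomly split patients into training and validation cohorts (7:3 by default)."""
    rng = random.Random(seed)  # fixed, arbitrary seed so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)  # 3218 * 0.7 = 2252.6, truncated to 2252
    return shuffled[:n_train], shuffled[n_train:]


train_ids, validation_ids = split_cohort(range(3218))
print(len(train_ids), len(validation_ids))  # 2252 966
```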
In the final sample, 68.2% of patients were <35 years of age and 54.7% were men. In addition, 60.8% had tumors smaller than 11.8 cm, 59.7% had cancer in the long bones of the lower limb and associated joints, 85% had a single primary tumor, and 82% had stage T1-2 disease. N0 stage was observed in 95.9% of patients, absence of distant metastasis in 81%, and grade III-IV in 67.1%. Overall, 90.6% received no radiotherapy and 80.1% received chemotherapy. After randomization, the training and validation cohorts did not differ significantly in any characteristic except N stage, indicating good comparability between the two cohorts (Table 1).
3.2. Model development and validation
To ensure comparability, we entered all dummy-encoded features into both the DeepSurv and CPH models. For the DeepSurv model, we used Xavier uniform initialization (xav_uniform). The neural network was trained with the Adam (adaptive moment estimation) optimizer at a learning rate of 0.00063.
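The three ingredients named here are sketched below in pure Python: dummy encoding, Xavier uniform initialization, and Adam with a learning rate of 0.00063. Only the learning rate comes from the study. The Adam constants (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are commonly used defaults and are assumptions. Dropping the first level of each categorical variable as the reference is also an assumption, because the text does not say whether a reference level was dropped.

```python
import math
import random


def one_hot(records, variable, levels, drop_first=True):
    """Dummy-encode one categorical variable; by default the first level is the reference."""
    kept = levels[1:] if drop_first else levels
    return [[1.0 if rec[variable] == level else 0.0 for level in kept] for rec in records]


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform initialisation: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


class Adam:
    """Adam (adaptive moment estimation) update for a flat list of parameters."""

    def __init__(self, n_params, lr=0.00063, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        updated = []
        for k, (p, g) in enumerate(zip(params, grads)):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * g
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * g * g
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)  # bias correction
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


surgery_levels = ["non-operated", "radical surgery", "other surgery"]
patients = [{"surgery": "radical surgery"}, {"surgery": "non-operated"}, {"surgery": "other surgery"}]
print(one_hot(patients, "surgery", surgery_levels))  # [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

weights = xavier_uniform(fan_in=2, fan_out=4, rng=random.Random(42))
optimizer = Adam(n_params=2, lr=0.00063)
print(optimizer.step([0.1, -0.2], grads=[0.5, -1.0]))
```

On the first Adam step, each parameter moves by roughly the learning rate in the direction opposite to its gradient, whatever the gradient's magnitude.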
In the training cohort, the C-index of the DeepSurv model exceeded that of the CPH model (0.790 vs. 0.750, respectively). The C-index of DeepSurv also exceeded that of CPH in the validation cohort (0.747 vs. 0.744, respectively). The IBS of the DeepSurv model was lower than that of CPH in the training cohort (0.14 vs. 0.15, respectively) but was 0.16 for both algorithms in the validation cohort. The RMSE and MAE of the prediction error of DeepSurv were both larger than those of the CPH model. However, the RMSE and MAE of the survival values of the DeepSurv model (15.367 and 12.569, respectively) were smaller than those of the CPH model (17.228 and 14.900, respectively) (Fig 2). The time-dependent ROC area of the DeepSurv model was larger than that of the CPH model in the training cohort, but the time-dependent ROC curves of the two models overlapped closely in the validation cohort (Fig 3).
Fig 2. (A, B) Prediction errors in the DeepSurv model by root-mean-square error (RMSE) and mean absolute error (MAE), respectively. (C, D) Prediction errors in the CPH model by RMSE and MAE, respectively.
Fig 3. Time-dependent ROC curves for (A) the training cohort and (B) the validation cohort.
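The text does not fully specify which curves the RMSE and MAE compare. One common construction, assumed here, compares a model's average predicted survival curve with the Kaplan-Meier estimate at a grid of time points. The pure-Python sketch below implements that comparison, and its function names are hypothetical. The study's values (for example, 15.367) may have been computed on a different scale, so they are not directly comparable with the output of this code.

```python
import math


def kaplan_meier(times, events, grid):
    """Kaplan-Meier survival estimate S(t) evaluated at each time point in grid."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []
    for g in grid:
        s = 1.0
        for t in event_times:
            if t > g:
                break
            n_at_risk = sum(1 for u in times if u >= t)
            deaths = sum(1 for u, e in zip(times, events) if u == t and e)
            s *= 1.0 - deaths / n_at_risk
        curve.append(s)
    return curve


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Toy data: follow-up in months, 1 = death from OSC, 0 = censored.
times = [5.0, 14.0, 20.0, 33.0, 41.0, 60.0, 75.0, 90.0]
events = [1, 1, 0, 1, 0, 1, 0, 0]
grid = [12.0, 24.0, 36.0, 48.0, 60.0]

observed = kaplan_meier(times, events, grid)
# Hypothetical average predicted survival of a model at the same time points.
predicted = [0.86, 0.74, 0.63, 0.60, 0.50]
print(f"RMSE = {rmse(observed, predicted):.4f}, MAE = {mae(observed, predicted):.4f}")
```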
3.3. Algorithm deployment
Based on the DeepSurv algorithm, we built an application that predicts the CSS of patients with OSC from entered information on the patient’s condition. The application also displays the patient’s CSS rates at 3, 5, and 10 years. Its functionality and output visualization are shown in Fig 4. This application is intended primarily for research and informational purposes and is publicly accessible at https://rrreert-1-14-main-1xfl0e.streamlit.app/.
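In Cox-type models, including DeepSurv, an individual survival curve is typically obtained from an estimated baseline survival function and the patient's risk score as S(t | x) = S0(t)^exp(h(x)). The sketch below applies this relation at 36, 60, and 120 months. The baseline values are hypothetical, and the internal computations of the deployed application are not described in the study.

```python
import math


def individual_css(baseline_survival, risk_score, horizons=(36, 60, 120)):
    """Predicted CSS at each horizon (months): S(t | x) = S0(t) ** exp(risk_score)."""
    return {t: baseline_survival[t] ** math.exp(risk_score) for t in horizons}


# Hypothetical baseline CSS at 3, 5, and 10 years; not estimates from the study.
baseline = {36: 0.80, 60: 0.72, 120: 0.65}

for label, score in [("average risk", 0.0), ("higher risk", 0.7), ("lower risk", -0.7)]:
    css = individual_css(baseline, score)
    summary = ", ".join(f"{t // 12}-year CSS {p:.1%}" for t, p in css.items())
    print(f"{label}: {summary}")
```

A risk score above 0 lowers every predicted CSS rate relative to the baseline, and a score below 0 raises it. Because the same exponent is applied at every horizon, the ordering of patients is identical at 3, 5, and 10 years.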
4. Discussion
Accurate prediction of survival in patients with OSC is crucial for counseling, follow-up, and treatment management. With the development and refinement of machine learning algorithms, their applications in medicine have become increasingly widespread [11, 18, 19]. As the dimensionality and volume of available data increase, machine learning has begun to rival the predictive performance of the traditional CPH model. In the present study, we used the DeepSurv algorithm to build and evaluate a prognostic model of CSS in patients with OSC. We compared its predictive efficacy with that of the CPH model and found that it had relatively good predictive efficacy.
Previous studies have identified age, surgical approach, tumor size, grade, primary site, distant metastasis, and adjuvant radiotherapy as prognostic factors in patients with OSC [6, 7, 20]. Most of these studies used CPH regression for prediction, which means that two kinds of effects may have been simplified or ignored [21]. The first is effect modification, in which the causal effect of one exposure varies across levels of another exposure of interest. The second is interaction, the joint causal effect of two exposures. We therefore used the DeepSurv algorithm to accommodate nonlinearities, interactions, and effect modification in the SEER cohort [22]. The calculator we deployed on a web page predicts individual CSS rates in patients with OSC. It also allows the prognostic impact of different variables and their levels to be compared. Nevertheless, DeepSurv was not superior to CPH in predicting CSS on any of the metrics we evaluated.
In previous studies, machine learning algorithms such as DeepSurv have outperformed the traditional CPH model in survival prediction [10, 23, 24]. In the training cohort of the present study, the DeepSurv model had a higher C-index than the CPH model. In the validation cohort, however, it did not show improved efficacy in predicting CSS. This suggests that machine learning algorithms may show advantages only under conditions where traditional models are limited. Several explanations are possible for the similar efficacy of DeepSurv and CPH in the present study. First, the number of features used to build the model may not have been large enough to demonstrate the advantages of machine learning with large samples of multidimensional data. Second, the features available in the SEER database were selected mostly on the basis of clinical experience, so they may have had a strong linear relationship with patient outcomes. Such features may be better suited to semi-parametric models such as CPH. DeepSurv does not require the linear relationship that CPH assumes, so it can be applied under a wider range of conditions, yet it achieved similar predictive efficacy. This implies that DeepSurv may be an effective alternative model for predicting CSS in patients with OSC.
Using the DeepSurv algorithm, we obtained a model with good performance for predicting the survival of patients with OSC and deployed it on a webpage for easy access. However, our study has several limitations. First, it is a retrospective study with potential selection bias. Second, model training and validation were both performed using the SEER database, without external validation. Finally, fitting the models with dummy variables increased the number of features, so the model output provides no information on feature importance. Therefore, a multicenter, large-scale prospective study is needed to validate the effectiveness of the model.
5. Conclusions
Using the DeepSurv algorithm, we developed a high-performance model for predicting CSS in patients with OSC. The model was deployed on a webpage to provide physicians and patients with an easy-to-use prediction tool that can facilitate personalized treatment. Our study indicates that the DeepSurv algorithm has high potential for applications in both clinical research and practice.
References
- 1. Rickel K, Fang F, Tao J. Molecular genetics of osteosarcoma. Bone. 2017;102: 69–79. pmid:27760307
- 2. de Nigris F, Rossiello R, Schiano C, Arra C, Williams-Ignarro S, Barbieri A, et al. Deletion of Yin Yang 1 protein in osteosarcoma cells on cell invasion and CXCR4/angiogenesis and metastasis. Cancer Res. 2008;68: 1797–1808. pmid:18339860
- 3. Czarnecka AM, Synoradzki K, Firlej W, Bartnik E, Sobczuk P, Fiedorowicz M, et al. Molecular Biology of Osteosarcoma. Cancers (Basel). 2020;12: 2130. pmid:32751922
- 4. Ritter J, Bielack SS. Osteosarcoma. Ann Oncol. 2010;21 Suppl 7: vii320–325. pmid:20943636
- 5. Harris MA, Hawkins CJ. Recent and Ongoing Research into Metastatic Osteosarcoma Treatments. Int J Mol Sci. 2022;23: 3817. pmid:35409176
- 6. Li W, Liu Y, Liu W, Tang Z-R, Dong S, Li W, et al. Machine Learning-Based Prediction of Lymph Node Metastasis Among Osteosarcoma Patients. Front Oncol. 2022;12: 797103. pmid:35515104
- 7. Wang J, Zhanghuang C, Tan X, Mi T, Liu J, Jin L, et al. A Nomogram for Predicting Cancer-Specific Survival of Osteosarcoma and Ewing’s Sarcoma in Children: A SEER Database Analysis. Front Public Health. 2022;10: 837506. pmid:35178367
- 8. Xue W, Zhang Z, Yu H, Li C, Sun Y, An J, et al. Development of nomogram and discussion of radiotherapy effect for osteosarcoma survival. Sci Rep. 2023;13: 223. pmid:36604532
- 9. W L G J, H W, R W, C X, B W, et al. Interpretable clinical visualization model for prediction of prognosis in osteosarcoma: a large cohort data study. Front Oncol. 2022;12. pmid:36003782
- 10. Hou K-Y, Chen J-R, Wang Y-C, Chiu M-H, Lin S-P, Mo Y-H, et al. Radiomics-Based Deep Learning Prediction of Overall Survival in Non-Small-Cell Lung Cancer Using Contrast-Enhanced Computed Tomography. Cancers (Basel). 2022;14: 3798. pmid:35954461
- 11. Lin J, Yin M, Liu L, Gao J, Yu C, Liu X, et al. The Development of a Prediction Model Based on Random Survival Forest for the Postoperative Prognosis of Pancreatic Cancer: A SEER-Based Study. Cancers (Basel). 2022;14: 4667. pmid:36230593
- 12. Zhou X, Nakamura K, Sahara N, Takagi T, Toyoda Y, Enomoto Y, et al. Deep Learning-Based Recurrence Prediction of Atrial Fibrillation After Catheter Ablation. Circ J. 2022;86: 299–308. pmid:34629373
- 13. Fotso S. PySurvival: Open source package for Survival Analysis modeling. Accessed March 17, 2020. https://square.github.io/pysurvival/#citation
- 14. Kim DW, Lee S, Kwon S, Nam W, Cha I-H, Kim HJ. Deep learning-based survival prediction of oral cancer patients. Sci Rep. 2019;9: 6994. pmid:31061433
- 15. Adeoye J, Koohi-Moghadam M, Lo AWI, Tsang RK-Y, Chow VLY, Zheng L-W, et al. Deep Learning Predicts the Malignant-Transformation-Free Survival of Oral Potentially Malignant Disorders. Cancers (Basel). 2021;13: 6054. pmid:34885164
- 16. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence - SEER Research Data, 8 Registries, Nov 2021 Sub (1975–2019) - Linked To County Attributes - Time Dependent (1990–2019) Income/Rurality, 1969–2020 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2022, based on the November 2021 submission.
- 17. Lawless JF, Yuan Y. Estimation of prediction error for survival models. Stat Med. 2010;29: 262–274. pmid:19882678
- 18. Erdman EA, Young LD, Bernson DL, Bauer C, Chui K, Stopka TJ. A Novel Imputation Approach for Sharing Protected Public Health Data. Am J Public Health. 2021;111: 1830–1838. pmid:34529494
- 19. She Y, Jin Z, Wu J, Deng J, Zhang L, Su H, et al. Development and Validation of a Deep Learning Model for Non-Small Cell Lung Cancer Survival. JAMA Netw Open. 2020;3: e205842. pmid:32492161
- 20. Howard FM, Kochanny S, Koshy M, Spiotto M, Pearson AT. Machine Learning-Guided Adjuvant Treatment of Head and Neck Cancer. JAMA Netw Open. 2020;3: e2025881. pmid:33211108
- 21. Huang Y, Wang C, Tang D, Chen B, Jiang Z. Development and Validation of Nomogram-Based Prognosis Tools for Patients with Extremity Osteosarcoma: A SEER Population Study. J Oncol. 2022;2022: 9053663. pmid:35602295
- 22. Knol MJ, VanderWeele TJ. Recommendations for presenting analyses of effect modification and interaction. Int J Epidemiol. 2012;41: 514–520. pmid:22253321
- 23. Kim SI, Kang JW, Eun Y-G, Lee YC. Prediction of survival in oropharyngeal squamous cell carcinoma using machine learning algorithms: A study based on the surveillance, epidemiology, and end results database. Front Oncol. 2022;12: 974678. pmid:36072804
- 24. Yang B, Liu C, Wu R, Zhong J, Li A, Ma L, et al. Development and Validation of a DeepSurv Nomogram to Predict Survival Outcomes and Guide Personalized Adjuvant Chemotherapy in Non-Small Cell Lung Cancer. Front Oncol. 2022;12: 895014. pmid:35814402