
A deep learning algorithm with good prediction efficacy for cancer-specific survival in osteosarcoma: A retrospective study

  • Yang Liu ,

    Contributed equally to this work with: Yang Liu, Lang Xie

    Roles Resources, Software

    Affiliation Department of Orthopedics, The First Affiliated Hospital of Guizhou University of Traditional Chinese Medicine, Guiyang, China

  • Lang Xie ,

    Contributed equally to this work with: Yang Liu, Lang Xie

    Roles Data curation, Methodology

    Affiliation Hospital Infection Management Department, Bijie First People’s Hospital, Bijie, China

  • Dingxue Wang,

    Roles Software, Validation

    Affiliation Department of Oncology, The First Affiliated Hospital of Guizhou University of Traditional Chinese Medicine, Guiyang, China

  • Kaide Xia

    Roles Funding acquisition, Supervision, Writing – review & editing

    xiakaide@163.com

    Affiliation Clinical College of Maternal and Child Health Care, Guizhou Medical University, Guiyang, China

Abstract

Objective

Accurate prognostication is crucial for the management and treatment of osteosarcoma (OSC). This study aimed to predict the cancer-specific survival rate of patients with OSC using a deep learning algorithm and the classical Cox proportional hazard model, to provide data supporting individualized treatment of patients with OSC.

Methods

Data on patients diagnosed with OSC from 2004 to 2017 were obtained from the Surveillance, Epidemiology, and End Results database. The study sample was then divided randomly into a training cohort and a validation cohort in a 7:3 ratio. The DeepSurv algorithm and the Cox proportional hazard model were chosen to construct prognostic models for patients with OSC. The prediction efficacy of the models was estimated using the concordance index (C-index), the integrated Brier score (IBS), the root mean square error (RMSE), and the mean absolute error (MAE).

Results

A total of 3218 patients were randomized into training and validation groups (n = 2252 and 966, respectively). Both the DeepSurv and Cox models showed good efficacy in predicting cancer-specific survival (CSS) in patients with OSC (C-index >0.74). On the other validation metrics, DeepSurv showed no superiority over the Cox model in predicting survival in patients with OSC.

Conclusions

After validation, our CSS prediction model for patients with OSC based on the DeepSurv algorithm demonstrated satisfactory prediction efficacy, and we provide it as a convenient webpage calculator.

1. Introduction

Osteosarcoma (OSC) is among the most frequently observed primary bone tumors in children and adolescents and ranks third after chondrosarcoma and chordoma in adults [1]. The tumor is usually located in the distal femur and proximal tibia, with a survival rate of 50% to 65%, but 25%–50% of patients with initial metastases die from pulmonary metastases [2]. OSC shows a bimodal age distribution, with the first peak at 15–19 years and the second at 75–79 years [3, 4]. OSCs are derived from primitive mesenchymal cells, arising often in bone and rarely in soft tissue [5]. Although local and distant OSC metastases progress slowly, the presence or absence of metastases is an important prognostic factor [6]. The staging system of the American Joint Committee on Cancer (AJCC) is recommended by the World Health Organization; however, it has shortcomings with regard to its focus and its usefulness for predicting patient prognosis. Several previous studies have used nomograms to predict cancer patient survival and have achieved positive predictive efficacy [7–9]. The nomogram is based on the Cox proportional hazard (CPH) model, whose premise is that the restrictive assumption of proportional hazards between the independent and dependent variables is satisfied. In practice, however, it is difficult to verify the true underlying relationship between these variables. In addition, a linear relationship between clinical characteristics and prognostic outcomes alone is not sufficient for clinical decision-making [10]. Hence, there is a need for a better model for evaluating the relationships among these nonlinear variables.

Deep learning networks provide new perspectives on how to address the highly complicated linear or nonlinear relationships between clinical features and individual prognostic hazards [11, 12]. Fotso et al. developed a Python-based survival-analysis package, PySurvival [13], whose deep neural network models are useful for predicting the impact of patient characteristics on prognosis. The authors confirmed that the algorithm outperformed other methods in handling survival data, and it has since performed well in several cancer prognosis studies [14, 15].

To date, we have found no reports on the application of DeepSurv to OSC prognosis. Therefore, the objective of this research was to develop a DeepSurv-based prognostic model for cancer-specific survival (CSS) for patients with OSC using patient data from the Surveillance, Epidemiology, and End Results (SEER) database, and to compare the efficacy of DeepSurv with that of the Cox proportional hazard model to provide physicians and patients with predictive tools to assess the risk stratification and individual prognosis of patients with OSC.

2. Materials and methods

2.1. Eligibility criteria and clinical information

Study data were extracted from the SEER Research Plus database (https://seer.cancer.gov/), which enrolls registries covering 18 states and was released in April 2022 [15]. The primary tumor site and histological type were coded according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). The inclusion criteria were: (1) primary site coded as C40–41; (2) histological code 9180–9187 or 9192–9194; (3) year of diagnosis between 2004 and 2017; and (4) malignant behavior code. The exclusion criteria were: (1) missing data on survival months, race, surgery status, or year of diagnosis; (2) imprecise tumor size; (3) unclear T-, N-, or M-stage status; or (4) laterality listed as “missing laterality information” or “bilateral tumors.”
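The selection logic above can be sketched as a simple record filter. This is an illustrative sketch only: the field names (`primary_site`, `histology`, and so on) are assumptions, not the actual SEER*Stat export column names.

```python
# Sketch of the cohort-selection criteria; field names are hypothetical.
INCLUDE_HISTOLOGY = set(range(9180, 9188)) | set(range(9192, 9195))

def is_eligible(rec):
    """Apply the inclusion/exclusion criteria to one patient record."""
    # Inclusion: site C40-41, eligible histology, 2004-2017, malignant
    if not rec["primary_site"].startswith(("C40", "C41")):
        return False
    if rec["histology"] not in INCLUDE_HISTOLOGY:
        return False
    if not (2004 <= rec["year_of_diagnosis"] <= 2017):
        return False
    if rec["behavior"] != "malignant":
        return False
    # Exclusion: missing or imprecise key fields
    required = ("survival_months", "race", "surgery", "tumor_size",
                "t_stage", "n_stage", "m_stage")
    if any(rec.get(k) in (None, "unknown") for k in required):
        return False
    if rec.get("laterality") in ("missing laterality information",
                                 "bilateral tumors"):
        return False
    return True

demo = {"primary_site": "C40.2", "histology": 9180,
        "year_of_diagnosis": 2010, "behavior": "malignant",
        "survival_months": 60, "race": "white", "surgery": "radical",
        "tumor_size": 85, "t_stage": "T1", "n_stage": "N0",
        "m_stage": "M0", "laterality": "left"}
print(is_eligible(demo))  # True
```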

Because the SEER database withholds private patient information and the authors have signed an official data-use agreement with the database, no further ethical review by the authors’ institutions was required.

2.2. Selection and reconfiguration of variables

Fifteen characteristic variables were included: age at diagnosis, sex, race, marital status, tumor size, number of tumors, T stage, N stage, M stage, grade, SEER combined stage, primary site, radiation, surgery, and chemotherapy. These variables covered the demographic, clinical, and treatment information of patients with OSC. The continuous variables age and tumor size were categorized using X-tile software (https://medicine.yale.edu/lab/rimm/research/software/) to determine the best cut-off values. Marital status was classified as “married” or “other,” and tumor number was classified as “single” or “multiple.” The primary site was classified according to the actual frequency distribution; sites with a higher frequency were classified separately, and those with a frequency <150 were combined as “other.” Surgical modality was classified into three categories: non-operated, radical surgery, and other surgery. Cancer-specific survival was the endpoint of interest, defined as the time interval between diagnosis and death due to OSC.
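The recoding steps above can be sketched as a small function. The cut-offs used here (<35 years; <11.8 cm) follow the groups reported in the Results; the exact X-tile thresholds are an assumption of this sketch.

```python
def recode(age, tumor_size_cm, marital_status, n_tumors):
    """Recode raw variables into the model's categories.
    Cut-offs are illustrative, taken from the groups reported
    in the Results, not the actual X-tile output."""
    age_group = "<35" if age < 35 else ">=35"
    size_group = "<11.8cm" if tumor_size_cm < 11.8 else ">=11.8cm"
    marital_group = "married" if marital_status == "married" else "other"
    tumor_count = "single" if n_tumors == 1 else "multiple"
    return age_group, size_group, marital_group, tumor_count

print(recode(22, 9.0, "divorced", 1))
# ('<35', '<11.8cm', 'other', 'single')
```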

2.3. Model development and performance evaluation

In this study, two algorithms were selected for training: DeepSurv and Cox proportional hazard (CPH). The dataset was randomly split into training and validation sets in a ratio of 7:3. The predictive performances of the models were evaluated using the concordance index (C-index), the integrated Brier score (IBS), the root-mean-square error (RMSE), and the mean absolute error (MAE). The C-index varies between 0 and 1, and the closer to 1, the better the discriminatory ability of the model. The IBS is calculated by integrating the prediction-error curve between 0 and 0.25, with values closer to 0 indicating more precise prediction performance of the model [16]. The RMSE and MAE describe the differences between the actual and predicted CSS values in each model, with smaller values indicating better model performance [17].

2.4. Statistical analysis

Categorical variables are summarized as frequencies (n) and percentages (%). Differences in baseline characteristics between the two cohorts were analyzed using the chi-square test. Data cleaning was performed and time-dependent ROC curves were generated using R software (version 4.2; https://www.r-project.org/). Survival algorithms were implemented in Python (version 3.7; https://www.python.org/) using PySurvival. Statistical significance was set at p < 0.05.
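For a 2×2 table (e.g., N stage by cohort), the chi-square comparison can be sketched with the standard library alone: with one degree of freedom, the p-value is erfc(√(χ²/2)). The counts below are hypothetical, not the study's actual N-stage distribution.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the
    2x2 table [[a, b], [c, d]]; df = 1, so p = erfc(sqrt(chi2 / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: rows = training / validation, cols = N0 / N1
chi2, p = chi2_2x2(2180, 72, 906, 60)
print(f"chi2={chi2:.3f}, p={p:.5f}")
```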

3. Results

3.1. Patient characteristics

A total of 3614 patients with OSC were identified in the SEER database as eligible for inclusion. After application of the exclusion criteria, 3218 of these were analyzed. Random splitting in a 7:3 ratio produced 2252 patients in the training cohort and 966 patients in the validation cohort (Fig 1).
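The random 7:3 split reproducing the 2252/966 cohort sizes can be sketched in numpy; the seed here is an assumption for reproducibility, not the study's.

```python
import numpy as np

# 3218 eligible patients split 7:3 into training and validation cohorts
rng = np.random.default_rng(42)      # fixed seed (illustrative)
indices = rng.permutation(3218)
n_train = int(0.7 * 3218)            # 2252
train_idx, valid_idx = indices[:n_train], indices[n_train:]
print(len(train_idx), len(valid_idx))  # 2252 966
```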

In the final sample, 68.2% of patients were <35 years of age, 54.7% were men, 60.8% had tumors smaller than 11.8 cm, 59.7% had tumors in the long bones of the lower limb and associated joint sites, 85% had a single primary tumor, 82% were stage T1–2, 95.9% were stage N0, 81% had no distant metastasis, 67.1% had grade III–IV disease, 90.6% received no radiotherapy, and 80.1% received chemotherapy. After random splitting, we found no statistical differences in the characteristics of the training and validation cohorts except in N stage, indicating good comparability between the two cohorts (Table 1).

Table 1. Clinical and pathological characteristics of the study sample of patients with osteosarcoma.

https://doi.org/10.1371/journal.pone.0286841.t001

3.2. Model development and validation

To ensure comparability, we incorporated all dummy-coded variables into the construction of both the DeepSurv and CPH models. For the DeepSurv model, we used Xavier-uniform (xav_uniform) weight initialization and trained the neural network with the adaptive moment estimation (Adam) optimizer at a learning rate of 0.00063.
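The training above was done with PySurvival; as a library-free illustration, the loss that both models minimize is the Cox negative log partial likelihood. CPH scores risk as a linear combination X·β, while DeepSurv replaces the linear score with a neural network output; the network layers themselves are omitted in this minimal sketch.

```python
import numpy as np

def cox_nll(risk, time, event):
    """Cox negative log partial likelihood (no tie correction).
    For each observed event i, the risk set is everyone whose
    follow-up time is >= T_i. Lower is better; it is minimized
    when higher risks align with earlier events."""
    risk, time, event = map(np.asarray, (risk, time, event))
    loss = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        loss += np.log(np.exp(risk[at_risk]).sum()) - risk[i]
    return float(loss)

time, event = [1.0, 2.0, 3.0], [1, 0, 1]
good = cox_nll([2.0, 0.0, -2.0], time, event)  # risks ordered with hazard
bad  = cox_nll([-2.0, 0.0, 2.0], time, event)  # anti-ordered
print(good < bad)  # True
```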

In the training cohort, the C-index of the DeepSurv model exceeded that of the CPH model (0.790 vs. 0.750), and it also did so in the validation cohort (0.747 vs. 0.744). The IBS of the DeepSurv model was lower than that of the CPH model in the training cohort (0.14 vs. 0.15) but was 0.16 for both algorithms in the validation cohort. The RMSE and MAE of the prediction error of DeepSurv were both larger than those of the CPH model; however, the RMSE and MAE of the survival values of the DeepSurv model (15.367 and 12.569, respectively) were smaller than those of the CPH model (17.228 and 14.900, respectively) (Fig 2). Although the time-dependent ROC area of the DeepSurv model was larger than that of the CPH model in the training set, the time-dependent ROC curves of the two models overlapped well in the training cohort (Fig 3).

Fig 2. Performance of the DeepSurv and Cox proportional hazard (CPH) models in the validation cohort.

(A, B) Prediction errors in the DeepSurv model by root-mean-square error (RMSE) and mean absolute error (MAE), respectively. (C, D) Prediction errors in the CPH model by RMSE and MAE, respectively.

https://doi.org/10.1371/journal.pone.0286841.g002

Fig 3.

The time-dependent ROC curves for (A) the training cohorts and (B) the validation cohorts.

https://doi.org/10.1371/journal.pone.0286841.g003

3.3. Algorithm deployment

Based on the DeepSurv algorithm, we built an application that predicts the CSS of patients with OSC after entry of the relevant information on the patient’s condition. The application readily displays CSS rates at 3, 5, and 10 years; its functionality and output visualization are shown in Fig 4. It is intended primarily for research and informational purposes and is publicly accessible at https://rrreert-1-14-main-1xfl0e.streamlit.app/.

4. Discussion

Accurate prediction of the survival of patients with OSC is crucial for counseling, follow-up, and management of treatment. With the development and refinement of machine learning algorithms, their applications in medicine have become increasingly widespread [11, 18, 19]. As the number of data dimensions and the volume of available data have grown, machine learning has begun to rival the predictive performance of the traditional CPH model. In the present study, we employed the DeepSurv algorithm to build and evaluate a prognostic model of the CSS rate in patients with OSC, compared its predictive efficacy with that of the CPH model, and demonstrated relatively good predictive efficacy.

Studies have confirmed that age, surgical approach, tumor size, grade, primary site, distant metastasis, and adjuvant radiotherapy are prognostic factors in patients with OSC [6, 7, 20]. Most of these studies used CPH regression for prediction, which means that two phenomena may have been simplified or ignored: effect modification, in which the causal effect of one exposure varies across levels of another exposure of interest, and interaction, the joint causal effect of two exposures within a domain of interest [21]. We therefore used the DeepSurv algorithm to accommodate nonlinearity, interaction, and effect modification in the SEER cohort [22]. The calculator we deployed on a web page allows not only prediction of individual CSS rates in patients with OSC but also comparison of the prognostic impact of different variables and their levels. In the present study, however, the DeepSurv algorithm was not found to be superior to CPH in predicting CSS on the metrics we evaluated.

In previous studies, machine learning algorithms such as DeepSurv have outperformed the traditional Cox proportional hazard model in survival prediction [10, 23, 24]. In the training cohort of the present study, the DeepSurv model had a higher C-index than the CPH model; in the validation cohort, however, it did not show improved efficacy in predicting CSS rates. This suggests that machine learning algorithms show clear advantages only under conditions in which traditional models are limited. Several explanations are possible for the similar efficacy of DeepSurv and CPH observed here. First, the number of features used to build the model may not have been sufficiently large to demonstrate the advantage of machine learning in handling large samples of multidimensional data. Second, the features available in the SEER database were mostly collected on the basis of clinical experience and may therefore have a strong linear relationship with patient outcomes; such features may be better suited to parametric models such as CPH. In testing the model assumptions, the DeepSurv model was applicable under a wider range of conditions than CPH yet achieved similar predictive efficacy. This implies that DeepSurv may be an effective alternative model for predicting the CSS rate in patients with OSC.

We used the DeepSurv algorithm to predict the survival of patients with OSC, obtained a model with good performance, and deployed it on a webpage for easy access. However, our study has several limitations. First, it is a retrospective study with potential selection bias. Second, model training and validation were both performed on the SEER database, without external validation. Finally, the dummy-variable coding used to fit the models increased the number of features, so the study's output lacks information on feature importance. A multicenter, large-scale prospective trial is therefore needed to validate the effectiveness of the model.

5. Conclusions

Using the DeepSurv algorithm, we developed a high-performance prediction model for CSS rates in patients with OSC. In addition, the developed model was deployed on a webpage to provide physicians and patients with an easy-to-use management prediction tool to facilitate personalized treatment. Our study indicates that the DeepSurv algorithm demonstrates high potential for use in applications in both clinical research and practice.

Acknowledgments

We would like to thank Editage (www.editage.cn) for English language editing.

References

  1. Rickel K, Fang F, Tao J. Molecular genetics of osteosarcoma. Bone. 2017;102: 69–79. pmid:27760307
  2. de Nigris F, Rossiello R, Schiano C, Arra C, Williams-Ignarro S, Barbieri A, et al. Deletion of Yin Yang 1 protein in osteosarcoma cells on cell invasion and CXCR4/angiogenesis and metastasis. Cancer Res. 2008;68: 1797–1808. pmid:18339860
  3. Czarnecka AM, Synoradzki K, Firlej W, Bartnik E, Sobczuk P, Fiedorowicz M, et al. Molecular Biology of Osteosarcoma. Cancers (Basel). 2020;12: 2130. pmid:32751922
  4. Ritter J, Bielack SS. Osteosarcoma. Ann Oncol. 2010;21 Suppl 7: vii320–325. pmid:20943636
  5. Harris MA, Hawkins CJ. Recent and Ongoing Research into Metastatic Osteosarcoma Treatments. Int J Mol Sci. 2022;23: 3817. pmid:35409176
  6. Li W, Liu Y, Liu W, Tang Z-R, Dong S, Li W, et al. Machine Learning-Based Prediction of Lymph Node Metastasis Among Osteosarcoma Patients. Front Oncol. 2022;12: 797103. pmid:35515104
  7. Wang J, Zhanghuang C, Tan X, Mi T, Liu J, Jin L, et al. A Nomogram for Predicting Cancer-Specific Survival of Osteosarcoma and Ewing’s Sarcoma in Children: A SEER Database Analysis. Front Public Health. 2022;10: 837506. pmid:35178367
  8. Xue W, Zhang Z, Yu H, Li C, Sun Y, An J, et al. Development of nomogram and discussion of radiotherapy effect for osteosarcoma survival. Sci Rep. 2023;13: 223. pmid:36604532
  9. W L, G J, H W, R W, C X, B W, et al. Interpretable clinical visualization model for prediction of prognosis in osteosarcoma: a large cohort data study. Front Oncol. 2022;12. pmid:36003782
  10. Hou K-Y, Chen J-R, Wang Y-C, Chiu M-H, Lin S-P, Mo Y-H, et al. Radiomics-Based Deep Learning Prediction of Overall Survival in Non-Small-Cell Lung Cancer Using Contrast-Enhanced Computed Tomography. Cancers (Basel). 2022;14: 3798. pmid:35954461
  11. Lin J, Yin M, Liu L, Gao J, Yu C, Liu X, et al. The Development of a Prediction Model Based on Random Survival Forest for the Postoperative Prognosis of Pancreatic Cancer: A SEER-Based Study. Cancers (Basel). 2022;14: 4667. pmid:36230593
  12. Zhou X, Nakamura K, Sahara N, Takagi T, Toyoda Y, Enomoto Y, et al. Deep Learning-Based Recurrence Prediction of Atrial Fibrillation After Catheter Ablation. Circ J. 2022;86: 299–308. pmid:34629373
  13. Fotso S. PySurvival: Open source package for Survival Analysis modeling. Accessed March 17, 2020. https://square.github.io/pysurvival/#citation
  14. Kim DW, Lee S, Kwon S, Nam W, Cha I-H, Kim HJ. Deep learning-based survival prediction of oral cancer patients. Sci Rep. 2019;9: 6994. pmid:31061433
  15. Adeoye J, Koohi-Moghadam M, Lo AWI, Tsang RK-Y, Chow VLY, Zheng L-W, et al. Deep Learning Predicts the Malignant-Transformation-Free Survival of Oral Potentially Malignant Disorders. Cancers (Basel). 2021;13: 6054. pmid:34885164
  16. Surveillance, Epidemiology, and End Results (SEER) Program (www.seer.cancer.gov) SEER*Stat Database: Incidence—SEER Research Data, 8 Registries, Nov 2021 Sub (1975–2019)—Linked To County Attributes—Time Dependent (1990–2019) Income/Rurality, 1969–2020 Counties, National Cancer Institute, DCCPS, Surveillance Research Program, released April 2022, based on the November 2021 submission.
  17. Lawless JF, Yuan Y. Estimation of prediction error for survival models. Stat Med. 2010;29: 262–274. pmid:19882678
  18. Erdman EA, Young LD, Bernson DL, Bauer C, Chui K, Stopka TJ. A Novel Imputation Approach for Sharing Protected Public Health Data. Am J Public Health. 2021;111: 1830–1838. pmid:34529494
  19. She Y, Jin Z, Wu J, Deng J, Zhang L, Su H, et al. Development and Validation of a Deep Learning Model for Non-Small Cell Lung Cancer Survival. JAMA Netw Open. 2020;3: e205842. pmid:32492161
  20. Howard FM, Kochanny S, Koshy M, Spiotto M, Pearson AT. Machine Learning-Guided Adjuvant Treatment of Head and Neck Cancer. JAMA Netw Open. 2020;3: e2025881. pmid:33211108
  21. Huang Y, Wang C, Tang D, Chen B, Jiang Z. Development and Validation of Nomogram-Based Prognosis Tools for Patients with Extremity Osteosarcoma: A SEER Population Study. J Oncol. 2022;2022: 9053663. pmid:35602295
  22. Knol MJ, VanderWeele TJ. Recommendations for presenting analyses of effect modification and interaction. Int J Epidemiol. 2012;41: 514–520. pmid:22253321
  23. Kim SI, Kang JW, Eun Y-G, Lee YC. Prediction of survival in oropharyngeal squamous cell carcinoma using machine learning algorithms: A study based on the surveillance, epidemiology, and end results database. Front Oncol. 2022;12: 974678. pmid:36072804
  24. Yang B, Liu C, Wu R, Zhong J, Li A, Ma L, et al. Development and Validation of a DeepSurv Nomogram to Predict Survival Outcomes and Guide Personalized Adjuvant Chemotherapy in Non-Small Cell Lung Cancer. Front Oncol. 2022;12: 895014. pmid:35814402