
Prognosing post-treatment outcomes of head and neck cancer using structured data and machine learning: A systematic review

  • Mohammad Moharrami ,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review & editing

    faraz.moharrami@mail.utoronto.ca

    Affiliations Faculty of Dentistry, University of Toronto, Toronto, Canada, Department of Dental Oncology, Princess Margaret Cancer Centre, Toronto, Canada, Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Geneva, Switzerland

  • Parnia Azimian Zavareh,

    Roles Data curation, Investigation, Methodology, Writing – review & editing

    Affiliation Topic Group Dental Diagnostics and Digital Dentistry, ITU/WHO Focus Group AI on Health, Geneva, Switzerland

  • Erin Watson,

    Roles Formal analysis, Project administration, Supervision, Writing – review & editing

    Affiliations Faculty of Dentistry, University of Toronto, Toronto, Canada, Department of Dental Oncology, Princess Margaret Cancer Centre, Toronto, Canada

  • Sonica Singhal,

    Roles Investigation, Methodology, Visualization, Writing – review & editing

    Affiliations Faculty of Dentistry, University of Toronto, Toronto, Canada, Chronic Disease and Injury Prevention Department, Health Promotion, Public Health Ontario, Toronto, Canada

  • Alistair E. W. Johnson,

    Roles Conceptualization, Formal analysis, Supervision, Writing – review & editing

    Affiliation Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada

  • Ali Hosni,

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Radiation Oncology, Princess Margaret Cancer Center, University of Toronto, Toronto, Canada

  • Carlos Quinonez,

    Roles Conceptualization, Data curation, Methodology, Writing – review & editing

    Affiliations Faculty of Dentistry, University of Toronto, Toronto, Canada, Schulich School of Medicine & Dentistry, Western University, London, Canada

  • Michael Glogauer

    Roles Conceptualization, Data curation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliations Faculty of Dentistry, University of Toronto, Toronto, Canada, Department of Dental Oncology, Princess Margaret Cancer Centre, Toronto, Canada, Department of Dentistry, Centre for Advanced Dental Research and Care, Mount Sinai Hospital, Toronto, Canada

Abstract

Background

This systematic review aimed to evaluate the performance of machine learning (ML) models in predicting post-treatment survival and disease progression outcomes, including recurrence and metastasis, in head and neck cancer (HNC) using clinicopathological structured data.

Methods

A systematic search was conducted across the Medline, Scopus, Embase, Web of Science, and Google Scholar databases. The methodological characteristics and performance metrics of studies that developed and validated ML models were assessed. The risk of bias was evaluated using the Prediction model Risk Of Bias ASsessment Tool (PROBAST).

Results

Out of 5,560 unique records, 34 articles were included. For the survival outcome, ML models outperformed the Cox proportional hazards model in time-to-event analyses for HNC, with a concordance index of 0.70–0.79 vs. 0.66–0.76, and for sub-sites including the oral cavity (0.73–0.89 vs. 0.69–0.77) and larynx (0.71–0.85 vs. 0.57–0.74). In binary classification analyses, the area under the receiver operating characteristic curve (AUROC) of ML models ranged from 0.75 to 0.97, with F1-scores of 0.65–0.89 for HNC; AUROC of 0.61–0.91 and F1-scores of 0.58–0.86 for the oral cavity; and AUROC of 0.76–0.97 and F1-scores of 0.63–0.92 for the larynx. Disease-specific survival outcomes showed higher performance than overall survival outcomes, but the performance of ML models did not differ between three- and five-year follow-up durations. For disease progression outcomes, no time-to-event metrics were reported for ML models. For binary classification in the oral cavity, the only evaluated subsite, the AUROC ranged from 0.67 to 0.97, with F1-scores between 0.53 and 0.89.

Conclusions

ML models have demonstrated considerable potential in predicting post-treatment survival and disease progression, consistently outperforming traditional linear models and their derived nomograms. Future research should incorporate more comprehensive treatment features, emphasize disease progression outcomes, and establish model generalizability through external validations and the use of multicenter datasets.

1. Introduction

Head and neck cancer (HNC) is the seventh most common cancer globally, accounting for more than 660,000 new cases and 325,000 deaths annually. By 2030, the incidence of HNC is expected to rise by 30% compared to the 2020 rate, largely driven by increases in oropharyngeal cancer [1, 2]. Post-treatment recurrences and metastases are common and contribute to the poor prognosis of HNC [3, 4]. The five-year relative survival rate of HNC has improved over the past decades from 54.1% (1975–84) to 66.8% (2005–14), based on Surveillance, Epidemiology, and End Results (SEER) data [5]. Despite the increase in survival rates, per capita death rates have risen over the last decade, reflecting that the rise in incidence has outpaced the gains in survival [2]. Squamous cell carcinoma (SCC) is the most common type of HNC, constituting 90% of cases, and attracts considerable research aimed at enhancing diagnostic, prognostic, and therapeutic interventions [6].

Traditional prognostication of HNC outcomes primarily relied on nomograms and linear models that took into account factors such as primary tumor size and extent, lymph node involvement, and the presence of distant metastasis [3]. However, this approach inadequately addressed the inherent heterogeneity among HNC patients, leading to less accurate individual risk assessments. In response, recent models have incorporated a more diverse range of prognostic variables, including patient demographics, histopathological information, treatment details, comorbidities, and molecular markers [7, 8]. Concurrently, machine learning (ML) has emerged as a promising tool, leveraging its capacity for non-parametric modeling to analyze large and intricate datasets more flexibly. This flexibility allows models to account for non-linear relationships and interactions between predictors, offering a more nuanced understanding of the data [9, 10].

Although there has been a shift toward including unstructured data such as medical images in the prognosis of HNC outcomes, utilizing structured data such as tabulated clinicopathological features offers several advantages. Structured data is more amenable to systematic organization and analysis, whereas unstructured data requires extensive preprocessing to convert it into a format suitable for analysis [11, 12]. Recent advancements in deep learning (DL) have reduced the need for extensive preprocessing by substituting it with the requirement for large datasets for model development [13]. However, the scarcity of large, suitable datasets for training presents a challenge to this approach. Structured data typically comprises variables with established clinical relevance, leading to well-specified models with greater interpretability. This characteristic also makes structured data more consistent across different institutions and provides opportunities for external validation. Furthermore, the availability of structured data might facilitate the development of larger-scale prognostic models [14].

This review aims to evaluate the performance of ML models in predicting post-treatment disease progression (i.e., recurrence and metastasis) and survival outcomes in patients with HNC, utilizing structured data. This study distinctly concentrates on models that leverage clinicopathological data, routinely sourced from electronic health records (EHRs), which hold the potential for implementation in actual clinical settings. Additionally, this review will not be confined to the oral cavity but will also encompass other disease sites within HNC and provide an in-depth analysis of each site. The study will also provide an evaluation of current ML models, highlighting their strengths and weaknesses in light of rapid field advancements.

2. Materials and methods

2.1. Protocols

This systematic review was prepared according to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement [15]. The focused question was addressed using the participant, intervention, comparison, outcomes, time, and setting (PICO-TS) criteria [16]. The study protocol was registered on the International Prospective Register of Systematic Reviews (PROSPERO) platform under the registration number CRD42023426148.

2.2. Search strategy

Five electronic databases including Medline, Scopus, Embase, Web of Science, and Google Scholar were used to identify publications that met the inclusion criteria. The search was conducted up to March 15, 2024, using the following combination of MeSH terms and keywords: (artificial intelligence OR AI OR machine learning OR deep learning OR neural network* OR supervised learning OR semisupervised learning OR unsupervised learning OR multilayer perceptron OR MLP) AND (oral OR head and neck OR HNC OR mouth OR oropharyn* OR hypopharyn* OR nasopharyn* OR laryn*) AND (carcinoma OR malignancy OR neoplasm OR cancer*). Besides the electronic databases, reference lists of the selected studies were manually screened.

2.3. Eligibility criteria

The following inclusion criteria were used to screen and assess the retrieved records:

  1. being an original peer-reviewed article in English;
  2. reporting on ML-based prognostication models for post-treatment survival and disease progression. Post-treatment survival outcomes included both disease-specific and overall survival for any follow-up duration, encompassing both disease-free survival and survival with disease. Disease progression outcomes comprised post-treatment loco-regional recurrence and metastasis.
  3. reporting on models developed only using structured and clinicopathological data;
  4. reporting on the performance of the ML and DL models.

Records were excluded if they:

  1. used unstructured data; unstructured data refers to information that does not have a predefined data format or is not organized in a systematic manner. This type of data typically includes formats like text, images, audio, and video, which are not easily searchable or analyzable using conventional data processing techniques.
  2. used structured data other than clinicopathological such as biochemical markers, molecular, and genomic data;
  3. reported on prognostic outcomes other than survival and disease progression;
  4. only reported on validation and not the development of models.

2.4. Focused PICO-TS question

What is the predictive performance of ML models, developed based on clinicopathological data, in informing the survival and disease progression of HNC patients?

  1. Participants (P): HNC patients who received treatments.
  2. Intervention (I): application of ML models in predicting post-treatment survival and disease progression outcomes.
  3. Comparison (C): ML models’ predictive performance compared to actual events and/or traditional linear models. Examples of traditional models include logistic and Cox proportional hazards (Cox PH) regressions.
  4. Outcomes (O): performance metrics reported for binary classification or time-to-event outcomes including sensitivity (recall), specificity, precision, F1-score, accuracy, area under receiver operating characteristics (AUROC), concordance index (C-index), and Brier score.
  5. Timeframe (T): The review will include both retrospective and prospective studies that report outcomes based on data collected from patients over varying follow-up periods to ensure the analysis reflects long-term prognostic performance. We will specify the follow-up durations for each included study in the review results.
  6. Settings (S): The review will include studies conducted in various clinical settings, such as hospitals, cancer research centers, and academic institutions. The data will be primarily sourced from electronic health records (EHRs), which are routinely used in clinical practice.
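To make one of the outcome metrics above concrete, the Brier score can be sketched in a few lines as the mean squared difference between predicted probabilities and observed 0/1 outcomes; the patient labels and probabilities below are invented for illustration:

```python
def brier_score(labels, probs):
    """Mean squared difference between predicted probabilities and 0/1 outcomes.
    Lower is better; 0.25 matches an uninformative constant prediction of 0.5."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

# Invented example: three patients, 1 = event observed within follow-up
print(round(brier_score([1, 0, 1], [0.9, 0.2, 0.6]), 3))  # 0.07
```

Unlike the discrimination metrics (AUROC, C-index), the Brier score also reflects calibration, which is why PROBAST asks for both.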

2.5. Selection of studies

A two-stage screening (title-abstract, full text) was carried out independently by two authors (MM, PAZ). Records were managed using a commercially available software program (Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia), and duplicates were removed within and between the databases. The full texts of potentially eligible articles were retrieved and evaluated using an eligibility form. Any disagreements over the selection of studies were discussed and resolved, and the reasons for excluding articles that did not meet the eligibility criteria were recorded.

2.6. Data extraction

Using a predesigned data extraction form, the following information was extracted from the papers that met the eligibility criteria: title, authors’ names, authors’ affiliations, database, year of publication, population, sample size, case-to-control ratio, outcome measure, tumor site, tumor histology, clinicopathological features, ML models, dimensionality reduction, feature selection, resampling techniques, class imbalance correction, validation (i.e., internal vs. external), performance metrics, traditional model comparator, and authors’ conclusions.

2.7. Risk of bias

The systematic review assessed the quality of the included studies using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) [17]. This tool, tailored for evaluating diagnostic and prognostic models, examines bias risk across four domains: participants, predictors, outcomes, and analysis. The evaluation employed 20 signaling questions, categorizing the risk of bias as low, unclear, or high. Applicability concerns for the initial three domains were similarly classified as low, unclear, or high. An overall low risk of bias was determined if all domains received a low rating. However, for prediction models developed without external validation, even if all domains were rated low, the risk of bias should be considered high unless the model was developed on a very large dataset and included internal validation [18]. Two authors (MM, PAZ) independently conducted the assessments, with cross-checking to ensure accuracy and consistency.

2.8. Data analysis

A qualitative methodology was employed to interpret and summarize the findings from the selected studies. This synthesis encompassed study characteristics, clinicopathological features, ML models, validation methods, and performance metrics. Subgroup analyses of the results were adapted based on the tumor site. Specifically, the following sites were considered separately: oral cavity, oropharynx, nasopharynx, hypopharynx, and larynx. When a specific subsite, such as the tongue, was investigated, it was categorized under a broader category, such as the oral cavity. If a study combined different sites without reporting performance for each site separately, they were categorized as HNC. The synthesis of the results was performed when there were at least two or more studies available for each site. When feasible, and if not reported, metrics such as the F1-score were derived from other metrics such as recall and precision. For binary classification tasks that did not incorporate time modeling, our report primarily focuses on AUROC as a threshold-independent metric and the F1-score as a threshold-dependent metric, provided these were reported. In cases where studies conducted time-to-event analyses, considering both time modeling and censored data, the primary performance metric employed was the C-index. In addition to discrimination metrics, calibration metrics were reported if they were assessed in a study. While all performance metrics were documented, only those derived from test sets (either internal or external) were utilized for synthesis. The discussion includes an analysis of the limitations and trends observed across the studies, providing insights into the current state of ML models used in this domain.
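The derivation of the F1-score from recall and precision mentioned above amounts to taking their harmonic mean; a minimal sketch, with invented values standing in for a study's reported metrics:

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Invented example: a study reporting precision 0.80 and recall 0.70
print(round(f1_from_precision_recall(0.80, 0.70), 3))  # 0.747
```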

3. Results

3.1. Study selection and search results

The systematic search resulted in 5,560 unique records. After screening titles and abstracts, 5,489 records were excluded. Of the remaining 71 records, 39 articles were excluded at the full-text reading stage: 15 articles worked on non-clinicopathological data [19–30], five articles were about preventive outcomes including early diagnosis and malignant transformation [31–35], five articles were on pre-treatment lymph node metastasis [36–40], four articles were about the use of ML in treatment planning and delivery [41–44], three articles used traditional stochastic modeling [45–47], one article was not about HNC [48], one article validated an already available tool and did not develop a model [49], one article was about toxicity [50], and three articles were on non-SCC cancers [51–54]. After manually screening the reference lists, two further studies were added, and the final 34 included studies proceeded to data extraction [55–88]. The summary of the search strategy is depicted in the PRISMA flowchart in Fig 1.

thumbnail
Fig 1. The flowchart of the search process.

From: Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 2021;372:n71. doi: 10.1136/bmj.n71. For more information, visit: http://www.prisma-statement.org/.

https://doi.org/10.1371/journal.pone.0307531.g001

3.2. Assessment of methodological quality

Table 1 outlines a comprehensive assessment of the risk of bias and applicability concerns. Among the 34 studies evaluated, six were assessed as exhibiting a low overall risk of bias, 24 a high risk, and four an unclear risk. In the participant domain, all studies were characterized by a low risk of bias except for one with an unclear risk. In the predictor domain, an unclear risk of bias was identified in three studies, whereas the remaining studies were considered to have a low risk. For the outcome domain, one study was identified as high risk, four had an unclear risk, and the remainder were categorized as low risk. The most significant source of bias emerged in the analysis domain, with 24 studies classified as high risk. The primary contributor to this rating was small sample size, which resulted in a ratio of participants with the outcome to the number of candidate predictors of less than 10. Additional contributing factors included the lack of cross-validation or a test set, failure to account for complexities in the data such as censoring or competing risks, the use of univariate analyses for feature selection, and failure to report performance metrics for both discrimination and calibration. In the analysis domain, four other studies exhibited an unclear risk of bias, while only six were evaluated as having a low risk. Regarding applicability concerns, all studies demonstrated a low risk across all domains (Fig 2).
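The sample-size criterion described above is the events-per-variable (EPV) rule; a small sketch of the check, using invented numbers rather than figures from any included study:

```python
def events_per_variable(n_events, n_candidate_predictors):
    """Events-per-variable (EPV): outcome events per candidate predictor."""
    return n_events / n_candidate_predictors

def flagged_high_risk(n_events, n_candidate_predictors, threshold=10):
    """True when the EPV falls below the threshold of 10 applied in this
    review's PROBAST assessment of the analysis domain."""
    return events_per_variable(n_events, n_candidate_predictors) < threshold

# Invented example: 120 observed deaths, 15 candidate predictors -> EPV = 8
print(flagged_high_risk(120, 15))  # True
```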

thumbnail
Fig 2. The risk of bias and applicability concerns of the included studies based on PROBAST.

Abbreviations. ROB: risk of bias, PROBAST: Prediction model Risk Of Bias ASsessment Tool.

https://doi.org/10.1371/journal.pone.0307531.g002

thumbnail
Table 1. The risk of bias and applicability concerns of the included studies based on PROBAST.

https://doi.org/10.1371/journal.pone.0307531.t001

3.3. Study characteristics

The characteristics of the included studies are summarized in chronological order in Tables 2 and 3 for the survival and disease progression outcomes, respectively. Publication dates ranged from 2015 to 2024, with only three studies published before 2019. Of the 34 studies analyzed, 15 were based on US population data, with 10 of those conducted by non-US institutions using SEER datasets. Sample sizes ranged from 145 to 177,714. Of the 34 studies, 24 addressed survival outcomes and 11 addressed disease progression; one study covered both. For the survival outcome, 11 studies reported the performance of ML models for tumors located in the oral cavity, eight for the larynx, five for the oropharynx, three for the hypopharynx, and two for the nasopharynx; six studies reported the performance of ML models for HNC without differentiating between sites. Regarding disease progression outcomes, nine studies targeted the oral cavity, one study focused on the larynx, and another on HNC as a whole.

thumbnail
Table 2. The characteristic of the included studies in chronological order for the survival outcome.

https://doi.org/10.1371/journal.pone.0307531.t002

thumbnail
Table 3. The characteristic of the included studies in chronological order for the disease progression outcomes.

https://doi.org/10.1371/journal.pone.0307531.t003

According to Table 4, 11 out of 24 studies conducted time-to-event analyses for the survival outcome. In contrast, 14 studies performed binary classification, and one study carried out a regression task. Six of the 24 studies performed calibration in addition to discrimination analyses. Referring to Table 5, for the disease progression outcome, only one study undertook time-to-event analyses, whereas the remaining 10 studies engaged in binary classification. Considering Tables 4 and 5 for both survival and disease progression outcomes, 12 studies implemented dimensionality reduction or feature selection, and eight studies addressed class imbalance through correction techniques. Concerning validation methods for the developed models, four studies utilized external validation, two applied temporal validation, and the remaining studies depended on internal validation.

thumbnail
Table 4. Performance of the included models in chronological order for the survival outcome.

https://doi.org/10.1371/journal.pone.0307531.t004

thumbnail
Table 5. Performance of the included models in chronological order for the disease progression outcomes.

https://doi.org/10.1371/journal.pone.0307531.t005

3.4. Performance of models for survival outcome

3.4.1. Time-to-event models.

For HNC as a whole, three studies reported C-indices ranging from 0.70 to 0.79 for ML models; in the same studies, the C-indices for Cox PH models ranged from 0.66 to 0.76 [66, 68, 75]. For the oral cavity, the C-index for four studies conducting time-to-event analyses ranged from 0.73 to 0.89, with three of these studies also reporting on Cox PH models, showing C-indices between 0.69 and 0.77 [60, 71, 72, 75]. For the larynx, five studies reported C-indices for ML models ranging from 0.71 to 0.85, with Cox PH models in the same studies ranging from 0.57 to 0.74 [72, 75, 78, 80, 86]. For the oropharynx, three studies reported C-indices ranging from 0.77 to 0.80 [72, 73, 75], but there were not enough data to consolidate the C-index for Cox PH models. For the hypopharynx and nasopharynx, only two studies each reported C-indices for ML models, ranging from 0.72 to 0.79 [72, 75] and from 0.72 to 0.83 [76, 77], respectively. The C-index for Cox PH could not be consolidated for the hypopharynx and nasopharynx.
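The C-index compared throughout this section can be illustrated with a minimal implementation of Harrell's concordance index; the survival times, event indicators, and risk scores below are invented toy data, not values from the included studies:

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times  -- observed follow-up times
    events -- 1 if the event (e.g., death) was observed, 0 if censored
    risks  -- model-predicted risk scores (higher = worse prognosis)
    """
    concordant, ties, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i's event occurred strictly before
            # j's observed follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    ties += 1
    return (concordant + 0.5 * ties) / comparable

# Invented toy data: shorter survival paired with higher predicted risk
times, events = [2, 5, 7, 9], [1, 1, 0, 1]
risks = [0.9, 0.6, 0.4, 0.2]
print(harrell_c_index(times, events, risks))  # 1.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why the 0.70–0.89 range reported above indicates useful but imperfect discrimination.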

Among the ML models, the Random Survival Forest (RSF) was employed in nine studies and compared with other ML models in seven of them. RSF outperformed the other models in three of these comparative studies [66, 72, 75]. Of the remaining four studies, two incorporated DeepSurv, which demonstrated superior performance compared to RSF [60, 71]; in the other two, a deep neural network model [80] and a survival SVM [77] outperformed RSF. Furthermore, DeepSurv excelled in all three studies in which it was compared to other models. The detailed performance metrics for the best-performing models can be found in Table 4.

3.4.2. Binary classification models.

For HNC as a whole, four studies reported AUROC values ranging from 0.75 to 0.97, and three of these studies also reported F1-scores ranging from 0.65 to 0.89; based on three of these studies, the AUROC for logistic regression ranged from 0.71 to 0.84, and based on two studies, F1-scores ranged from 0.54 to 0.77 [67, 68, 75, 76]. For the oral cavity, seven studies reported AUROC and F1-score for ML models, ranging from 0.61 to 0.91 and 0.58 to 0.86, respectively, with four of these studies also reporting on logistic regression models, which ranged from 0.52 to 0.69 for AUROC and 0.57 to 0.62 for F1-score [56, 58, 63, 75, 76, 87, 88]. For the larynx, four studies reported AUROC values for ML models ranging from 0.76 to 0.97, and three of these also reported F1-scores ranging from 0.63 to 0.92; of these four, two studies also reported on logistic regression, with AUROC ranging from 0.76 to 0.92 [75, 76, 81, 85]. For the oropharynx, three studies reported AUROC values for ML models ranging from 0.93 to 0.97 and F1-scores ranging from 0.90 to 0.92, but there was not enough information to consolidate results for logistic regression [74–76]. Moreover, the AUROC of ML models ranged from 0.77 to 0.85 in two studies that reported on the hypopharynx [75, 82]. The results for the nasopharynx could not be synthesized since only one study reported the related metrics.
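The AUROC values above can be grounded with the rank-based (Mann-Whitney) computation of the metric; the labels and scores below are illustrative only:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney formulation: the probability that a randomly
    chosen positive case receives a higher score than a random negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 1 = event (e.g., death within the follow-up window)
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.1]
print(auroc(labels, scores))  # 5/6 ~= 0.833
```

Because it depends only on the ranking of scores, AUROC is threshold-independent, in contrast to the F1-score, which requires choosing a classification cutoff.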

Among ML models, tree-based models were used more often and generally demonstrated superior performance in terms of AUROC and F1-scores compared to other algorithms. In nine studies that conducted comparative analyses, six found that tree-based models, including random forest and XGBoost, outperformed others [58, 63, 76, 82, 85, 87]. However, in one study, a voting classifier of random forest, logistic regression, and Gaussian Naïve Bayes was superior [88]. In another study, a Support Vector Machine (SVM) excelled [81], and in one study, neural networks showed superior performance [67]. The detailed performance metrics for the best-performing models can be found in Table 4.

3.5. Performance of models for disease progression outcomes

3.5.1. Time-to-event models.

The only study that performed time-to-event analyses did not report a C-index for the RSF model, but reported a C-index of 0.60 for the Cox PH model [83].

3.5.2. Binary classification models.

Overall, out of 11 studies, nine reported on the performance of ML models for the oral cavity, of which five reported AUROC values ranging from 0.67 to 0.97 [59, 64, 65, 79, 84]. Among these, three studies also evaluated traditional linear models, with the AUROC for the corresponding ML models ranging from 0.67 to 0.88, while for linear models, including logistic regression, it ranged from 0.68 to 0.73 [64, 79, 84]. Additionally, six studies provided F1-scores for ML models, with values ranging from 0.53 to 0.89 [57, 59, 62, 69, 79, 84]. Of these, three studies included F1-scores for linear models as well, with the corresponding ML models’ F1-scores ranging from 0.53 to 0.89 and linear models’ scores from 0.30 to 0.87 [69, 79, 84]. Besides the oral cavity, other sites had either no or only one related study, so their results could not be consolidated.

For the disease progression outcomes, unlike the survival outcomes, there was not a consistent trend indicating superior performance of tree-based models. The specific performance metrics for the top-performing models are detailed in Table 5.

4. Discussion

This systematic review assessed the utilization and performance of ML models in predicting post-treatment survival and disease progression in HNC patients based on structured clinicopathological data. ML models consistently surpassed traditional models, including Cox PH and logistic regressions, as well as nomograms derived from these linear models. Furthermore, the results indicated that, in time-to-event analyses, ML models demonstrated superior performance for specific HNC subsites, such as the oral cavity, oropharynx, and larynx, compared to HNC as a whole. This discrepancy may be attributed to the distinct risk factors, etiology, tumor biology, incidence, treatment strategies, and prognosis associated with each subsite [89].

The enhanced performance of ML models in modeling time and censored data for the survival outcome across sites, including HNC as a whole (C-index range: 0.70 to 0.79), the oral cavity (0.73 to 0.89), and the larynx (0.71 to 0.85), compared to traditional models such as Cox PH for HNC (0.66 to 0.76), the oral cavity (0.69 to 0.77), and the larynx (0.57 to 0.74), can be attributed to several factors. Primarily, ML models are non-parametric: they do not presuppose a specific form for the data and can identify complex non-linear relationships between variables, which is beyond the capability of traditional linear models. Additionally, models such as Cox PH assume that the risks associated with different individuals are proportional over time; that is, at any given time point, a specific individual will always have a higher or lower risk than another, implying that their survival curves never intersect, a premise not always valid in real-world settings. Moreover, multicollinearity is not an issue for most ML models, and they can process numerous features and their interactions, which enables more individualized risk prediction by drawing on a broader range of patient features.
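The crossing-survival-curves scenario described above can be illustrated numerically; the two hazard functions below are hypothetical, chosen only so that the proportional hazards assumption fails:

```python
import math

# Hypothetical hazard functions for two patient groups, chosen so that the
# proportional hazards assumption fails: group B starts safer but worsens.
def survival_curve(hazard, t_max=10.0, steps=1000):
    """Numerically integrate S(t) = exp(-integral of h(u) du) on a time grid."""
    dt = t_max / steps
    s, curve = 1.0, {}
    for k in range(steps + 1):
        t = k * dt
        curve[round(t, 2)] = s
        s *= math.exp(-hazard(t) * dt)
    return curve

sA = survival_curve(lambda t: 0.10)                     # constant hazard
sB = survival_curve(lambda t: 0.02 if t < 5 else 0.40)  # low early, high late

# Early on group B fares better; later it fares worse. The survival curves
# cross, so no constant hazard ratio (as Cox PH assumes) fits both groups.
print(sA[2.0] < sB[2.0], sA[9.0] > sB[9.0])  # True True
```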

A variety of ML models were utilized for time-to-event analyses of the survival outcome. Among these, DeepSurv emerged as the top performer, outperforming the other models in every study in which it was employed [60, 71, 73]. DeepSurv, a deep feed-forward neural network, implements a deep learning generalization of the Cox proportional hazards model and has an advantage over traditional Cox PH in that it does not require a priori specification of covariate effects but learns them adaptively, modeling complex non-linear relationships between a patient’s covariates and the hazard function [90]. In the absence of DeepSurv, RSF showed the best performance in three studies [66, 72, 75]. RSF’s ensemble approach, combining multiple decision trees, enhances predictive accuracy, while its variable importance measures offer valuable insight into the factors influencing outcomes. RSF’s minimal preprocessing requirements are particularly beneficial in the medical field, and it adeptly handles non-linear relationships. Importantly, RSF allows individual hazard functions to intersect, accommodating non-proportional hazards, in contrast to models like DeepSurv that assume proportional hazards across individuals [91].
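DeepSurv's generalization of Cox PH trains a network to minimize the negative Cox partial log-likelihood; a simplified Breslow-style sketch of that loss (ignoring tied event times), with invented inputs in place of network outputs:

```python
import math

def cox_partial_log_likelihood(times, events, risks):
    """Negative Cox partial log-likelihood (Breslow form, no tie handling).
    In DeepSurv, `risks` would be the outputs of a neural network rather
    than a linear combination of covariates."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    total = 0.0
    for idx, i in enumerate(order):
        if events[i] == 1:
            # Risk set: everyone still under observation at times[i]
            log_denom = math.log(sum(math.exp(risks[j]) for j in order[idx:]))
            total += risks[i] - log_denom
    return -total

# With all-zero risk scores the loss reduces to log(3) + log(2) = log(6)
print(round(cox_partial_log_likelihood([1, 2, 3], [1, 1, 0], [0.0, 0.0, 0.0]), 4))  # 1.7918
```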

In the binary classification of survival outcomes, the observed trends paralleled those in time-to-event analyses with ML models outperforming linear models such as logistic regression. A broad spectrum of ML models was utilized for binary classification, encompassing simple neural networks, deep learning, SVMs, Naïve Bayes, K-nearest neighbors, gradient boosting, and various tree-based models. Among these, tree-based models, particularly random forest and XGBoost, were found to outperform other models in six out of nine studies that conducted such comparisons [58, 63, 76, 82, 85, 87]. However, there were instances where other algorithms were superior to tree-based models. For example, in one study, an SVM—distinguished by its capacity to identify the optimal margin separating different classes, particularly in high-dimensional spaces—exhibited superior performance [81]. Additionally, in another study, an ensemble voting model comprising random forest, logistic regression, and Gaussian Naïve Bayes excelled. This model considerably enhanced predictive accuracy by amalgamating the strengths of multiple models to reduce individual errors and variances [74]. This indicates that while tree-based models are favorable, there is a necessity to explore the potential of other ML models further.
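The voting-classifier idea above can be sketched as soft voting, i.e., averaging predicted probabilities across models; the three probability vectors below are hypothetical, not outputs from the cited study:

```python
# Hypothetical predicted probabilities of the outcome from three models
# (e.g., random forest, logistic regression, Gaussian Naive Bayes)
rf, lr, gnb = [0.80, 0.30, 0.55], [0.70, 0.40, 0.45], [0.90, 0.20, 0.65]

def soft_vote(*model_probs):
    """Soft voting: average each case's predicted probability across models."""
    return [round(sum(ps) / len(ps), 3) for ps in zip(*model_probs)]

print(soft_vote(rf, lr, gnb))  # [0.8, 0.3, 0.55]
```

Averaging tends to cancel out the uncorrelated errors of the individual models, which is the variance-reduction effect the ensemble exploits.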

In the examined research, two studies developed predictive models for both overall survival and disease-specific survival [71, 72]. Peng et al. observed that, for the oral cavity, the RSF model performed better for disease-specific than for overall survival based on the C-index (0.84 vs. 0.80). A study by Adeoye et al. identified the same pattern, with a higher C-index for disease-specific than for overall survival in the oral cavity (0.89 vs. 0.77). A possible explanation is that disease-specific survival considers only deaths attributed to the disease under study, providing a more focused measure of a treatment’s effectiveness against the disease, whereas overall survival includes all causes of death, introducing more variability and potential confounding factors that can degrade model performance. Additionally, based on Table 2 and the chosen features, these results might reflect the selection of predictors more tailored to disease-specific than to overall survival.
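The C-index used throughout these comparisons has a simple pairwise definition. Below is a minimal implementation of Harrell's concordance index that counts only pairs comparable under right-censoring; it is our own illustrative sketch, not code from the reviewed studies.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index under right-censoring.

    A pair (i, j) is comparable when patient i has an observed event
    and fails earlier (times[i] < times[j]); the pair is concordant
    when the model assigns i the higher risk. Ties in risk count 0.5.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect risk ordering; the 0.84 vs. 0.80 and 0.89 vs. 0.77 comparisons above are read on this scale.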

The performance of the predictive models exhibited minimal variation across follow-up durations. Of three studies comparing three- and five-year survival, two reported a marginally higher C-index for three-year survival (0.77 vs. 0.76 and 0.82 vs. 0.79) [66, 78], while a third, which used AUROC as its metric, found ML model performance negligibly higher for five-year than for three-year survival (0.63 vs. 0.61) [88]. These discrepancies may relate to the evolving nature of the diseases under study and the use of static baseline variables in the models. While certain variables, such as patient socioeconomic status and health behaviors, may change over time, the slight difference in model performance suggests that baseline variables retain consistent predictive value over extended as well as shorter follow-up periods.

In the analysis of disease progression outcomes among the 11 studies reviewed, none provided data on time-to-event metrics. According to binary classification metrics, however, ML models demonstrated superior performance compared to traditional linear regression models. The oral cavity was the only site with enough articles to synthesize results: the AUROC for ML models ranged from 0.67 to 0.88, versus 0.68 to 0.73 for linear models such as logistic regression in the same studies [64, 79, 84]. In contrast to the findings on survival outcomes, no consistent pattern favored a particular class of model, such as tree-based models, for disease progression. Given the limited number of studies and the scarcity of research on certain outcomes, further investigation is essential before drawing definitive conclusions.
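AUROC, the metric reported by these classification studies, has the same pairwise reading as the C-index: it is the probability that a randomly chosen patient who progresses receives a higher predicted score than one who does not. A minimal sketch:

```python
def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    is scored above a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On this scale 0.5 is chance and 1.0 is perfect discrimination, so the 0.67 to 0.88 range above spans modest to strong separability.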

The analyzed studies encountered several limitations, notably the lack of comprehensive detail on the treatments provided, such as surgical methods and dose-volume histogram information, which can affect both the development and the performance of models. Additionally, while the majority of the research focused on survival outcomes, particularly overall survival, less attention was given to disease-specific outcomes and to aspects of disease progression such as recurrence and metastasis, which significantly influence patient prognosis. Despite the emphasis on adherence to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines [92], only 11 out of 34 studies conducted time-to-event analyses and accounted for censored data. Moreover, model evaluation predominantly focused on discrimination metrics, with only six studies including calibration to assess the alignment between predicted probabilities and actual outcomes, an essential component of model validation.

Given these limitations, future research should focus on establishing more robust and transparent methodologies for ML modeling in HNC prognosis. It is imperative to conduct external validations to establish the generalizability of the models, a step that was notably scarce in the reviewed studies, with only four out of 34 studies undertaking it. Emphasizing the use of multicenter databases is also recommended to mitigate potential regional and demographic biases. Additionally, while ML models have shown promising results in HNC prognosis, clinicians need to be thoroughly educated on the nuances of each model, including its strengths, limitations, and biases. Such knowledge is crucial for clinicians to effectively leverage these models in clinical settings, promoting their broader adoption and integration into clinical practice.
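Calibration, unlike discrimination, asks whether predicted probabilities match observed event rates. The sketch below shows two common checks, the Brier score and a binned reliability table; it is illustrative only, and the six reviewed studies that assessed calibration used their own methods.

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome
    (0 is perfect; a constant 0.5 prediction scores 0.25)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))

def calibration_bins(y_true, y_prob, n_bins=5):
    """Mean predicted probability vs. observed event rate per bin.
    In a well-calibrated model the two columns track each other."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            rows.append((b, float(y_prob[mask].mean()),
                         float(y_true[mask].mean())))
    return rows
```

A model can discriminate well (high AUROC or C-index) yet be poorly calibrated, which is why reporting both is considered essential in model validation.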

5. Conclusion

ML models exhibit considerable potential for predicting post-treatment survival and progression in HNC patients. They consistently outperformed traditional linear models, such as logistic regression and Cox PH, as well as nomograms derived from these models. Among ML models, DeepSurv, followed by tree-based models, demonstrated the highest performance. Regarding survival outcomes, models targeting disease-specific survival achieved higher performance than those targeting overall survival, while there was no meaningful difference between follow-up durations of three and five years. Fewer models addressed disease progression outcomes, with only one conducting a time-to-event analysis. The studies generally lacked detailed incorporation of treatment specifics into their models, which could potentially improve performance. Future research should integrate more comprehensive treatment data, place greater emphasis on disease progression outcomes, and establish model generalizability through external validation and the use of multicenter datasets.

Supporting information

References

1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2021;71:209–49. pmid:33538338
2. Gormley M, Creaney G, Schache A, Ingarfield K, Conway DI. Reviewing the epidemiology of head and neck cancer: definitions, trends and risk factors. Br Dent J 2022;233:780–6. pmid:36369568
3. Lydiatt WM, Patel SG, O’Sullivan B, Brandwein MS, Ridge JA, Migliacci JC, et al. Head and neck cancers—major changes in the American Joint Committee on cancer eighth edition cancer staging manual. CA Cancer J Clin 2017;67:122–37. pmid:28128848
4. Shingaki S, Takada M, Sasai K, Bibi R, Kobayashi T, Nomura T, et al. Impact of lymph node metastasis on the pattern of failure and survival in oral carcinomas. Am J Surg 2003;185:278–84. pmid:12620571
5. Guo K, Xiao W, Chen X, Zhao Z, Lin Y, Chen G. Epidemiological trends of head and neck cancer: a population-based study. Biomed Res Int 2021;2021:1–14.
6. Jung K, Narwal M, Min SY, Keam B, Kang H. Squamous cell carcinoma of head and neck: what internists should know. Korean J Intern Med 2020;35:1031. pmid:32663913
7. Budach V, Tinhofer I. Novel prognostic clinical factors and biomarkers for outcome prediction in head and neck cancer: a systematic review. Lancet Oncol 2019;20:e313–26. pmid:31162105
8. Resteghini C, Trama A, Borgonovi E, Hosni H, Corrao G, Orlandi E, et al. Big data in head and neck cancer. Curr Treat Options Oncol 2018;19:1–15. pmid:30361937
9. Alabi RO, Youssef O, Pirinen M, Elmusrati M, Mäkitie AA, Leivo I, et al. Machine learning in oral squamous cell carcinoma: Current status, clinical concerns and prospects for future—A systematic review. Artif Intell Med 2021;115:102060. pmid:34001326
10. Adeoye J, Tan JY, Choi S-W, Thomson P. Prediction models applying machine learning to oral cavity cancer outcomes: A systematic review. Int J Med Inform 2021;154:104557. pmid:34455119
11. IBM Cloud Education. Structured vs. unstructured data: What’s the difference? IBM 2021. https://www.ibm.com/think/topics/structured-vs-unstructured-data (accessed June 6, 2024).
12. Amazon Web Services. What’s the difference between structured data and unstructured data? Amazon Web Services, Inc 2024. https://aws.amazon.com/compare/the-difference-between-structured-data-and-unstructured-data/ (accessed June 13, 2024).
13. Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020;20:1–11.
14. Azad P, Navimipour NJ, Rahmani AM, Sharifi A. The role of structured and unstructured data managing mechanisms in the Internet of things. Cluster Comput 2020;23:1185–98.
15. Moher D, Liberati A, Tetzlaff J, Altman DG, Group P. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. International Journal of Surgery 2010;8:336–41. pmid:20171303
16. Miller SA, Forrest JL. Enhancing your practice through evidence-based decision making: PICO, learning how to ask good questions. Journal of Evidence Based Dental Practice 2001;1:136–41.
17. Wolff RF, Moons KGM, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019;170:51–8. pmid:30596875
18. Moons KGM, Wolff RF, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med 2019;170:W1–33. pmid:30596876
19. Sharma N, Om H. Using MLP and SVM for predicting survival rate of oral cancer patients. Network Modeling Analysis in Health Informatics and Bioinformatics 2014;3:1–10.
20. Tan MS, Tan JW, Chang S-W, Yap HJ, A Kareem S, Zain RB. A genetic programming approach to oral cancer prognosis. PeerJ 2016;4:e2482. pmid:27688975
21. Tseng Y-J, Wang H-Y, Lin T-W, Lu J-J, Hsieh C-H, Liao C-T. Development of a machine learning model for survival risk stratification of patients with advanced oral cancer. JAMA Netw Open 2020;3:e2011768. pmid:32821921
22. Wang X, Yang J, Wei C, Zhou G, Wu L, Gao Q, et al. A personalized computational model predicts cancer risk level of oral potentially malignant disorders and its web application for promotion of non‐invasive screening. Journal of Oral Pathology & Medicine 2020;49:417–26. pmid:31823403
23. Wu X, Yao Y, Dai Y, Diao P, Zhang Y, Zhang P, et al. Identification of diagnostic and prognostic signatures derived from preoperative blood parameters for oral squamous cell carcinoma. Ann Transl Med 2021;9. pmid:34532357
24. Esce AR, Baca AL, Redemann JP, Rebbe RW, Schultz F, Agarwal S, et al. Predicting nodal metastases in squamous cell carcinoma of the oral tongue using artificial intelligence. Am J Otolaryngol 2024;45:104102. pmid:37948827
25. Campisi G, Calvino F, Carinci F, Matranga D, Carella M, Mazzotta M, et al. Peri-tumoral inflammatory cell infiltration in OSCC: A reliable marker of local recurrence and prognosis? An investigation using artificial neural networks. Int J Immunopathol Pharmacol 2011;24:113–20. pmid:21781456
26. Li L, Pu C, Jin N, Zhu L, Hu Y, Cascone P, et al. Prediction of 5-year overall survival of tongue cancer based machine learning. BMC Oral Health 2023;23:567. pmid:37574562
27. Siddalingappa R, Kanagaraj S. K-nearest-neighbor algorithm to predict the survival time and classification of various stages of oral cancer: a machine learning approach. F1000Res 2022;11. pmid:38046542
28. Mermod M, Jourdan E, Gupta R, Bongiovanni M, Tolstonog G, Simon C, et al. Development and validation of a multivariable prediction model for the identification of occult lymph node metastasis in oral squamous cell carcinoma. Head Neck 2020;42:1811–20. pmid:32057148
29. González‐García I, Pierre V, Dubois VFS, Morsli N, Spencer S, Baverel PG, et al. Early predictions of response and survival from a tumor dynamics model in patients with recurrent, metastatic head and neck squamous cell carcinoma treated with immunotherapy. CPT Pharmacometrics Syst Pharmacol 2021;10:230–40. pmid:33465293
30. Wang D, Guo R, Luo N, Ren X, Asarkar AA, Jia H, et al. Development and validation of a model to predict the risk of recurrence in patients with laryngeal squamous cell carcinoma after total laryngectomy. Ann Transl Med 2022;10.
31. Tewari P, Kashdan E, Walsh C, Martin CM, Parnell AC, O’Leary JJ. Estimating the conditional probability of developing human papilloma virus related oropharyngeal cancer by combining machine learning and inverse Bayesian modelling. PLoS Comput Biol 2021;17:e1009289. pmid:34415913
32. Sharma N, Om H. Hybrid framework using data mining techniques for early detection and prevention of oral cancer. International Journal of Advanced Intelligence Paradigms 2017;9:604–22.
33. Liu Y, Li Y, Fu Y, Liu T, Liu X, Zhang X, et al. Quantitative prediction of oral cancer risk in patients with oral leukoplakia. Oncotarget 2017;8:46057. pmid:28545021
34. Alhazmi A, Alhazmi Y, Makrami A, Masmali A, Salawi N, Masmali K, et al. Application of artificial intelligence and machine learning for prediction of oral cancer risk. Journal of Oral Pathology & Medicine 2021;50:444–50. pmid:33394536
35. Adeoye J, Koohi-Moghadam M, Lo AWI, Tsang RK-Y, Chow VLY, Zheng L-W, et al. Deep learning predicts the malignant-transformation-free survival of oral potentially malignant disorders. Cancers (Basel) 2021;13:6054. pmid:34885164
36. Bur AM, Holcomb A, Goodwin S, Woodroof J, Karadaghy O, Shnayder Y, et al. Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma. Oral Oncol 2019;92:20–5. pmid:31010618
37. Kwak MS, Eun Y, Lee J, Lee YC. Development of a machine learning model for the prediction of nodal metastasis in early T classification oral squamous cell carcinoma: SEER‐based population study. Head Neck 2021;43:2316–24. pmid:33792112
38. Farrokhian N, Holcomb AJ, Dimon E, Karadaghy O, Ward C, Whiteford E, et al. Development and validation of machine learning models for predicting occult nodal metastasis in early-stage oral cavity squamous cell carcinoma. JAMA Netw Open 2022;5:e227226. pmid:35416990
39. Feng M, Zhang J, Zhou X, Mo H, Jia L, Zhang C, et al. Application of an interpretable machine learning model to Predict Lymph Node Metastasis in patients with laryngeal carcinoma. J Oncol 2022;2022. pmid:36411795
40. Hatten K, Amin J, and I A-O, 2020 undefined. Machine learning prediction of extracapsular extension in human papillomavirus–associated oropharyngeal squamous cell carcinoma. Otolaryngology–Head and Neck Surgery 2020;2020:992–9. pmid:32600154
41. Dohopolski M, Wang K, Morgan H, Sher D, Wang J. Use of deep learning to predict the need for aggressive nutritional supplementation during head and neck radiotherapy. Radiotherapy and Oncology 2022;171:129–38. pmid:35461951
42. Howard FM, Kochanny S, Koshy M, Spiotto M, Pearson AT. Machine learning–guided adjuvant treatment of head and neck cancer. JAMA Netw Open 2020;3:e2025881. pmid:33211108
43. Mascarella MA, Muthukrishnan N, Maleki F, Kergoat M-J, Richardson K, Mlynarek A, et al. Above and beyond age: prediction of major postoperative adverse events in head and neck surgery. Annals of Otology, Rhinology & Laryngology 2022;131:697–703. pmid:34416844
44. Reeves S, Tarmohamed O, Babbra A, Tighe D. Validation of a post operative complication risk prediction algorithm in a non-head and neck squamous cell carcinoma cohort. British Journal of Oral and Maxillofacial Surgery 2022;60:904–9. pmid:35346521
45. Dean JA, Welsh LC, Wong KH, Aleksic A, Dunne E, Islam MR, et al. Normal tissue complication probability (NTCP) modelling of severe acute mucositis using a novel oral mucosal surface organ at risk. Clin Oncol 2017;29:263–73. pmid:28057404
46. Shen Y, Qi Y, Wang C, Wu C, Zhan X. Predicting specific mortality from laryngeal cancer based on competing risk model: a retrospective analysis based on the SEER database. Ann Transl Med 2023;11.
47. Dom RM, Abidin B, Kareem SA, Ismail SM, Daud NM. Determining the critical success factors of oral cancer susceptibility prediction in Malaysia using fuzzy models. Sains Malays 2012;41:633–40.
48. Feng X, Hong T, Liu W, Xu C, Li W, Yang B, et al. Development and validation of a machine learning model to predict the risk of lymph node metastasis in renal carcinoma. Front Endocrinol (Lausanne) 2022;13:1054358. pmid:36465636
49. Alabi RO, Sjöblom A, Carpén T, Elmusrati M, Leivo I, Almangush A, et al. Application of artificial intelligence for overall survival risk stratification in oropharyngeal carcinoma: A validation of ProgTOOL. Int J Med Inform 2023;175:105064. pmid:37094545
50. Satheeshkumar PS, El-Dallal M, Mohan MP. Feature selection and predicting chemotherapy-induced ulcerative mucositis using machine learning methods. Int J Med Inform 2021;154:104563. pmid:34479094
51. Huang Z, Chen Z, Li Y, Lin T, Cai S, Wu W, et al. Machine learning-based survival prediction nomogram for postoperative parotid mucoepidermoid carcinoma. Sci Rep 2024;14:7686. pmid:38561379
52. Chen Y, Li G, Jiang W, Nie RC, Deng H, Chen Y, et al. Prognostic risk factor of major salivary gland carcinomas and survival prediction model based on random survival forests. Cancer Med 2023;12:10899–907. pmid:36934429
53. Zhang X, Liu G, Peng X. A random forest model for post-treatment survival prediction in patients with non-squamous cell carcinoma of the head and neck. J Clin Med 2023;12:5015. pmid:37568416
54. Oei RW, Lyu Y, Ye L, Kong F, Du C, Zhai R, et al. Progression-free survival prediction in patients with nasopharyngeal carcinoma after intensity-modulated radiotherapy: machine learning vs. traditional statistics. J Pers Med 2021;11:787. pmid:34442430
55. Tseng WT, Chiang WF, Liu SY, Roan J, Lin CN. The application of data mining techniques to oral cancer prognosis. J Med Syst 2015;39:59. pmid:25796587
56. Sharma N, Om H. Usage of probabilistic and general regression neural network for early detection and prevention of oral cancer. The Scientific World Journal 2015;2015:234191. pmid:26171415
57. Cheng C-S, Shueng P-W, Chang C-C, Kuo C-W. Adapting an evidence-based diagnostic model for predicting recurrence risk factors of oral cancer. Journal of Universal Computer Science 2018;24:742–52.
58. Karadaghy O, Shew M, New J, Bur A. Development and assessment of a machine learning model to help predict survival among patients with oral squamous cell carcinoma. JAMA Otolaryngol Head Neck Surg 2019;145:1115–20. pmid:31045212
59. Alabi RO, Elmusrati M, Sawazaki-Calone I, Kowalski LP, Haglund C, Coletta RD, et al. Machine learning application for prediction of locoregional recurrences in early oral tongue cancer: a Web-based prognostic tool. Virchows Arch 2019;475:489–97. pmid:31422502
60. Kim DW, Lee S, Kwon S, Nam W, Cha IH, Kim HJ. Deep learning-based survival prediction of oral cancer patients. Sci Rep 2019;9:6994. pmid:31061433
61. Hung M, Park J, Hon ES, Bounsanga J, Moazzami S, Ruiz-Negrón B, et al. Artificial intelligence in dentistry: Harnessing big data to predict oral cancer survival. World J Clin Oncol 2020;11:918–34. pmid:33312886
62. Alabi RO, Elmusrati M, Sawazaki‐Calone I, Kowalski LP, Haglund C, Coletta RD, et al. Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer. Int J Med Inform 2020;136:104068. pmid:31923822
63. Alkhadar H, Macluskey M, White S, Ellis I, Gardner A. Comparison of machine learning algorithms for the prediction of five-year survival in oral squamous cell carcinoma. J Oral Pathol Med 2021;50:378–84. pmid:33220109
64. Chu CS, Lee NP, Adeoye J, Thomson P, Choi SW. Machine learning and treatment outcome prediction for oral cancer. J Oral Pathol Med 2020;49:977–85. pmid:32740951
65. Shan J, Jiang R, Chen X, Zhong Y, Zhang W, Xie L, et al. Machine learning predicts lymph node metastasis in early-stage oral tongue squamous cell carcinoma. J Oral Maxillofac Surg 2020;78:2208–18. pmid:32649894
66. Du M, Haag DG, Lynch JW, Mittinty MN. Comparison of the tree-based machine learning algorithms to Cox regression in predicting the survival of oral and pharyngeal cancers: analyses based on SEER database. Cancers (Basel) 2020;12:1–16. pmid:33003533
67. Nogay H. Prediction of post-treatment survival expectancy in head & neck cancers by machine learning methods. The Journal of Cognitive Systems 2020;5:5–9.
68. Yu H, Ma SJ, Farrugia M, Iovoli AJ, Wooten KE, Gupta V, et al. Machine learning incorporating host factors for predicting survival in head and neck squamous cell carcinoma patients. Cancers (Basel) 2021;13:4559. pmid:34572786
69. Bourdillon AT, Shah HP, Cohen O, Hajek MA, Mehra S. Novel machine learning model to predict interval of oral cancer recurrence for surveillance stratification. Laryngoscope 2022;133:1652–7. pmid:36054545
70. Gangil T, Shahabuddin AB, Dinesh Rao B, Palanisamy K, Chakrabarti B, Sharan K. Predicting clinical outcomes of radiotherapy for head and neck squamous cell carcinoma patients using machine learning algorithms. J Big Data 2022;9:25.
71. Adeoye J, Hui L, Koohi-Moghadam M, Tan JY, Choi SW, Thomson P. Comparison of time-to-event machine learning models in predicting oral cavity cancer prognosis. Int J Med Inform 2022;157:104635. pmid:34800847
72. Peng J, Lu Y, Chen L, Qiu K, Chen F, Liu J, et al. The prognostic value of machine learning techniques versus cox regression model for head and neck cancer. Methods 2022;205:123–32. pmid:35798257
73. Kim S Il, Kang JW, Eun YG, Lee YC. Prediction of survival in oropharyngeal squamous cell carcinoma using machine learning algorithms: A study based on the surveillance, epidemiology, and end results database. Front Oncol 2022;12. pmid:36072804
74. Alabi R, Almangush A, Elmusrati M, Leivo I, Mäkitie AA. An interpretable machine learning prognostic system for risk stratification in oropharyngeal cancer. Int J Med Inform 2022;168. pmid:36279655
75. Kotevski DP, Smee RI, Vajdic CM, Field M. Machine learning and nomogram prognostic modeling for 2-year head and neck cancer-specific survival using electronic health record data: a multisite study. JCO Clin Cancer Inform 2023;7. pmid:36596211
76. Kotevski DP, Smee RI, Vajdic CM, Field M. Empirical comparison of routinely collected electronic health record data for head and neck cancer-specific survival in machine-learnt prognostic models. Head Neck 2023;45:365–79. pmid:36369773
77. Xiao Z, Song Q, Wei Y, Fu Y, Huang D, Huang C. Use of survival support vector machine combined with random survival forest to predict the survival of nasopharyngeal carcinoma patients. Transl Cancer Res 2023;12:3581. pmid:38192980
78. Sun H, Wu S, Li S, Jiang X. Which model is better in predicting the survival of laryngeal squamous cell carcinoma?: Comparison of the random survival forest based on machine learning algorithms to Cox regression: analyses based on SEER database. Medicine 2023;102:e33144. pmid:36897699
79. Cai Y, Xie Y, Zhang S, Wang Y, Wang Y, Chen J, et al. Prediction of postoperative recurrence of oral cancer by artificial intelligence model: Multilayer perceptron. Head Neck 2023;45:3053–66. pmid:37789719
80. Choi N, Kim J, Yi H, Kim H, Kim TH, Chung MJ, et al. The use of artificial intelligence models to predict survival in patients with laryngeal squamous cell carcinoma. Sci Rep 2023;13:9734. pmid:37322055
81. Li Z, Li T, Zhang P, Wang X. A practical online prediction platform to predict the survival status of laryngeal squamous cell carcinoma after 5 years. Am J Otolaryngol 2024;45:104209.
82. Li Z, Ding S, Zhong Q, Fang J, Huang J, Huang Z, et al. A machine learning model for predicting the three-year survival status of patients with hypopharyngeal squamous cell carcinoma using multiple parameters. J Laryngol Otol 2023;137:1041–7. pmid:36682376
83. Zhang Y-F, Shen Y-J, Huang Q, Wu C-P, Zhou L, Ren H-L. Predicting survival of advanced laryngeal squamous cell carcinoma: comparison of machine learning models and Cox regression models. Sci Rep 2023;13:18498. pmid:37898687
84. Fatapour Y, Abiri A, Kuan EC, Brody JP. Development of a machine learning model to predict recurrence of oral tongue squamous cell carcinoma. Cancers (Basel) 2023;15:2769. pmid:37345106
85. Alabi RO, Almangush A, Elmusrati M, Leivo I, Mäkitie AA. Interpretable machine learning model for prediction of overall survival in laryngeal cancer. Acta Otolaryngol 2024:1–7. pmid:38279817
86. Liao F, Wang W, Wang J. A deep learning-based model predicts survival for patients with laryngeal squamous cell carcinoma: a large population-based study. Eur Arch Otorhinolaryngol 2023;280:789–95. pmid:36030468
87. Alabi RO, Elmusrati M, Leivo I, Almangush A, Mäkitie AA. Advanced-stage tongue squamous cell carcinoma: a machine learning model for risk stratification and treatment planning. Acta Otolaryngol 2023;143. pmid:36794334
88. Tan JiY, Adeoye J, Thomson P, Sharma D, Ramamurthy P, Choi S-W. Predicting Overall Survival Using Machine Learning Algorithms in Oral Cavity Squamous Cell Carcinoma. Anticancer Res 2022;42:5859–66. pmid:36456152
89. Chow LQM. Head and neck cancer. New England Journal of Medicine 2020;382:60–72. pmid:31893516
90. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 2018;18:1–12.
91. Ishwaran H, Kogalur UB, Blackstone EH, Lauer MS. Random survival forests. Ann Appl Stat 2008;2:841–68.
92. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med 2015;162:55–63. pmid:25560714