Abstract
Background
Triage in emergency departments (ED) is a critical process for prioritizing care and ensuring clinical safety. However, current triage systems often exhibit vulnerabilities that compromise the efficiency and quality of healthcare delivery. Artificial Intelligence (AI) has emerged as a promising innovation to support decision-making and optimize patient flow in these high-pressure environments.
Objective
To map the available evidence regarding the implementation and performance of artificial intelligence in emergency department triage.
Method
This scoping review followed the Joanna Briggs Institute (JBI) methodology and the PRISMA-ScR guidelines. A comprehensive search was conducted across 13 databases (CINAHL, Cochrane Library, PubMed Central, SciELO, Web of Science, SCOPUS, Science Direct, VHL, Embase, and several regional dissertation repositories), with no language or time restrictions. Two independent reviewers performed the selection process using the Rayyan platform, with discrepancies resolved by a third evaluator. Data were synthesized using the PAGER framework, categorizing findings into Patterns, Advances, Gaps, Evidence for practice, and Recommendations for research.
Results
Nineteen studies met the inclusion criteria. AI was primarily implemented through Machine Learning (ML) algorithms, including Deep Learning architectures. Natural Language Processing (NLP) was frequently employed to process unstructured clinical data, with recent studies exploring the potential of Large Language Models (LLMs). Overall, ML-based models consistently outperformed traditional triage systems in predictive accuracy. These techniques were mainly utilized for automated classification, predicting clinical severity, and enhancing patient prioritization by integrating both objective and subjective assessment data.
Conclusions
The findings indicate that AI has significant potential to enhance emergency triage by streamlining service flows and providing robust clinical decision support. However, the current evidence remains heterogeneous and largely exploratory. Key challenges include variability in model performance, a lack of external validation, and studies often limited to specific populations. Consequently, many current tools still lack the necessary reliability for safe, large-scale clinical implementation.
Citation: Souza LL, de Oliveira YCN, Campos da Costa LC, Silva Filho JAAd, de Souza ATF, Mourão VG, et al. (2026) Artificial Intelligence in emergency department triage: A scoping review. PLoS One 21(6): e0352338. https://doi.org/10.1371/journal.pone.0352338
Editor: André Luis C. Ramalho, University of Porto Faculty of Medicine: Universidade do Porto Faculdade de Medicina, PORTUGAL
Received: January 28, 2026; Accepted: June 9, 2026; Published: June 25, 2026
Copyright: © 2026 Souza et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This work was supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), Brazil [Protocol number: 88887.134054/2025-00]. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Hospital Emergency Services serve as critical entry points into healthcare systems. These departments frequently face high demand leading to overcrowding, which can significantly compromise clinical outcomes [1]. Globally, “triage” refers to the systematic prioritization of patients based on clinical severity to organize patient flow and allocate resources efficiently. In many healthcare settings, this process is also termed “risk classification”; both terms describe the same structured clinical assessment used to determine urgency. In this study, “triage” and “risk classification” are used interchangeably. This process remains a cornerstone of emergency care, essential for prioritizing critical patients, enhancing safety, and optimizing service delivery [2,3].
However, the effectiveness of triage is often hindered by human factors. Variability in professional experience, clinical competencies, and individual characteristics can lead to process failures, such as incorrect patient categorization, inaccurate documentation, and diagnostic errors—all of which directly impact patient safety [4]. Among established protocols, the Manchester Triage System (MTS) is widely adopted, categorizing patients into urgency levels ranging from emergent to non-urgent [2,5]. Nevertheless, the isolated implementation of the MTS does not guarantee optimal care; it requires continuous monitoring and managerial refinement [2]. Consequently, nurses must continuously develop advanced clinical assessment skills, rapid decision-making abilities, and active listening to ensure swift risk identification and appropriate care pathways [1,6].
To address challenges in accuracy and agility, various technologies have been integrated into emergency services to streamline initial care. Among these, Artificial Intelligence (AI) has emerged as a transformative tool, capable of processing vast clinical datasets, recognizing complex patterns, and providing real-time decision support by simulating human perception and analytical reasoning [7].
Machine Learning (ML), a specialized subfield of AI, offers significant potential for the early detection of clinical conditions and outcomes within the emergency department. Furthermore, ML facilitates safer and more efficient referrals through the sophisticated analysis of electronic health records (EHR) [8,9]. Research indicates that triage precision increases when nurses utilize technological clinical decision support systems (CDSS), fostering greater safety and reducing adverse events [10,11]. Thus, adopting ML as a complementary resource to traditional risk assessment enhances service quality by providing evidence-based, rapid insights [7,8].
The development of ML models typically follows a structured pipeline: data acquisition, pre-processing, model training (where algorithms learn patterns from historical data), and validation (testing the model on unseen data to assess predictive performance) [8]. Within clinical contexts, two concepts are paramount: algorithmic bias, which occurs when training data reflects historical inequalities or poor data quality, leading to unfair or erroneous predictions [8]; and interpretability, the ability of healthcare professionals to understand the logic behind a model’s output [7,8]. Ultimately, the successful integration of these tools into clinical practice depends on their capacity to function as effective decision support systems that improve accuracy without compromising patient safety [10,11].
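The four pipeline stages described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data are synthetic, the two features (heart rate and oxygen saturation) and all function names are hypothetical, and logistic regression is chosen for brevity rather than because any included study used this exact setup.

```python
import math
import random

def make_synthetic_visits(n, seed=0):
    """Stage 1 (acquisition): hypothetical visits as ([heart_rate, spo2], label)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        critical = rng.random() < 0.3  # assumed 30% high-acuity share
        hr = rng.gauss(120 if critical else 80, 10)
        spo2 = rng.gauss(90 if critical else 97, 2)
        rows.append(([hr, spo2], int(critical)))
    return rows

def fit_scaler(X):
    """Stage 2 (pre-processing): per-feature mean and standard deviation."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=200, lr=0.1):
    """Stage 3 (training): batch gradient descent on the logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - target
            for i, xi in enumerate(row):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    return [int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5)
            for row in X]

data = make_synthetic_visits(400)
train, test = data[:300], data[300:]      # hold out unseen visits
X_train, y_train = [x for x, _ in train], [t for _, t in train]
X_test, y_test = [x for x, _ in test], [t for _, t in test]

means, stds = fit_scaler(X_train)         # fitted on training data only
w, b = train_logistic(scale(X_train, means, stds), y_train)

# Stage 4 (validation): score the model on data it never saw in training.
y_pred = predict(scale(X_test, means, stds), w, b)
test_accuracy = sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Fitting the scaler on the training split alone keeps information from the held-out visits out of the model, which is the point of the validation stage.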
Given this context, it is essential to analyze how AI has been integrated into triage and risk classification processes. Synthesizing the available evidence and identifying knowledge gaps is crucial to guiding future research and enhancing clinical practice. Therefore, this study aims to map the scientific evidence regarding the application of AI in triage and risk classification within Emergency Services.
2. Method
2.1. Study design
This scoping review aims to map the scientific evidence regarding the application of AI in triage and risk classification within emergency services. The study was conducted following the methodological framework proposed by the Joanna Briggs Institute (JBI) [12] and adheres to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) checklist [13] (see S1 File). The alignment with these frameworks ensures transparency, reproducibility, and methodological rigor across all research stages [14].
The PCC strategy (Population, Concept, and Context) was employed to formulate the research question: Population (P): Patients undergoing risk classification/triage; Concept (C): Artificial Intelligence; and Context (C): Hospital Emergency Services. Thus, the research question was: How has AI been utilized in risk classification and triage within emergency services?
To ensure originality and prevent duplication, a preliminary search for registered reviews was conducted in the International Prospective Register of Systematic Reviews (PROSPERO), Open Science Framework (OSF), The Cochrane Library, and the Database of Abstracts of Reviews of Effects (DARE). No reviews with an identical thematic focus were identified. Consequently, this study protocol was registered on the Open Science Framework (osf.io/z2hu7; https://doi.org/10.17605/OSF.IO/Z2HU7).
2.2. Selection process and search strategy
The literature search was performed across 13 data sources. Nine were bibliographic databases: CINAHL, Cochrane Library, PubMed Central, SciELO, Web of Science, SCOPUS, Science Direct, the Virtual Health Library (VHL), and Embase. Four were grey literature sources: the CAPES Thesis and Dissertation Catalog, the Brazilian Digital Library of Theses and Dissertations (BDTD), the Scientific Open Access Repository of Portugal (RCAAP), and Theses Canada. Data collection was finalized in September 2025. Access to these sources was facilitated via the CAPES Journal Portal through the Federated Academic Community (CAFe) platform.
Search terms were identified using Health Sciences Descriptors (DeCS) and Medical Subject Headings (MeSH), with adaptations for English and Portuguese. The primary descriptors included: “Risk Classification”, “Risk Assessment”, “Artificial Intelligence”, and “Emergency Service, Hospital”.
The selection of these descriptors was a strategic decision to enhance search specificity. During the pre-analytical phase, sensitivity tests using broader terms—such as “Triage” or “Emergency Department”—yielded an excessive volume of non-relevant results (low precision), primarily related to manual protocols. To maintain methodological rigor, more targeted descriptors (“Risk Classification” and “Risk Assessment”) were combined with “Artificial Intelligence”. This strategy ensured the identification of studies strictly aligned with AI-driven structured risk classification.
Boolean operators “AND” and “OR” were used to combine descriptors, adapted to the syntax requirements of each database without temporal or language restrictions. The core search string was: (“Risk Classification” OR “Risk Assessment”) AND “Artificial Intelligence” AND “Emergency Service, Hospital”. Detailed search syntaxes for each database are provided in Table 1.
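As an illustration of how the core string is assembled, the following minimal sketch joins synonyms with OR and concept groups with AND. The function name `build_query` and the list layout are assumptions made for this example; the database-specific syntaxes remain those reported in Table 1.

```python
def build_query(groups):
    """Join terms within a group with OR, then join the groups with AND."""
    parts = []
    for terms in groups:
        joined = " OR ".join(f'"{term}"' for term in terms)
        # Parenthesize only multi-term groups, mirroring the core string.
        parts.append(f"({joined})" if len(terms) > 1 else joined)
    return " AND ".join(parts)

# Descriptor groups taken from the core search string in the text.
CORE_GROUPS = [
    ["Risk Classification", "Risk Assessment"],
    ["Artificial Intelligence"],
    ["Emergency Service, Hospital"],
]

core_query = build_query(CORE_GROUPS)
print(core_query)
```

Keeping the descriptors in a list of groups makes it straightforward to swap in translated terms or add a synonym before adapting the string to a given database.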
2.3. Eligibility criteria
Studies were included if they addressed the application of AI in triage or risk classification within emergency services. The inclusion criteria encompassed original research articles, dissertations, theses, ministerial ordinances, and clinical guidelines. To be eligible, studies had to be available in open access or in full text through the CAFe platform. Conversely, studies were excluded if they did not directly answer the research question, or if they were abstracts, reviews, experience reports, letters to the editor, or book chapters.
2.4. Selection and data extraction
The literature search, screening, and selection processes were conducted independently and concurrently by two reviewers using separate electronic devices, following a structured four-stage workflow. Initially, duplicate studies were identified and removed using the Rayyan platform, followed by a preliminary screening of titles and abstracts against the established eligibility criteria. Any discrepancies or conflicts during these phases were resolved through consultation with a third reviewer to finalize the selection. In the final stage, the remaining studies underwent a comprehensive full-text review to determine their definitive inclusion in the final sample.
For data extraction, a standardized Microsoft Word table was used to record the following variables: author, year of publication, country of origin, objectives, methodology, sample size/characteristics, keywords, and primary findings. Following the primary selection, a reverse search (reference chaining) was performed. The bibliographies of all included studies were manually screened to identify additional relevant research that met the inclusion criteria.
2.5. Synthesis of results and framework
The results are presented through flowcharts and tables, highlighting variables directly aligned with the research objective. To ensure a structured and comprehensive analysis, data synthesis was guided by the PAGER framework (Patterns, Advances, Gaps, Evidence for practice, and Research recommendations) [15].
3. Results
3.1. Study selection
The initial search yielded 1,275 records across all data sources. After removing duplicates, 1,187 titles and abstracts were screened, of which 1,151 did not meet the eligibility criteria. Consequently, 36 full-text articles were assessed for eligibility. Of these, 32 were excluded for not directly addressing the research question. To ensure a comprehensive mapping, a reverse search (reference chaining) was performed on the remaining studies, identifying 15 additional relevant articles. The final sample comprised 19 studies, as detailed in the PRISMA-ScR flowchart (Fig 1).
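The selection counts above can be checked with simple arithmetic. The sketch below uses only the figures reported in this paragraph; the number of duplicates removed is derived here rather than stated in the text, and the variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
records_identified = 1275
records_screened = 1187            # titles and abstracts after duplicate removal
excluded_at_screening = 1151
excluded_at_full_text = 32
added_by_reference_chaining = 15

# Derived quantities (the duplicate count is inferred, not reported).
duplicates_removed = records_identified - records_screened
full_text_assessed = records_screened - excluded_at_screening
retained_after_full_text = full_text_assessed - excluded_at_full_text
final_sample = retained_after_full_text + added_by_reference_chaining

print(duplicates_removed, full_text_assessed, retained_after_full_text, final_sample)
```

The derived values reproduce the 36 full-text articles and the final sample of 19 studies, of which 4 came from the database search and 15 from reference chaining.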
¹ Flowchart adapted from PRISMA-ScR.
3.2. Characteristics of the included studies
The included studies were published between 2013 and 2025, with a notable increase in recent years. The highest concentration occurred in 2021 (n = 4; 21%) and 2023 (n = 3; 16%), followed by a steady output in 2019, 2022, 2024, and 2025 (n = 2 each; 11%). Earlier contributions were sporadic, with single publications (5% each) in 2013, 2016, 2018, and 2020.
Geographically, research was predominantly conducted in Asia (n = 9; 47%) and Europe (n = 7; 37%), followed by North America (n = 2; 11%) and Latin America (n = 1; 5%). Regarding study design, observational frameworks prevailed: cross-sectional (n = 7; 36.8%) and retrospective (n = 7; 36.8%) designs were most frequent. The remaining sample included systematic reviews (n = 3; 15.8%), predictive modeling studies (n = 1; 5.3%), and narrative reviews (n = 1; 5.3%).
The professional profile of the authors was multidisciplinary, although predominantly medical (65.2%), with a focus on emergency medicine. Nursing professionals accounted for 6.5% of the authorship. Significant contributions also came from engineering fields, including data science (6.5%), computer science (4.3%), and electrical, electronic, and industrial engineering (combined 5.4%). Other collaborators included specialists in public health (8.1%), epidemiology, statistics, biomedicine, and informatics (1% each).
3.3. AI approaches and architectures
All identified AI applications fell within the broad category of machine learning (ML), including its subfield, deep learning (DL). In this paradigm, models are trained to process information, make decisions, and solve problems based on patterns learned from data. Traditional ML algorithms, such as Logistic Regression and Gradient Boosting, were frequently used. In addition, Federated Learning stood out, characterized by the training of models using decentralized data. Several studies also implemented more complex DL architectures, particularly Artificial Neural Networks (ANNs) and Deep Neural Networks (DNNs), which were sometimes integrated with neuro-fuzzy systems.
In terms of application domains, Natural Language Processing (NLP) was a recurring theme for processing unstructured textual and speech data. Within this domain, recent studies (n = 2; 11%) evaluated the performance of Large Language Models (LLMs), specifically ChatGPT. Only one study (5%) described an AI-based clinical decision support system without explicitly specifying the underlying algorithm. Detailed characteristics of the included studies are presented in Table 2.
The keywords identified across the studies displayed significant diversity. The term “Triage” was present in all 19 (100%) articles, often associated with “Risk Classification”. “Machine Learning” appeared in 12 (63%) studies, followed by “Emergency Department” (and its variants) in 11 (58%), and “Artificial Intelligence” in 6 (32%). These core concepts strongly align with the descriptors selected for this scoping review.
Regarding the samples, the inclusion methodologies varied significantly. The systematic reviews (n = 3; 15.8%) sampled existing literature based on specific eligibility criteria. Primary studies, including observational (n = 3; 15.8%), cross-sectional (n = 3; 15.8%), and retrospective (n = 1; 5.3%) designs, utilized patient-level data. Other studies (n = 3; 15.8%) focused on large-scale database analysis. Notably, two studies did not specify the exact sample size, delimiting only the clinical sector for data collection. Sample size concepts were considered not applicable to the narrative review and the preliminary/comparative observational studies.
3.4. Key findings and model performance
The included research primarily focused on applying ML to enhance the risk classification process. Eight of the 19 studies conducted comparative evaluations against established protocols, with the Korean Triage and Acuity Scale (KTAS) and the MTS serving as benchmarks in four studies.
Overall, ML-based models consistently outperformed traditional triage methods in terms of predictive accuracy and the identification of high-acuity patients. For instance, Gao et al. reported AUC levels exceeding 0.90 across all severity tiers [22]. Key physiological variables explored included oxygen saturation, systolic/diastolic blood pressure, heart rate, respiratory rate, and biomarkers such as troponin and lactate.
Furthermore, integrating textual data via NLP refined predictive performance in the research by Choi et al. and Kim et al. [19,25]. In contrast, studies evaluating ChatGPT revealed low-to-moderate agreement with traditional systems, suggesting that while LLMs may assist in identifying critical cases, they currently lack the reliability required for independent risk classification [19,20].
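Agreement between a model and a reference triage system is often summarized with a chance-corrected statistic. The sketch below computes unweighted Cohen's kappa on hypothetical five-level assignments; it is an assumption that this statistic matches what the cited studies reported, since they may have used weighted kappa or another measure.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal distribution.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical triage levels (1 = most urgent) for ten visits.
reference_levels = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
model_levels = [1, 2, 3, 3, 3, 2, 4, 3, 5, 4]

kappa = cohen_kappa(reference_levels, model_levels)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raw agreement is 60%, but kappa is only about 0.48 once chance agreement is removed, which is why raw agreement alone can overstate how well a model tracks a reference scale.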
To provide a deeper technical overview, Table 3 details the specific algorithms, their categories, sample sizes, comparators, external validation status, and performance metrics. Notably, to address the imbalanced nature of emergency department data, several studies reported performance metrics beyond global accuracy, including Sensitivity, Specificity, Precision, and the Area Under the Receiver Operating Characteristic Curve (AUC).
3.5. Synthesis of evidence (PAGER framework)
The evidence was synthesized using the PAGER framework, categorizing findings into Patterns, Advances, Gaps, Evidence for Practice, and Research Recommendations (Table 4). The analysis indicates that AI is a robust decision-support tool when used as a complementary resource, significantly improving the precision of traditional triage systems.
4. Discussion
4.1. Application contexts and AI techniques
The included studies demonstrate a high degree of methodological and contextual diversity, yet they converge on the common objective of evaluating the efficacy of AI models in triage and risk classification. The highest concentration of research was identified in Asia, followed by Europe and North America, with a more limited presence in Latin America. This geographical distribution likely reflects the substantial investments in healthcare technology and the extensive availability of large-scale clinical datasets in high-income regions [35].
Regarding data provenance, the studies leveraged a wide array of sources, ranging from historical literature identified in systematic reviews to real-world patient records extracted from electronic triage systems and emergency department encounters. The use of actual clinical data, as opposed to purely simulated scenarios, enhances the ecological validity of the findings and provides a more robust basis for model validation.
In the analyzed literature, AI was predominantly operationalized through ML and its advanced subfield, DL. While traditional algorithms—such as Logistic Regression and Gradient Boosting—remain prevalent, there is a growing shift toward DL architectures, specifically ANNs and DNNs [16,21–23,29,31]. Furthermore, NLP has emerged as a critical domain for extracting clinical value from unstructured data, such as nursing triage notes. More recently, this has evolved into the evaluation of LLMs, such as ChatGPT, for clinical decision support [19,25–27]. To establish clinical utility, these models were frequently benchmarked against validated triage protocols, including the KTAS, the MTS, and the Emergency Severity Index (ESI) [16,18,20,24,26,33].
A critical observation across the included studies is the limitation of accuracy as a standalone metric. In emergency triage, relying exclusively on accuracy is methodologically misleading due to the inherent class imbalance of patient populations, which are typically skewed toward lower-acuity categories (e.g., ESI/MTS levels 3, 4, or 5). Consequently, an algorithm may achieve high global accuracy by disproportionately predicting low-acuity levels while failing to identify critically ill patients, an error known as undertriage. To address this, studies focusing on high-stakes outcomes, such as cardiovascular events, ICU admission, or mortality [20,24], expanded their evaluative frameworks to include metrics better suited for imbalanced clinical data, such as sensitivity, specificity, and the Area Under the Receiver Operating Characteristic Curve (AUC). These metrics give a more reliable picture of how well critical cases are detected, which helps minimize life-threatening errors and enhance patient safety [18,23,27,29]. The strong results reported for advanced models, particularly Deep Neural Networks and XGBoost, likely stem from their ability to capture complex non-linear relationships and subtle patterns within vital signs and medical histories that may be overlooked by traditional, human-led protocols.
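The accuracy pitfall can be made concrete with a small worked example. The data below are hypothetical (100 visits, 5 of them critical), and the helper names are illustrative; the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = critical."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn

def auc(y_true, scores):
    """Probability that a critical case scores above a non-critical one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical imbalanced sample: 5 critical visits, 95 low-acuity visits.
y_true = [1] * 5 + [0] * 95
always_low = [0] * 100  # a degenerate model that never flags anyone

tp, tn, fp, fn = confusion_counts(y_true, always_low)
accuracy = (tp + tn) / len(y_true)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
constant_auc = auc(y_true, [0.0] * 100)

print(accuracy, sensitivity, specificity, constant_auc)
```

The degenerate model reaches 95% accuracy while detecting none of the critical patients (sensitivity 0) and ranking no better than chance (AUC 0.5), which is exactly the undertriage risk that accuracy alone hides.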
4.2. Main findings
The literature emphasizes that clinical judgment in triage is inherently susceptible to error, resulting in both overtriage and undertriage. These inaccuracies persist even with validated 5-level instruments due to the high subjectivity and the cognitive load of processing vast amounts of patient data under significant time constraints [24,36]. Such pressures often lead to imprecise classifications and interpersonal conflicts. However, the integration of AI into this workflow has demonstrated the potential to mitigate these errors and enhance overall triage effectiveness [16,20].
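Overtriage and undertriage can be quantified by comparing assigned levels with a reference standard. The following is a minimal sketch on hypothetical data, assuming a five-level scale in which level 1 is the most urgent; the function name and the example values are illustrative only.

```python
def triage_error_rates(reference, assigned):
    """Share of visits under-triaged, over-triaged, and correctly classified.

    Levels follow the convention 1 = most urgent, 5 = least urgent, so an
    assigned level numerically above the reference means undertriage.
    """
    if len(reference) != len(assigned) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    under = sum(a > r for r, a in zip(reference, assigned))
    over = sum(a < r for r, a in zip(reference, assigned))
    return {
        "undertriage": under / n,
        "overtriage": over / n,
        "agreement": (n - under - over) / n,
    }

# Hypothetical reference and assigned levels for ten visits.
reference_levels = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
assigned_levels = [1, 2, 3, 3, 3, 2, 4, 3, 5, 4]

rates = triage_error_rates(reference_levels, assigned_levels)
print(rates)
```

Reporting the two error directions separately matters because they carry different costs: undertriage delays care for patients who need it urgently, while overtriage consumes resources intended for critical cases.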
Research by Porto [16] confirms that ML and NLP models consistently outperform traditional hospital methods. Beyond superior technical performance, these tools reduce the rates of misclassification and alleviate the workload of frontline professionals. The clinical benefits of AI extend to specialized domains, including the improved prediction of cardiovascular events and sepsis, early diagnosis of respiratory diseases, and even pandemic modeling to prevent future outbreaks [19].
While AI-assisted triage streamlines patient flow, certain discrepancies between AI decision support systems and established protocols like the MTS [27] underscore potential safety risks. Instances of misclassification in urgent scenarios highlight the imperative for rigorous clinical validation before full-scale implementation. Nevertheless, the primary efficiency of AI-guided triage lies in patient prioritization. By enabling rapid assessment in critical cases such as strokes or myocardial infarctions, these systems can reduce waiting times by up to 20%, directly translating into better clinical outcomes [34].
Studies benchmarking AI models against the Korean Triage and Acuity Scale (KTAS) have yielded significant results. For instance, Chang et al. [18] developed a model that achieved an AUC greater than 0.70 across diverse medical centers, suggesting that site-specific variations in emergency departments do not significantly hinder the model’s generalizability. In that study, Federated Learning (FL) was employed to address disparities in data quality by enabling the collaborative training of models while preserving data privacy. Because FL maintains data within each participating institution, it effectively secures sensitive information through decentralization. Notably, such systems are not intended to replace the KTAS methodology; instead, they function as Clinical Decision Support Systems (CDSS) to identify necessary revisions in initial acuity classifications. Although these tools offer robust assistance, their implementation still necessitates professional experience and clinical oversight [19].
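The decentralized training idea can be illustrated with a sample-size-weighted average of locally trained parameters, in the style of federated averaging. This is a sketch under assumptions: the text does not state which aggregation rule the cited study used, and the site sizes and weight vectors below are invented for the example.

```python
def federated_average(site_updates):
    """Combine per-site model weights without pooling patient records.

    site_updates is a list of (n_samples, weights) tuples, where weights is
    a list of floats produced by training locally at one institution.
    """
    total = sum(n for n, _ in site_updates)
    if total <= 0:
        raise ValueError("at least one site must contribute samples")
    dim = len(site_updates[0][1])
    if any(len(weights) != dim for _, weights in site_updates):
        raise ValueError("all sites must share the same model shape")
    averaged = [0.0] * dim
    for n, weights in site_updates:
        share = n / total  # larger sites influence the global model more
        for i, value in enumerate(weights):
            averaged[i] += share * value
    return averaged

# Hypothetical updates from two hospitals; only weights leave each site.
updates = [
    (100, [1.0, 0.0]),
    (300, [0.0, 2.0]),
]

global_weights = federated_average(updates)
print(global_weights)
```

Only the parameter vectors and sample counts are exchanged in this scheme, so the underlying clinical records never leave the institution that holds them.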
In the context of suspected Cardiovascular Disease (CVD), specific ML-based algorithms, such as XGBoost, have reached Area Under the Curve (AUC) levels exceeding 0.90 and an accuracy of 0.78 [20]. Jiang et al. [20] advocate for AI implementation specifically to manage the high volume of low-risk visits (levels 3 and 4), which often contribute to overcrowding and inflated costs. Redirecting these low-severity populations to more appropriate healthcare tracks is a strategic recommendation to preserve emergency resources for critical cases [37].
A significant advancement in this field is the utilization of unstructured data. Kim et al. [25] and Choi et al. [19] demonstrated that incorporating textual nursing notes and voice data significantly improves a model’s ability to predict KTAS levels and identify primary symptoms. Conversely, recent evaluations of LLMs like ChatGPT show that their current performance is unsatisfactory compared to the decision-making capabilities of experienced nurses. While ChatGPT can distinguish high-acuity cases, it tends to over-classify (assigning high criticality to non-critical patients), which can be as detrimental as undertriage in a resource-constrained environment [20,26,38].
Finally, the most influential variables in these models remain physiological parameters—specifically oxygen saturation, blood pressure, heart rate, and respiratory rate—alongside biomarkers like troponin and lactate [24]. Advanced ML models excel at recognizing “faint patterns” within these variables—subtle physiological shifts that may elude human perception but indicate severe underlying pathology [23,39]. By integrating these signs with demographic data such as age and sex, AI systems can contextually adjust the definition of “normal” for each patient, ensuring a highly personalized and accurate risk estimation [21,40].
4.3. Limitations
The findings of this scoping review should be interpreted considering several limitations. First, there is a notable scarcity of studies specifically addressing AI applications in nurse-led triage. Most current investigations focus on medical clinical decision-making, creating a significant gap in the nursing literature. This is particularly relevant in contexts like the Brazilian healthcare system, where nurses occupy a central, autonomous role in the risk classification process.
Second, the lack of external validation across most of the identified models hinders their immediate applicability in real-world clinical settings. Without validation in diverse, multi-center environments, the reliability of these algorithms remains unproven outside their original training datasets. Furthermore, the results of this review cannot be generalized to the pediatric population, as pediatric triage was beyond the initial scope of this study.
Another challenge involves the technical complexity of the literature. Many studies employ highly specialized terminology that may be less accessible to healthcare practitioners, potentially creating a barrier to the clinical adoption of these technologies. The sheer diversity of AI methodologies—ranging from traditional ML to complex DL architectures—precluded a robust quantitative meta-analysis, limiting our ability to directly compare model performances.
Furthermore, our search strategy intentionally excluded broad terms such as ‘triage’ and ‘emergency department’ to prioritize precision over recall. Preliminary testing showed that these terms yielded an unmanageable volume of irrelevant results. Consequently, we acknowledge a potential selection bias: studies indexed only under these broader terms, rather than under ‘risk classification’ or ‘risk assessment’, may have been omitted, potentially limiting the comprehensiveness of this review.
Finally, while the inclusion of grey literature ensured a comprehensive mapping of the field, it also introduced variability in methodological quality. As a formal risk-of-bias assessment is not mandatory for scoping reviews, our findings characterize an exploratory landscape rather than definitive clinical evidence. Consequently, the performance metrics reported herein should be interpreted with caution. These limitations emphasize the urgent need for robust, peer-reviewed clinical trials to establish the safety and efficacy of AI in emergency triage.
4.4. Real-world implementation challenges and explainability
Despite the promising performance metrics of AI in simulated environments (in silico), transitioning these tools into real-world clinical workflows presents formidable challenges [8,34]. A primary barrier to clinical integration is the “black box” nature of complex algorithms, particularly Deep Neural Networks [7,8]. For safe and widespread adoption, healthcare practitioners require Explainable AI frameworks that provide transparent, interpretable reasoning behind each triage output. If nurses and physicians cannot discern the logic an algorithm used to assign a specific risk level, clinical trust will remain low, potentially increasing the risk of inappropriate or delayed interventions [8,32].
Furthermore, successful implementation hinges on seamless interoperability with existing health information systems and EHRs [32,34]. Emergency departments operate in high-pressure environments that demand real-time data processing, necessitating robust IT infrastructure and high-fidelity data integration [32]. From an ethical and regulatory standpoint, the deployment of AI raises significant concerns regarding clinical accountability in cases of misclassification. There is also the persistent risk of “automation bias”, where triage nurses might over-rely on algorithmic suggestions, inadvertently suppressing their own critical clinical judgment and expertise [7,32,34].
Finally, a pervasive gap identified in the literature is the lack of external validation [8,28,34]. Most analyzed models were developed and tested using single-center datasets, suggesting that their predictive performance may degrade significantly when applied to facilities with different demographic profiles, disease prevalences, or distinct clinical workflows [8,28].
Emerging regulatory frameworks provide a foundational structure for the safe integration of artificial intelligence into clinical practice. Recent initiatives, such as the risk-based approach of the European Union Artificial Intelligence Act for high-risk healthcare systems and the U.S. Food and Drug Administration (FDA) framework for AI/ML-enabled Software as a Medical Device (SaMD), represent significant strides in governance and performance oversight [41,42]. Notably, these frameworks have advanced the discourse on adaptive algorithms, lifecycle regulation, and predetermined change control plans—elements that are particularly vital for AI systems used in emergency decision-making [41].
Despite these advances, substantial challenges persist regarding transparency, explainability, and accountability, especially for continuously learning models deployed in dynamic clinical environments [41–43]. Gaps in harmonized regulatory standards and real-world monitoring mechanisms underscore that regulatory readiness, coupled with rigorous multicenter external validation, remains a critical prerequisite. Only through these measures can AI technologies be safely integrated as reliable, complementary tools within emergency departments [41–43].
4.5. Final resolution
The integration of AI into risk classification represents a transformative opportunity to bolster decision-making within healthcare systems, particularly within primary care and emergency entry points. However, the findings of this review highlight a critical need to expand the current research scope. Future investigations must prioritize nursing-specific practices, the rigorous external validation of predictive models, and the adaptation of these technologies to diverse clinical environments. Cultivating interdisciplinary research—where nursing clinical expertise converges with AI innovation—is fundamental to developing tools that are safe, effective, and seamlessly integrated into daily healthcare delivery.
5. Conclusion
This scoping review underscores the significant potential of AI as a robust decision-support tool for risk classification and triage in emergency services. The analyzed literature reveals a broad spectrum of methodologies that report high accuracy in predicting clinical outcomes and that may help reduce wait times and improve patient flow.
Despite these technological strengths, AI models should be viewed as complementary assets rather than replacements for the nuanced clinical judgment of nurses and physicians. To reach full maturity, these systems require further technological refinement, validation across heterogeneous care settings, and longitudinal analysis of their clinical impact. Ultimately, this work provides a comprehensive map of AI’s current role in triage and calls for continued interdisciplinary collaboration to ensure that its implementation enhances the safety, efficacy, and overall quality of healthcare decision-making.
Supporting information
S1 File. PRISMA-ScR Checklist.
Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist.
https://doi.org/10.1371/journal.pone.0352338.s001
(DOCX)
S2 Data. Data extraction matrix.
Complete dataset containing the extracted characteristics and results of the included studies.
https://doi.org/10.1371/journal.pone.0352338.s002
(XLSX)
References
- 1. Sacoman TM, Beltrammi DGM, Andrezza R, Cecílio LCdO, Reis AACd. Implementation of the Manchester Triage System in a Municipal Emergency Network. Saúde em Debate. 2019;43(121):354–67.
- 2. Menezes E, Coelho A, Bianchi C, Bianchi Mezetti A, Rodrigues F, Souza L, et al. Manchester classification system: fragilities and limitations. Rev Faipe.
- 3. Valadares DNR, Deltreggia F, Breckenfeld GAA. Multiple victims scenario: triage and patient classification. Braz J Hea Rev. 2024;7(1):5296–307.
- 4. Fekonja Z, Kmetec S, Fekonja U, Mlinar Reljić N, Pajnkihar M, Strnad M. Factors contributing to patient safety during triage process in the emergency department: A systematic review. J Clin Nurs. 2023;32(17–18):5461–77. pmid:36653922
- 5. Oliveira FMM, Beltrão ICSL, Lisboa KWSC, Gadelha NAS, Pinheiro WR, Lucas TR. Nurses’ discourse on the application of the Manchester protocol in hospital urgency and emergency. Revista de Enfermagem e Atenção à Saúde. 2024;13(3):1–19.
- 6. Seo YH, Lee K, Jang K. Factors influencing the classification accuracy of triage nurses in emergency department: analysis of triage nurses’ characteristics. BMC Nurs. 2024;23(1):764. pmid:39420318
- 7. Lobo LC. Artificial Intelligence and Medicine. Rev Bras Educ Med. 2017;41(2):185–93.
- 8. Mueller B, Kinoshita T, Peebles A, Graber MA, Lee S. Artificial intelligence and machine learning in emergency medicine: a narrative review. Acute Med Surg. 2022;9(1):e740.
- 9. Hong WS, Haimovich AD, Taylor RA. Predicting hospital admission at emergency department triage using machine learning. PLoS One. 2018;13(7):e0201016. pmid:30028888
- 10. Ivanov O, Wolf L, Brecher D, Lewis E, Masek K, Montgomery K, et al. Improving ED Emergency Severity Index Acuity Assignment Using Machine Learning and Clinical Natural Language Processing. J Emerg Nurs. 2021;47(2):265-278.e7. pmid:33358394
- 11. Tam HL, Chung SF, Lou CK. A review of triage accuracy and future direction. BMC Emerg Med. 2018;18(1):58. pmid:30572841
- 12. Peters MDJ, Godfrey C, McInerney P, et al. JBI Reviewer’s Manual. JBI. 2020. https://jbi-global-wiki.refined.site/space/MANUAL/355863557/Previous%20versions?attachment=%2Fdownload%2Fattachments%2F355863557%2FJBI_Reviewers_Manual_2020June.pdf&type=application%2Fpdf&filename=JBI_Reviewers_Manual_2020June.pdf
- 13. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann Intern Med. 2018;169(7):467–73. pmid:30178033
- 14. Peters MDJ, Godfrey C, McInerney P, Khalil H, Larsen P, Marnie C, et al. Best practice guidance and reporting items for the development of scoping review protocols. JBI Evid Synth. 2022;20(4):953–68. pmid:35102103
- 15. Bradbury-Jones C, Aveyard H, Herber OR, Isham L, Taylor J, O’Malley L. Scoping reviews: the PAGER framework for improving the quality of reporting. International Journal of Social Research Methodology. 2021;25(4):457–70.
- 16. Porto BM. Improving triage performance in emergency departments using machine learning and natural language processing: a systematic review. BMC Emerg Med. 2024;24(1):219. pmid:39558255
- 17. Sánchez-Salmerón R, Gómez-Urquiza JL, Albendín-García L, Correa-Rodríguez M, Martos-Cabrera MB, Velando-Soriano A, et al. Machine learning methods applied to triage in emergency services: A systematic review. Int Emerg Nurs. 2022;60:101109. pmid:34952482
- 18. Chang H, Yu JY, Lee GH, Heo S, Lee SU, Hwang SY, et al. Clinical support system for triage based on federated learning for the Korea triage and acuity scale. Heliyon. 2023;9(8):e19210. pmid:37654468
- 19. Choi SW, Ko T, Hong KJ, Kim KH. Machine Learning-Based Prediction of Korean Triage and Acuity Scale Level in Emergency Department Patients. Healthc Inform Res. 2019;25(4):305–12. pmid:31777674
- 20. Jiang H, Mao H, Lu H, Lin P, Garry W, Lu H, et al. Machine learning-based models to support decision-making in emergency department triage for patients with suspected cardiovascular disease. Int J Med Inform. 2021;145:104326. pmid:33197878
- 21. Liu Y, Gao J, Liu J, Walline JH, Liu X, Zhang T, et al. Development and validation of a practical machine-learning triage algorithm for the detection of patients in need of critical care in the emergency department. Sci Rep. 2021;11(1):24044. pmid:34911945
- 22. Gao Z, Qi X, Zhang X, Gao X, He X, Guo S, et al. Developing and Validating an Emergency Triage Model Using Machine Learning Algorithms with Medical Big Data. Risk Manag Healthc Policy. 2022;15:1545–51. pmid:36017058
- 23. Joseph JW, Leventhal EL, Grossestreuer AV, Wong ML, Joseph LJ, Nathanson LA, et al. Deep-learning approaches to identify critically ill patients at emergency department triage using limited information. J Am Coll Emerg Physicians Open. 2020;1(5):773–81. pmid:33145518
- 24. Levin S, Toerper M, Hamrock E, et al. Machine-learning-based electronic triage more accurately differentiates patients with respect to clinical outcomes compared with the emergency severity index. Ann Emerg Med. 2018;71(5):565–74.e2.
- 25. Kim D, Oh J, Im H, Yoon M, Park J, Lee J. Automatic Classification of the Korean Triage Acuity Scale in Simulated Emergency Rooms Using Speech Recognition and Natural Language Processing: a Proof of Concept Study. J Korean Med Sci. 2021;36(27):e175. pmid:34254471
- 26. Zaboli A, Brigo F, Sibilio S, Mian M, Turcato G. Human intelligence versus Chat-GPT: who performs better in correctly classifying patients in triage? Am J Emerg Med. 2024;79:44–7. pmid:38341993
- 27. Sarbay İ, Berikol GB, Özturan İU. Performance of emergency triage prediction of an open access natural language processing based chatbot application (ChatGPT): A preliminary, scenario-based cross-sectional study. Turk J Emerg Med. 2023;23(3):156–61. pmid:37529789
- 28. Gao F, Boukebous B, Pozzar M, Alaoui E, Sano B, Bayat-Makoei S. Predictive models for emergency department triage using machine learning: a systematic review. Obstet Gynecol Res. 2022;5(2):136–57.
- 29. Azeez D, Ali MAM, Gan KB, Saiboon I. Comparison of adaptive neuro-fuzzy inference system and artificial neutral networks model to categorize patients in the emergency department. Springerplus. 2013;2:416. pmid:24052927
- 30. Vântu A, Vasilescu A, Băicoianu A. Medical emergency department triage data processing using a machine-learning solution. Heliyon. 2023;9(8):e18402. pmid:37576318
- 31. Zlotnik A, Alfaro MC, Pérez MCP, Gallardo-Antolín A, Martínez JMM. Building a Decision Support System for Inpatient Admission Prediction With the Manchester Triage System and Administrative Check-in Variables. Comput Inform Nurs. 2016;34(5):224–30. pmid:26974710
- 32. Shafaf N, Malek H. Applications of Machine Learning Approaches in Emergency Medicine; a Review Article. Arch Acad Emerg Med. 2019;7(1):34. pmid:31555764
- 33. Lindner G, Ravioli S. Performance of the artificial intelligence-based Swiss medical assessment system versus Manchester triage system in the emergency department: A retrospective analysis. Am J Emerg Med. 2025;94:46–9. pmid:40273637
- 34. Da’Costa A, Teke J, Origbo JE, Osonuga A, Egbon E, Olawade DB. AI-driven triage in emergency departments: A review of benefits, challenges, and future directions. Int J Med Inform. 2025;197:105838. pmid:39965433
- 35. Schmallenbach L, Bärnighausen TW, Lerchenmueller MJ. The global geography of artificial intelligence in life science research. Nat Commun. 2024;15(1):7527. pmid:39266506
- 36. Elsayed Z, El-Zeny A, Moustafa M, Ellouly H. Comparison between Australasian triage scale and emergency severity index. Egypt J Surg. 2020;39(2):455.
- 37. Nummedal MA, King S, Uleberg O, Pedersen SA, Bjørnsen LP. Non-emergency department (ED) interventions to reduce ED utilization: a scoping review. BMC Emerg Med. 2024;24(1):117. pmid:38997631
- 38. Ellebrecht N. Why Is Treatment Urgency Often Overestimated? An Experimental Study on the Phenomenon of Over-triage. Disaster Med Public Health Prep. 2020;14(5):563–7. pmid:31416493
- 39. Razo C, Welgan CA, Johnson CO, McLaughlin SA, Iannucci V, Rodgers A, et al. Effects of elevated systolic blood pressure on ischemic heart disease: a Burden of Proof study. Nat Med. 2022;28(10):2056–65. pmid:36216934
- 40. Oyetunji TA, Chang DC, Crompton JG. Redefining hypotension in the elderly: normotension is not reassuring. Arch Surg. 2011;146(7):865–9.
- 41. Zhou K, Gattinger G. The Evolving Regulatory Paradigm of AI in MedTech: A Review of Perspectives and Where We Are Today. Ther Innov Regul Sci. 2024;58(3):456–64. pmid:38528278
- 42. Larson DB, Harvey H, Rubin DL, Irani N, Tse JR, Langlotz CP. Regulatory Frameworks for Development and Evaluation of Artificial Intelligence-Based Diagnostic Imaging Algorithms: Summary and Recommendations. J Am Coll Radiol. 2021;18(3 Pt A):413–24. pmid:33096088
- 43. Babic B, Glenn Cohen I, Stern AD, Li Y, Ouellet M. A general framework for governing marketed AI/ML medical devices. NPJ Digit Med. 2025;8(1):328. pmid:40450160