Abstract
Objective
To establish whether or not a natural language processing technique could identify two common inpatient neurosurgical comorbidities using only text reports of inpatient head imaging.
Materials and methods
A training and testing dataset of reports of 979 CT or MRI scans of the brain for patients admitted to the neurosurgery service of a single hospital in June 2021 or to the Emergency Department between July 1–8, 2021, was identified. A variety of machine learning and deep learning algorithms utilizing natural language processing were trained on the training set (84% of the total cohort) and tested on the remaining images. A subset comparison cohort (n = 76) was then assessed to compare output of the best algorithm against real-life inpatient documentation.
Results
For “brain compression”, a random forest classifier outperformed other candidate algorithms with an accuracy of 0.81 and area under the curve (AUC) of 0.90 in the testing dataset. For “brain edema”, a random forest classifier again outperformed other candidate algorithms with an accuracy of 0.92 and AUC of 0.94 in the testing dataset. In the provider comparison dataset, for “brain compression,” the random forest algorithm demonstrated better accuracy (0.76 vs 0.70) and sensitivity (0.73 vs 0.43) than provider documentation. For “brain edema,” the algorithm again demonstrated better accuracy (0.92 vs 0.84) and sensitivity (0.45 vs 0.09) than provider documentation.
Citation: Sastry RA, Setty A, Liu DD, Zheng B, Ali R, Weil RJ, et al. (2024) Natural language processing augments comorbidity documentation in neurosurgical inpatient admissions. PLoS ONE 19(5): e0303519. https://doi.org/10.1371/journal.pone.0303519
Editor: Vijayalakshmi Kakulapati, Sreenidhi Institute of Science and Technology, INDIA
Received: August 10, 2022; Accepted: April 4, 2024; Published: May 9, 2024
Copyright: © 2024 Sastry et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant anonymized data and code are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Timely and accurate medical documentation is a quality and safety imperative. Precise documentation can advance efficacious inpatient care, enhance transitions across the healthcare ecosystem, reduce needless variation and excess utilization, facilitate clinical research efforts, and capture the intensity and quality of care, on which hospital reimbursements are based [1–3]. Education and training regarding best practices in documentation, however, can be perceived as extraneous or of minimal importance, especially in the context of resident education and training. Such inattention can result in substantial underestimations of the intensity of care provided to operative and non-operative surgical inpatients [2–6]. Estimated revenue losses of up to 40% have clear and obvious consequences for hospital operations, particularly in the context of hospitalized trauma patients who do not undergo surgical intervention [3]. An array of interventions, including provider education, constant clinician review of electronic medical records (EMR), manually generated documentation queries, and others, have been implemented at various centers. However, they are often additive to the work of busy clinicians and trainees, who already spend historically large amounts of time on documentation, or by expanding the numbers of clinical documentation staff to constantly assess provider practices [5, 7–10]. These additive measures are of the work harder, not smarter, framework and have been documented across healthcare to be a growing source of clinician discontent and burnout [11, 12].
In the United States, inpatient reimbursements are determined by broad classifications of patient diagnoses known as diagnosis-related groups (DRGs), which were originally implemented as part of Medicare’s Prospective Payment System in 1983 [1, 13, 14]. Medicare Severity DRGs (MS-DRGs), the most common system used in the United States, are stratified into three categories: (1) DRG without complication or comorbidity (CC) and without major CC; (2) DRG with a CC; and (3) DRG with MCC [1, 14]. Inpatient and relevant outpatient documentation of pertinent medical and surgical diagnoses, as well as the specific treatments or interventions that treat these diagnoses, determines the CCs and MCCs used as secondary diagnoses during and after admission. In this context, our neurosurgery department at a large American level 1 trauma center recently implemented a provider-based initiative to improve inpatient documentation and comorbidity capture rates [4].
Given the success of machine learning (ML) approaches in a variety of medical contexts [15–17], we hypothesized that a natural language processing (NLP)-based ML algorithm may be able to identify neurosurgical inpatients likely to have 1 or more commonly encountered CC/MCCs based solely on text interpretations of computed tomography (CT) or magnetic resonance imaging (MRI) reports obtained during hospital admission regardless of underlying pathology.
Materials and methods
Patient cohort
The protocol for this study was reviewed and approved by the Institutional Review Board of Rhode Island Hospital (Providence, RI). As the proposed research was a retrospective observational study, the need for patient consent was waived by the aforementioned Institutional Review Board. Data were fully anonymized at the time of chart review. A retrospective cohort of 979 images comprised all scans and respective radiological impressions of patients admitted to the neurosurgery service in June 2021 who underwent either CT or MRI of the brain, and all scans of patients seen in the emergency department from July 1–8, 2021 who underwent either CT or MRI of the brain. This cohort was devised to include an appropriate number of positive and negative controls for algorithm training and development, even though the target population of this effort consists exclusively of neurosurgical admissions. Given the nature of these inclusion criteria, in some cases, multiple scans were included for a given patient/admission. A separate provider comparison cohort, comprising 76 patients who were admitted to the neurosurgery service in October 2021 and underwent inpatient CT or MRI of the brain, was also identified to facilitate comparison between algorithm performance and real-world documentation. October 2021 was selected as a representative month because it reflected steady-state documentation practices after the recent implementation of a documentation improvement protocol and progress note template [4]. In this cohort, only the first scan obtained within our hospital’s system, regardless of indication or modality, was included (thus resulting in one scan per patient). This cohort was used exclusively as a subset of the test data so that model performance could be compared with provider performance across the entire cohort.
The combined dataset (n = 1055) was split into a training cohort (n = 885, 83.9%) and a testing set (n = 170, 16.1%) for the purpose of algorithm selection and training. Class imbalance was a major consideration in our data split structure, as model fitting can be biased by highly imbalanced datasets. We ensured that our training and testing sets had relatively similar class proportions (Table 1). Images were not excluded on the basis of elective vs. emergent admission or surgical vs. non-surgical pathologies.
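A stratified split of this kind can be sketched with scikit-learn, which the study used for modeling. This is a minimal illustration with invented reports and labels, not the study's data; the ~16% test fraction mirrors the split described above.

```python
# Sketch of a stratified train/test split that preserves class proportions
# in both partitions. The reports, labels, and positive rate are invented.
from sklearn.model_selection import train_test_split

# Hypothetical report texts and binary gold labels (1 = comorbidity present)
reports = [f"report {i}" for i in range(1000)]
labels = [1 if i % 5 == 0 else 0 for i in range(1000)]  # ~20% positive

X_train, X_test, y_train, y_test = train_test_split(
    reports, labels,
    test_size=0.16,     # roughly the 84%/16% split described in the text
    stratify=labels,    # keep similar class proportions in train and test
    random_state=0,
)

# Both partitions retain roughly the same positive-class fraction
print(sum(y_train) / len(y_train), sum(y_test) / len(y_test))
```

Passing `stratify=labels` is what guards against a lopsided split when one class is rare, which is the concern the paragraph above raises.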
Gold labels
All 1,055 patient images were reviewed by a single author (RAS) in a blinded fashion and were assessed for the presence or absence of “brain compression” or “brain edema,” both of which are common neurosurgical CC/MCCs that were the primary targets of a recent intra-departmental documentation improvement effort [4].
Human prediction
Records for patients in the provider comparison cohort, the temporal range of which was chosen to reflect documentation practices after successful implementation of a provider-education intervention in late 2020, were also manually queried for discharge summary documentation of “brain compression” and “brain edema”; as such, for patients in this cohort, the presence or absence of either term in the discharge summary was used to assess the performance of real-world provider documentation against the gold standard of author review.
Data pre-processing
We used only the impression texts of CT and MRI radiology reports to predict “brain compression” and “brain edema” classifications. All word/data tokenization was completed using the Natural Language Toolkit (NLTK) [18] package in the Python programming language (Python Software Foundation, https://www.python.org/). All texts were first “tokenized” into single-word vectors by splitting the text on white space, so that each word became one word “token”. The list of tokens was then parsed: all words were cast to lowercase, and all stop words and punctuation were removed to isolate significant word tokens. The list of word vectors was then scored using two different word-scoring strategies: term frequency-inverse document frequency (TF-IDF) and term frequency (TF) (Fig 1). These word vectors were then fed into ML and deep learning (DL) algorithms with a bag-of-words technique to predict lesion classification. Bag-of-words featurization allows sentences to be vectorized based on the words they contain. The dimension of the sentence vector space is set to the number of unique word tokens, where each index of the vector represents a unique word. Each sentence vector is constructed using either a TF approach or a TF-IDF approach. The TF approach assigns values to each index in a sentence vector based on the frequency of that word in the sentence. The TF-IDF approach discounts words that occur with high frequency in the corpus by a discounting factor of log(N/df), where N represents the number of reports in the dataset and df represents the number of documents in which a specific term is present. The values at each index of a sentence vector constructed using the TF-IDF approach are the product of the discounting factor and the respective TF value (Fig 1).
TF-IDF = term frequency-inverse document frequency.
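The preprocessing and featurization steps described above can be sketched directly from their definitions. The example impressions, stop-word list, and vocabulary below are invented for illustration; the TF-IDF discount is the log(N/df) factor given in the text.

```python
# Minimal sketch of the pipeline described above: split on white space,
# lowercase, strip stop words and punctuation, then build TF and TF-IDF
# bag-of-words vectors. Impressions and the stop list are stand-ins.
import math
import string

STOP_WORDS = {"the", "of", "with", "a", "is"}  # stand-in stop list

def tokenize(text):
    tokens = text.split()                               # one word -> one token
    tokens = [t.lower().strip(string.punctuation) for t in tokens]
    return [t for t in tokens if t and t not in STOP_WORDS]

corpus = [
    "Mass effect with compression of the left lateral ventricle.",
    "No acute intracranial abnormality.",
    "Vasogenic edema surrounding the lesion.",
]
docs = [tokenize(d) for d in corpus]
vocab = sorted({t for doc in docs for t in doc})
N = len(corpus)
# document frequency: number of reports containing each term
df = {t: sum(t in doc for doc in docs) for t in vocab}

def tf_vector(doc):
    # term frequency: raw count of each vocabulary word in the report
    return [doc.count(t) for t in vocab]

def tfidf_vector(doc):
    # TF discounted by log(N / df), as defined in the text
    return [doc.count(t) * math.log(N / df[t]) for t in vocab]

print(tfidf_vector(docs[0]))
```

A term appearing in every report gets a discount of log(N/N) = 0, so ubiquitous words contribute nothing to the TF-IDF vector, which is the intended effect of the discounting factor.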
Overall, the entire dataset of radiology reports mentioned “compression” in only 1.7% of reports and “edema” in 15.5% of reports. The presence of these specific tokens does not deterministically correlate with a positive label (27% and 54% of reports containing the compression and edema tokens were positive for compression and edema, respectively), which demonstrates the need for a more sophisticated NLP approach to classification.
As previously noted, the primary patient cohort data was split into an 84% training set and 16% testing set. After tokenizing and preprocessing the radiology data, we used a multitude of ML and DL supervised learning models to predict lesion classifications.
Machine learning and deep learning prediction
We used the Python packages scikit-learn [19] and TensorFlow [20] to fit the ML and DL models, respectively. We trained random forest, logistic regression, support vector machine, and naïve Bayes classifiers on both word-scoring techniques (TF-IDF and TF). For DL models, we fit a single-layer perceptron and a multilayer perceptron for classification. Each model was fit once for brain compression and once for brain edema; all models were binary classifiers. Both DL methods used a binary cross-entropy loss function, the random forest used the Gini impurity criterion, and the remaining ML classifiers used their respective default loss functions/techniques prebuilt in scikit-learn’s library.
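The ML side of this setup can be sketched as a set of scikit-learn pipelines. The tiny corpus and labels below are invented, and the model settings are defaults rather than the study's tuned hyperparameters.

```python
# Sketch of fitting the four candidate ML classifiers on TF-IDF features.
# Texts and labels are illustrative only; swap TfidfVectorizer for
# CountVectorizer to use plain TF features instead.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical impression texts with binary "brain compression" labels
texts = [
    "mass effect with midline shift and ventricular compression",
    "no acute intracranial abnormality",
    "effacement of sulci with compression of the lateral ventricle",
    "stable postoperative changes without mass effect",
] * 5
labels = [1, 0, 1, 0] * 5

candidates = {
    "random_forest": RandomForestClassifier(random_state=0),
    "logistic": LogisticRegression(max_iter=1000),
    "svm": SVC(probability=True),
    "naive_bayes": MultinomialNB(),
}

fitted = {}
for name, clf in candidates.items():
    # one pipeline per candidate: vectorize text, then classify
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(texts, labels)
    fitted[name] = model

print(fitted["random_forest"].predict(["compression of the third ventricle"]))
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned from the training reports consistent between fitting and prediction.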
Hyperparameter optimization was carried out through grid search, optimizing for area under the curve (AUC) to avoid biased model fitting due to our slightly unbalanced datasets. Each model was fit using 5-fold cross validation, in which the training data was split into 5 equally sized groups and the model was trained on 4 of the 5 folds, with the remaining fold used as a validation set. The 5-fold validation was carried out with a shuffle split to ensure the training data was shuffled and randomized before being divided into folds. The training and validation accuracies were used only to select the best hyperparameters for each model and are not reported here. For each model, AUC and accuracy, the proportion of correct binary classifications among the samples in the test set, are reported in addition to other select metrics.
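The tuning procedure described above can be sketched with scikit-learn's grid search. The synthetic feature matrix and the grid values are illustrative stand-ins, not the study's actual search space.

```python
# Sketch of grid search scored by AUC with shuffled, stratified 5-fold
# cross validation, as described in the text. Data and grid are invented.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Stand-in feature matrix (e.g., TF-IDF vectors) with imbalanced labels
X, y = make_classification(n_samples=200, n_features=20, weights=[0.8],
                           random_state=0)

# shuffle=True randomizes the data before it is divided into 5 folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    scoring="roc_auc",   # optimize AUC to limit bias from class imbalance
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Scoring by `roc_auc` rather than accuracy is the relevant design choice here: with an imbalanced dataset, a trivial majority-class classifier can score well on accuracy but not on AUC.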
Results
Characteristics of the patient cohort and included imaging studies are summarized in Table 2. The performance of included ML and DL algorithms for prediction of “brain compression” among the testing cohort are presented in Fig 2. Among ML methods, a random forest classifier with TF-IDF tokenization outperformed other candidate algorithms with an accuracy of 0.81 (Standard Deviation [SD] 0.01) and AUC of 0.90 (SD 0.01). Among DL methods, a multilayer perceptron method with frequency tokenization outperformed other candidate algorithms with an accuracy of 0.78 (SD 0.02) and AUC of 0.88 (SD 0.01). The performance of included ML and DL algorithms for prediction of “brain edema” are presented in Fig 3. Among ML methods, the random forest classifier with term frequency-inverse document frequency again outperformed other candidate algorithms with an accuracy of 0.92 (SD 0.02) and AUC of 0.94 (SD 0.01). Among DL methods, a multi-layer perceptron method with term frequency-inverse document frequency outperformed other candidate algorithms with an accuracy of 0.89 (SD 0.01) and AUC of 0.87 (SD <0.01).
(A) Machine learning classifiers’ performance with both term frequency (TF) and term frequency-inverse document frequency (TF-IDF) tokenization strategies. (B) Deep learning classifiers’ performance with both TF and TF-IDF tokenization strategies. SVM = support vector machine; NB = naïve Bayes; Log = logistic regression.
(A) Machine learning classifiers’ performance with both term frequency (TF) and term frequency-inverse document frequency (TF-IDF) tokenization strategies. (B) Deep learning classifiers’ performance with both TF and TF-IDF tokenization strategies. SVM = support vector machine; NB = naïve Bayes; Log = logistic regression.
Receiver operating characteristic (ROC) curves for both random forest classifiers are shown in Fig 4. For the optimal compression classifier, we chose a point on the ROC curve that corresponded to a classifier with an accuracy of 0.81, specificity of 0.88, and sensitivity of 0.65. For the optimal edema classifier, we chose a point on the ROC curve that corresponded to a classifier with an accuracy of 0.92, specificity of 1.0, and sensitivity of 0.48. A more complete characterization of each ML and DL classifier’s performance is broken down in Table 3A and 3B.
(A) Estimator trained for brain compression classification. (B) Estimator trained for brain edema classification. AUC = area under the curve.
A. Performance Metrics for ML and DL Models on Brain Compression. B. Performance Metrics for ML and DL Models on Brain Edema.
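Choosing an operating point on the ROC curve, as described above, amounts to sweeping classification thresholds and selecting one that meets a performance target. The sketch below uses synthetic data and a hypothetical specificity target of 0.9; it is not the study's actual selection rule.

```python
# Sketch of picking an ROC operating point: take the most sensitive
# threshold whose specificity (1 - FPR) meets a target. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]        # positive-class probabilities
fpr, tpr, thresholds = roc_curve(y_te, scores)

# keep thresholds whose specificity (1 - FPR) is at least 0.9,
# then take the one with the highest sensitivity (TPR)
ok = np.where(1 - fpr >= 0.9)[0]
best = ok[np.argmax(tpr[ok])]
print(thresholds[best], tpr[best], 1 - fpr[best])
```

Each point on the ROC curve is one such threshold, so trading sensitivity against specificity, as the chosen compression and edema operating points do, is a matter of moving along this curve.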
Based on these data, the random forest classifier was selected as the best performing algorithm and compared against discharge summary documentation in Fig 5. For documentation of “brain compression”, the random forest algorithm demonstrated better accuracy (0.76 vs 0.70) and sensitivity (0.73 vs 0.43) than provider documentation. The logistic regression also performed very well, with identical accuracy (0.76) and slightly higher sensitivity (0.76 vs 0.73), albeit with a slightly lower AUC than the random forest (0.87 vs 0.89). For “brain edema,” the random forest algorithm again demonstrated better accuracy (0.92 vs 0.84) and sensitivity (0.45 vs 0.09) than provider documentation. The logistic regression also performed very well, with identical accuracy (0.92) and higher sensitivity (0.54 vs 0.45), albeit with a lower AUC than the random forest (0.88 vs 0.91).
(A) Estimators for compression dataset. (B) Estimators for edema dataset. SVM = support vector machine; NB = Naïve bayes; Log = Logistic regression.
Overall, our optimal classifiers for both brain compression and edema vastly outperformed provider documentation in sensitivity due to their ability to more readily identify true brain compression and brain edema cases. We do see, however, a lower specificity in our brain compression classifiers relative to providers (0.80 vs 0.95), but this can likely be attributed to the fact that provider documentation overwhelmingly failed to document brain compression. This led to a high number of true negatives and a low number of false positives, resulting in very high specificity. A detailed comparison of our classifiers’ performance against provider documentation is presented in Table 4A and 4B.
A. Estimator and Provider Performance Comparison for Brain Compression. B. Estimator and Provider Performance Comparison for Brain Edema.
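The accuracy, sensitivity, and specificity figures compared above all derive from the same confusion-matrix counts. The sketch below shows that computation on invented label vectors, not the study's data.

```python
# Sketch of the metric comparison: accuracy, sensitivity, and specificity
# computed from binary predictions against gold labels. Labels are invented.
from sklearn.metrics import confusion_matrix

gold      = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
algorithm = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]  # hypothetical model output
provider  = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # hypothetical documentation

def metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

print(metrics(gold, algorithm))
print(metrics(gold, provider))
```

Note how the hypothetical provider vector, which documents almost nothing, still achieves perfect specificity: never flagging a diagnosis produces no false positives, which mirrors the explanation given above for the providers' high specificity.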
Discussion
In contemporary American healthcare, the benefits of improved documentation are at best infrequently and indirectly apparent to those on whom the burden of documentation falls. As such, despite the longevity of the DRG-based reimbursement system, sporadic hospital- and practice-based efforts to optimize inpatient documentation abound [1, 2, 4–6, 10, 14, 21–29]. Given the relatively large financial impact of neurosurgical procedures on overall hospital finances and the significant costs of non-operative trauma care, developing simple, reproducible, and efficacious mechanisms for documentation improvement for inpatient neurosurgical practitioners is of paramount importance [2, 5, 30]. In this context, we report the successful development and validation of an NLP/ML-based algorithm for the identification of two common neurosurgical CC/MCCs from the reports of CTs or MRIs of the brain. When assessed against real-life performance of inpatient neurosurgical providers, our algorithm outperformed baseline provider documentation after the recent implementation of a documentation improvement effort. These results suggest that ML-based decision support should be considered an efficient and cost-effective component of future documentation improvement efforts and, in this specific context, could suggest diagnoses for documentation along with diagnosis-specific treatment plans. More broadly, the implementation of an efficient, text-based algorithm could have many applications to inpatient care outside of neurosurgery alone.
Time spent documenting in the EMR already consumes multiple hours in the average surgical workday [9, 31]. This documentation burden is increasingly significant for inpatient medical and surgical residents, who, along with inpatient APPs, perform the majority of consequential documentation for hospital inpatients in academic centers [7, 8, 14, 23, 31, 32]. Surgeon perception that additional documentation may not be clinically meaningful necessarily limits the implementation of documentation improvement programs, nearly all of which require investment in the form of time, personnel, or both [33]. As previously noted, many previous interventions have coupled targeted provider education sessions with ongoing chart review to provide providers with feedback or to generate further documentation queries [2–6, 14, 21, 24, 25, 27, 28]. For instance, Fox et al report a cost greater than $350,000 and return on investment of 220% for a program that involved personalized documentation teaching sessions and allocation of documentation specialists to round with a trauma surgery team and to review notes at a Level 1 trauma center [10]. Similarly, Spurgeon et al reported that nurses working 10–15 hours/week on documentation improvement were able to review fewer than half of inpatient neurosurgical notes over an 8-week period [5]. Efforts to minimize the time investment required by providers to update notes underpinned the development of the documentation query, in which a provider need only respond “yes” or “no” for the presence or absence of a given diagnosis [34]; however, even with simple systems, time-consuming manual review by documentation experts is still required to generate queries.
The progress note template, which standardizes common comorbidities during documentation efforts, is another low-cost documentation improvement intervention; indeed, simple, paper-based checklists have, perplexingly, been shown to yield more thorough documentation than EMR-based approaches [29]. In clinical contexts, “brain compression” and “brain edema” can be reliably extracted only from neuroimaging and rarely convey meaningful clinical information relative to more commonly used expressions; as such, ML-based approaches to extract these diagnoses, which are both common and commonly undocumented, may yield significant benefits at low cost.
ML and NLP applications in neurosurgery and neuroimaging are numerous and varied [15, 35–39] and have only increased in breadth and depth since the widespread popularization of large language models (LLMs) such as ChatGPT [40–44]. A variety of methods utilizing radiology reports, raw images, or both have been successfully applied in numerous clinical settings [45, 46]. Decision support for clinical documentation may offer a particularly fruitful application of these technologies, especially given that the imperative is to augment documentation at provider discretion without necessarily changing the course of patient care. Documentation efforts likely require a flexible approach for ML applications: certain diagnoses, such as “brain compression”, can be learned exclusively from imaging reports. Others, such as “encephalopathy”, cannot, and would instead require parsing provider documentation and medication administration records. Another challenge of clinical documentation efforts is that documentation requirements for various stakeholders may not necessarily overlap and, furthermore, may change over time with the release of new documentation standards. A final challenge, which will likely become more prominent with the availability of public LLMs, is the requirement to protect patient data confidentiality [47]. For instance, an internally developed algorithm such as our own may not jeopardize protected health information (PHI), as both the training and implementation of the model are local; use of public LLMs, however, may easily risk transmission of PHI to the servers of an external organization. Future applications of these technologies will need to account for these particular risks. Nevertheless, the opportunities for NLP are significant and likely extend beyond comorbidity documentation to clinical decision support, safety oversight, telehealth, clinical encounter documentation, and informed patient consent, among many others.
While this project does demonstrate the feasibility of NLP-based decision support for clinical neurosurgical documentation, it does have notable limitations. Our optimal random forest classifiers demonstrated relatively low sensitivity (0.65 and 0.48, for compression and edema, respectively) relative to their high specificity (0.88 and 1.0, for compression and edema, respectively). However, for a clinical decision support system, high specificity in the context of lower sensitivity is preferable to low specificity and high sensitivity, as an optimal clinical decision support system should generate few false positives and a high number of true positives. Our lower sensitivity numbers are likely attributable to the use of a more naïve NLP approach that considers only the presence of individual word tokens rather than processing and interpreting word tokens in the context of the report as a whole. Future studies should focus on increasing sensitivity, which would likely occur with a larger dataset and the use of more intricate NLP architectures, such as recurrent neural networks or transformers, that can more readily contextualize blocks of text as a whole. From the perspective of data collection, the development of this particular NLP model was based on radiology reports generated within a single health care system; as such, its applicability to different reporting systems or in other languages may be limited. Furthermore, the reports were reviewed by a single author. As previously noted, this effort evaluated the diagnosis of two particular comorbidities that could be readily ascertained from neuroimaging. Finally, tactful EMR implementation will be necessary to present results of this algorithm to clinicians in a way that encourages responses and meaningfully improves clinical documentation.
Conclusions
An NLP-based ML algorithm can reliably detect 2 major comorbidities for neurosurgical patients from radiology reports. Algorithm performance exceeds real-life documentation performance.
References
- 1. Aiello FA, Judelson DR, Durgin JM, Doucet DR, Simons JP, Durocher DM, et al. A physician-led initiative to improve clinical documentation results in improved health care documentation, case mix index, and increased contribution margin. J Vasc Surg. 2018;68: 1524–1532. pmid:29735302
- 2. Barnes SL, Waterman M, MacIntyre D, Coughenour J, Kessel J. Impact of standardized trauma documentation to the hospital’s bottom line. Surgery. 2010;148: 793–798. pmid:20797746
- 3. Reyes C, Greenbaum A, Porto C, Russell JC. Implementation of a Clinical Documentation Improvement Curriculum Improves Quality Metrics and Hospital Charges in an Academic Surgery Department. J Am Coll Surg. 2017;224: 301–309. pmid:27919741
- 4. Ali R, Syed S, Sastry RA, Abdulrazeq H, Shao B, Roye GD, et al. Toward more accurate documentation in neurosurgical care. Neurosurg Focus. 2021;51: E11. pmid:34724645
- 5. Spurgeon A, Hiser B, Hafley C, Litofsky NS. Does Improving Medical Record Documentation Better Reflect Severity of Illness in Neurosurgical Patients? Neurosurgery. 2011;58: 155–163. pmid:21916142
- 6. Momin SR, Lorenz RR, Lamarre ED. Effect of a Documentation Improvement Program for an Academic Otolaryngology Practice. JAMA Otolaryngol Neck Surg. 2016;142: 533–537. pmid:27055147
- 7. Oxentenko AS, West CP, Popkave C, Weinberger SE, Kolars JC. Time Spent on Clinical Documentation: A Survey of Internal Medicine Residents and Program Directors. Arch Intern Med. 2010;170: 377–380. pmid:20177042
- 8. Hripcsak G, Vawdrey DK, Fred MR, Bostwick SB. Use of electronic clinical documentation: time spent and team interactions. J Am Med Inform Assoc JAMIA. 2011;18: 112–117. pmid:21292706
- 9. Golob JFJ, Como JJ, Claridge JA. The painful truth: The documentation burden of a trauma surgeon. J Trauma Acute Care Surg. 2016;80: 742–747. pmid:26886003
- 10. Fox N, Swierczynski P, Willcutt R, Elberfeld A, Mazzarelli AJ. Lost in translation: Focused documentation improvement benefits trauma surgeons. Injury. 2016;47: 1919–1923. pmid:27156039
- 11. Shanafelt TD, Dyrbye LN, West CP. Addressing Physician Burnout: The Way Forward. JAMA. 2017;317: 901–902. pmid:28196201
- 12. Downing NL, Bates DW, Longhurst CA. Physician Burnout in the Electronic Health Record Era: Are We Ignoring the Real Cause? Ann Intern Med. 2018;169: 50–51. pmid:29801050
- 13. Steinwald B, Dummit LA. Hospital Case-Mix Change: Sicker Patients Or Drg Creep? Health Aff (Millwood). 1989;8: 35–47. pmid:2501203
- 14. Rosenbaum BP, Lorenz RR, Luther RB, Knowles-Ward L, Kelly DL, Weil RJ. Improving and Measuring Inpatient Documentation of Medical Care within the MS-DRG System: Education, Monitoring, and Normalized Case Mix Index. Perspect Health Inf Manag. 2014;11: 1c. pmid:25214820
- 15. Raju B, Jumah F, Ashraf O, Narayan V, Gupta G, Sun H, et al. Big data, machine learning, and artificial intelligence: a field guide for neurosurgeons. J Neurosurg. 2020;1: 1–11. pmid:33007750
- 16. Luo JW, Chong JJR. Review of Natural Language Processing in Radiology. Neuroimaging Clin N Am. 2020;30: 447–458. pmid:33038995
- 17. Kehl KL, Elmarakeby H, Nishino M, Van Allen EM, Lepisto EM, Hassett MJ, et al. Assessment of Deep Natural Language Processing in Ascertaining Oncologic Outcomes From Radiology Reports. JAMA Oncol. 2019;5: 1421–1429. pmid:31343664
- 18. Bird S, Klein E, Loper E. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media, Inc.; 2009.
- 19. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12: 2825–2830.
- 20. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. arXiv:1603.04467 [cs]. 2016 [cited 14 Mar 2022]. Available: http://arxiv.org/abs/1603.04467
- 21. Arquiette JM, Moss HA, Truong T, Pieper CF, Havrilesky LJ. Impact of a documentation intervention on health-assessment metrics on an inpatient gynecologic oncology service. Gynecol Oncol. 2019;153: 385–390. pmid:30824212
- 22. Campbell S, Giadresco K. Computer-assisted clinical coding: A narrative review of the literature on its benefits, limitations, implementation and impact on clinical coding professionals. Health Inf Manag J. 2020;49: 5–18. pmid:31159578
- 23. Castaldi M, McNelis J. Introducing a Clinical Documentation Specialist to Improve Coding and Collectability on a Surgical Service. J Healthc Qual JHQ. 2019;41: e21. pmid:31094954
- 24. Elkbuli A, Godelman S, Miller A, Boneva D, Bernal E, Hai S, et al. Improved clinical documentation leads to superior reportable outcomes: An accurate representation of patient’s clinical status. Int J Surg. 2018;53: 288–291. pmid:29653245
- 25. Frazee RC, Matejicka AV, Abernathy SW, Davis M, Isbell TS, Regner JL, et al. Concurrent Chart Review Provides More Accurate Documentation and Increased Calculated Case Mix Index, Severity of Illness, and Risk of Mortality. J Am Coll Surg. 2015;220: 652–656. pmid:25724608
- 26. Grogan EL, Speroff T, Deppen SA, Roumie CL, Elasy TA, Dittus RS, et al. Improving documentation of patient acuity level using a progress note template. J Am Coll Surg. 2004;199: 468–475. pmid:15325618
- 27. Johnson CE, Peralta J, Lawrence L, Issai A, Weaver FA, Ham SW. Focused Resident Education and Engagement in Quality Improvement Enhances Documentation, Shortens Hospital Length of Stay, and Creates a Culture of Continuous Improvement. J Surg Educ. 2019;76: 771–778. pmid:30552003
- 28. Spellberg B, Harrington D, Black S, Sue D, Stringer W, Witt M. Capturing the Diagnosis: An Internal Medicine Education Program to Improve Documentation. Am J Med. 2013;126: 739–743.e1. pmid:23791207
- 29. Weinberg JA, Chapple KM, Gagliano RA, Israr S, Petersen SR. Back to the Future: Impact of a Paper-Based Admission H&P on Clinical Documentation Improvement at a Level 1 Trauma Center. Am Surg. 2019;85: 611–619.
- 30. Resnick AS, Corrigan D, Mullen JL, Kaiser LR. Surgeon Contribution to Hospital Bottom Line. Ann Surg. 2005;242: 530–539. pmid:16192813
- 31. Cox ML, Farjat AE, Risoli TJ, Peskoe S, Goldstein BA, Turner DA, et al. Documenting or Operating: Where Is Time Spent in General Surgery Residency? J Surg Educ. 2018;75: e97–e106. pmid:30522828
- 32. Chaiyachati KH, Shea JA, Asch DA, Liu M, Bellini LM, Dine CJ, et al. Assessment of Inpatient Time Allocation Among First-Year Internal Medicine Residents Using Time-Motion Observations. JAMA Intern Med. 2019;179: 760–767. pmid:30985861
- 33. Zalatimo O, Ranasinghe M, Harbaugh RE, Iantosca M. Impact of improved documentation on an academic neurosurgical practice: Clinical article. J Neurosurg. 2014;120: 756–763. pmid:24359011
- 34. Morrison RJ, Malloy KM, Bakshi RR. Improved Comorbidity Capture Using a Standardized 1-Step Quality Improvement Documentation Tool. Otolaryngol Neck Surg. 2018;159: 143–148. pmid:29557262
- 35. Jumah F, Raju B, Nagaraj A, Shinde R, Lescott C, Sun H, et al. Uncharted Waters of Machine and Deep Learning for Surgical Phase Recognition in Neurosurgery. World Neurosurg. 2022;160: 4–12. pmid:35026457
- 36. English M, Kumar C, Ditterline BL, Drazin D, Dietz N. Machine Learning in Neuro-Oncology, Epilepsy, Alzheimer’s Disease, and Schizophrenia. Acta Neurochir Suppl. 2022;134: 349–361. pmid:34862559
- 37. Muhlestein WE, Akagi DS, Davies JM, Chambless LB. Predicting Inpatient Length of Stay After Brain Tumor Surgery: Developing Machine Learning Ensembles to Improve Predictive Performance. Neurosurgery. 2019;85: 384–393. pmid:30113665
- 38. Muhlestein WE, Akagi DS, Kallos JA, Morone PJ, Weaver KD, Thompson RC, et al. Using a Guided Machine Learning Ensemble Model to Predict Discharge Disposition following Meningioma Resection. J Neurol Surg Part B Skull Base. 2018;79: 123. pmid:29868316
- 39. Merali ZA, Colak E, Wilson JR. Applications of Machine Learning to Imaging of Spinal Disorders: Current Status and Future Directions. Glob Spine J. 2021;11: 23S–29S. pmid:33890805
- 40. Ali R, Connolly ID, Tang OY, Mirza FN, Johnston B, Abdulrazeq HF, et al. Bridging the literacy gap for surgical consents: an AI-human expert collaborative approach. Npj Digit Med. 2024;7: 1–6. pmid:38459205
- 41. Roman A, Al-Sharif L, AL Gharyani M. The Expanding Role of ChatGPT (Chat-Generative Pre-Trained Transformer) in Neurosurgery: A Systematic Review of Literature and Conceptual Framework. Cureus. 15: e43502. pmid:37719492
- 42. Dubinski D, Won S-Y, Trnovec S, Behmanesh B, Baumgarten P, Dinc N, et al. Leveraging artificial intelligence in neurosurgery—unveiling ChatGPT for neurosurgical discharge summaries and operative reports. Acta Neurochir (Wien). 2024;166: 38. pmid:38277081
- 43. Goodman KE, Yi PH, Morgan DJ. AI-Generated Clinical Summaries Require More Than Accuracy. JAMA. 2024;331: 637–638. pmid:38285439
- 44. Patel SB, Lam K. ChatGPT: the future of discharge summaries? Lancet Digit Health. 2023;5: e107–e108. pmid:36754724
- 45. Pons E, Braun LMM, Hunink MGM, Kors JA. Natural Language Processing in Radiology: A Systematic Review. Radiology. 2016;279: 329–343. pmid:27089187
- 46. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, et al. Deep Learning: A Primer for Radiologists. RadioGraphics. 2017;37: 2113–2131. pmid:29131760
- 47. Kanter GP, Packel EA. Health Care Privacy Risks of AI Chatbots. JAMA. 2023;330: 311–312. pmid:37410449