Urinary tract infection (UTI) is a common emergency department (ED) diagnosis with reported high diagnostic error rates. Because a urine culture, part of the gold standard for diagnosis of UTI, is usually not available for 24–48 hours after an ED visit, diagnosis and treatment decisions are based on symptoms, physical findings, and other laboratory results, potentially leading to overutilization, antibiotic resistance, and delayed treatment. Previous research has demonstrated inadequate diagnostic performance for both individual laboratory tests and prediction tools.
Our aim was to train, validate, and compare machine-learning based predictive models for UTI in a large, diverse set of ED patients.
Single-center, multi-site, retrospective cohort analysis of 80,387 adult ED visits with urine culture results and UTI symptoms. We developed models for UTI prediction with seven machine learning algorithms using demographic information, vitals, laboratory results, medications, past medical history, chief complaint, and structured historical and physical exam findings. Models were developed with both the full set of 211 variables and a reduced set of 10 variables. UTI predictions were compared between models and to proxies of provider judgment (documentation of UTI diagnosis and antibiotic administration).
The machine learning models had an area under the curve ranging from 0.826 to 0.904, with extreme gradient boosting (XGBoost) the top performing algorithm for both full and reduced models. The XGBoost full and reduced models demonstrated greatly improved specificity when compared to the provider judgment proxy of UTI diagnosis OR antibiotic administration, with specificity differences of 33.3 (31.3–34.3) and 29.6 (28.5–30.6), while also demonstrating superior sensitivity when compared to documentation of UTI diagnosis, with sensitivity differences of 38.7 (38.1–39.4) and 33.2 (32.5–33.9). In the admission and discharge cohorts using the full XGBoost model, approximately 1 in 4 patients (4109/15855) would be re-categorized from a false positive to a true negative and approximately 1 in 11 patients (1372/15855) would be re-categorized from a false negative to a true positive.
Citation: Taylor RA, Moore CL, Cheung K-H, Brandt C (2018) Predicting urinary tract infections in the emergency department with machine learning. PLoS ONE 13(3): e0194085. https://doi.org/10.1371/journal.pone.0194085
Editor: Qunfeng Dong, University of North Texas, UNITED STATES
Received: December 6, 2017; Accepted: February 23, 2018; Published: March 7, 2018
Copyright: © 2018 Taylor et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
In the United States, there are more than 3 million emergency department (ED) visits each year for urinary tract infections (UTI) with annual direct and indirect costs estimated to be more than $2 billion.[1–3] Compared with the general population, ED patients with UTIs have higher acuity (approximately 10% of visits are for pyelonephritis) and are more likely to present with non-classic symptoms such as altered mental status, fatigue, and nausea. Because a urine culture, part of the gold standard for diagnosis of UTI, is usually not available for 24–48 hours after an ED visit, diagnosis and treatment decisions are based on symptoms, physical findings, and other laboratory results, potentially leading to overutilization, antibiotic resistance, and delayed treatment. 
Diagnostic error for UTI in the ED has been reported to be as high as 30–50%.[6–8] While women of child-bearing age exhibiting classic symptoms of dysuria, frequency, and hematuria have a high likelihood of disease, in more generalized cohorts of ED patients historical, physical, and laboratory findings are less accurate.[9, 10] In a systematic review of ED studies pertaining to urinalysis results, Meister et al. found that only the presence of nitrite was specific enough to rule in the disease, while no single test or simple combination of tests was able to rule out the disease. Furthermore, many of these prior studies examining UTI focused on high prevalence populations with uncomplicated UTI, creating concern for spectrum bias in the results. These findings have led to calls for development of more sophisticated clinical decision support systems with predictive models that incorporate history, physical, and laboratory findings together to improve diagnostic accuracy.
While some predictive models for UTI have been developed, [12–17] they are limited in several ways. Most use only a few variables (e.g. only urine dipstick or urinalysis results), were derived from small datasets, and fail to model complex interactions between variables, which results in poor to moderate diagnostic performance. Others, like the neural network developed by Heckerling et al., have improved diagnostic accuracy but were derived on female-only data sets of generally healthy outpatient populations with a high prevalence of UTI, limiting their generalizability. With the recent widespread adoption of Electronic Health Records (EHRs) and advances in data science, there is now the opportunity to move beyond these limited predictive models and to develop and deploy sophisticated machine learning algorithms, trained on thousands to millions of examples, to assist with UTI diagnosis and potentially reduce diagnostic error.
Our aim, therefore, was to train, validate, and compare predictive models for UTI in a diverse set of ED patients using machine learning algorithms on a large single-center, multi-site, electronic health record (EHR) dataset. Within the validation dataset, we further sought to compare the best performing model to proxies of clinical judgement by examining provider patterns of UTI diagnosis and antibiotic prescription to gain insight about the potential impact of the model.
Single-center, multi-site, retrospective cohort analysis of adult emergency department visits with urine culture results. This study was approved by the institutional review board (Yale Human Research Protection Program), which waived the requirement for informed consent. Data were de-identified after initial database access, but prior to analysis. Only de-identified data were stored and used in analyses (see S1 File for minimal data set and S2 File for code used in analyses). We adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement on reporting predictive models.
Study setting and population
Data were obtained from four EDs between March 2013 and May 2016. All EDs were part of a single health care system and have been described previously. All EDs use a single EHR vendor, Epic (Verona, WI) with a centralized data warehouse. We included all visits for adult patients (≥18 years) who had a urine culture obtained during their ED visit and who had symptoms potentially attributable to a UTI (Table 1). The requirement to have symptoms potentially attributable to UTI was made to eliminate visits where patients may have asymptomatic bacteriuria.
Data set creation and definitions
All data elements for each ED visit were obtained from the enterprise data warehouse. Only data available during the ED visit until the time of admission or discharge were used as prediction variables. Medications received during the ED visit and ED diagnosis were not included as variables to eliminate the influence of provider knowledge on the prediction model. Predictor variables included demographic information (age, sex, race, etc.), vitals, laboratory results, urinalysis and urine dipstick results, current outpatient medications, past medical history, chief complaint, and structured historical and physical exam findings (S1 Table).
Data were preprocessed according to methods previously described. Errant text data in categorical fields were corrected through regular expression searches. Continuous data (labs, vitals) within the EHR are often not missing at random, and the fact that a value is missing carries information if it is encoded in some way. For example, in patients who are viewed as “not sick” labs are often not ordered. Continuous data were therefore smoothed and discretized using k-means clustering (k value = 5), allowing incorporation of a “not recorded” category. Medications and comorbidities were grouped using the Anatomical Therapeutic Chemical (ATC) Classification System and Clinical Classification Software categories.[23, 24]
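The discretization step described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code from S2 File: the function names `kmeans_1d` and `discretize`, the `bin_1` to `bin_5` labels, and the use of `None` to mark an unrecorded value are all assumptions made for the example.

```python
import random


def kmeans_1d(values, k=5, iters=100, seed=0):
    """Lloyd's algorithm on scalar data; returns the k cluster centers, sorted.

    Assumes at least k distinct values are present.
    """
    rng = random.Random(seed)
    centers = sorted(rng.sample(sorted(set(values)), k))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = sorted(
            (sum(g) / len(g)) if g else centers[j] for j, g in enumerate(groups)
        )
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def discretize(values, k=5, missing_label="not recorded"):
    """Map each observed value to an ordinal bin and each missing value
    (None) to its own category, so missingness is kept as information."""
    observed = [v for v in values if v is not None]
    centers = kmeans_1d(observed, k=k)
    labels = []
    for v in values:
        if v is None:
            labels.append(missing_label)
        else:
            idx = min(range(k), key=lambda j: abs(v - centers[j]))
            labels.append(f"bin_{idx + 1}")
    return labels


# Hypothetical temperature readings; None marks a vital that was never taken.
temps = [36.5, 36.7, None, 37.0, 38.2, 38.5, 39.9, 40.1, None, 35.0, 35.2, 41.0]
temp_labels = discretize(temps)
```

Because the missing values never enter the clustering, the five bins describe only the observed range, and “not recorded” becomes a sixth category that a model can weight on its own.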
The primary outcome for all analyses was the presence of a positive urine culture defined by >10^4 colony forming units (CFU)/high powered field (HPF), a threshold pre-established by the laboratory of our healthcare system for reporting positive results. Mixed flora results were only considered positive if there was the presence of Escherichia coli. For the secondary aim, we compared the best performing model to clinical judgement. While EHR data readily allows the accumulation of large amounts of data to develop prediction models, it is much more limited in allowing unbiased assessment of provider diagnosis and management. Providers may fail to document a UTI diagnosis in the EHR and antibiotics are often given for other diagnoses in patients with UTI symptoms. We therefore chose to compare the best-performing full and reduced models to 1) provider documentation of UTI diagnosis and 2) if the provider gave antibiotics OR documented a diagnosis of UTI, the provider was given credit for a UTI diagnosis. Cases where antibiotics were given and there was a clear alternative diagnosis (pneumonia, diverticulitis, colitis, cholecystitis, enteritis, obstruction, peritonitis, and cellulitis–captured by key word search) were not labeled as a UTI diagnosis. We believed examining provider UTI diagnosis alone would provide a reasonable upper bound for provider diagnostic specificity, and, likewise, a combination of UTI diagnosis or antibiotics for provider diagnostic sensitivity. Comparisons were performed for overall, admitted, and discharge cohorts. For these scenarios, we identified all medications prescribed or given within the ED meeting the ATC “infective” or “antibiotic” categories and urinary tract infection diagnoses by ICD9 and ICD10 codes (S2 Table).
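The two provider-judgment proxies can be written as simple labeling rules. The sketch below is illustrative only: the `Visit` fields, the substring-based keyword match, and the choice to let a documented UTI diagnosis count even when an alternative diagnosis is also present are assumptions, since the text does not specify those details.

```python
from dataclasses import dataclass

# Alternative diagnoses named in the text; matching by substring is an assumption.
ALTERNATIVE_DX_KEYWORDS = (
    "pneumonia", "diverticulitis", "colitis", "cholecystitis",
    "enteritis", "obstruction", "peritonitis", "cellulitis",
)


@dataclass
class Visit:
    uti_dx_documented: bool   # UTI ICD-9/ICD-10 code present for the ED visit
    antibiotics_given: bool   # antibiotic prescribed or given in the ED
    diagnosis_text: str       # free-text ED diagnoses (hypothetical field)


def has_alternative_dx(visit: Visit) -> bool:
    text = visit.diagnosis_text.lower()
    return any(keyword in text for keyword in ALTERNATIVE_DX_KEYWORDS)


def proxy_dx_only(visit: Visit) -> bool:
    """Proxy 1: credit a UTI diagnosis only if it was documented."""
    return visit.uti_dx_documented


def proxy_dx_or_antibiotics(visit: Visit) -> bool:
    """Proxy 2: credit a UTI diagnosis if documented OR antibiotics were given,
    unless the antibiotics are explained by a clear alternative diagnosis."""
    if visit.uti_dx_documented:
        return True
    return visit.antibiotics_given and not has_alternative_dx(visit)
```

Proxy 1 can only under-count provider suspicion of UTI, which favors specificity. Proxy 2 can only over-count it, which favors sensitivity.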
We developed models for UTI prediction using seven machine learning algorithms: random forest, extreme gradient boosting, adaptive boosting, support vector machine, elastic net, neural network, and logistic regression (R packages included: randomForest, xgboost, adaboost, e1071, glmnet, lme4, nnet, and caret). The first six algorithms were chosen for their ability to model nonlinear associations, resiliency to overfitting, relative ease in implementation, and general acceptance in the machine learning community. Logistic regression, commonly used in the medical field, was chosen as a baseline comparison. Data preprocessing steps, specified above, were common to all models. Models were developed using the full variable set (211 variables) and a reduced set of 10 variables selected through expert knowledge and literature review (Table 2). Expert and literature review-based selection was chosen over automated variable selection techniques to address user acceptance of model variables. Ten was chosen as a number that was felt to represent a reasonable upper threshold for development of an online calculator/app addressing usability concerns around manual data entry. Supported by prior literature, interaction terms were only assessed for selected urinalysis variables.[7, 9, 10] Where applicable, models were tuned through 10-fold cross validation and grid searches on respective hyperparameters within the training data set. All models were trained and validated on a randomly partitioned 80%/20% split of the data.
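The partitioning and tuning procedure above (a random 80%/20% split, then 10-fold cross validation with a grid search inside the training portion) can be sketched in plain Python. This shows the index bookkeeping only and is not the authors' R implementation: the `fit_and_score` callback and the grid values are hypothetical placeholders.

```python
import itertools
import random


def train_validation_split(n, train_frac=0.8, seed=0):
    """Randomly partition row indices 0..n-1 into training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def k_folds(indices, k=10):
    """Yield (fit_idx, holdout_idx) pairs; every index is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit_idx, folds[i]


def grid_search(train_idx, grid, fit_and_score, k=10):
    """Return (best_mean_score, best_params) over all grid combinations.

    fit_and_score(fit_idx, holdout_idx, params) is supplied by the caller and
    should train on fit_idx and return a score (e.g. AUC) on holdout_idx.
    """
    best = None
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        scores = [
            fit_and_score(fit_idx, hold_idx, params)
            for fit_idx, hold_idx in k_folds(train_idx, k)
        ]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[0]:
            best = (mean_score, params)
    return best


# Hypothetical grid; the values actually searched are not given in the text.
example_grid = {"max_depth": [2, 3, 4], "eta": [0.1, 0.3]}
```

The validation indices never enter `grid_search`, so hyperparameters are chosen without looking at the data used for the final comparison.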
Descriptive statistics were used for baseline characteristics and outcomes. Univariate chi-square tests were used to compare categorical variables, and t-tests and ANOVA were used to compare continuous variables. We report the area under the curve (AUC) of the receiver operating characteristic (ROC) as the primary measure of model prediction. AUCs were compared for statistical significance via chi-square statistics using the method developed by DeLong et al. In order to account for multiple comparisons, a Bonferroni adjusted p-value of 0.004 was considered statistically significant. Additional statistics for comparison included sensitivity, specificity, and positive and negative likelihood ratios with 95% confidence intervals (CI), reported at the optimal threshold for AUC.
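As an illustration of the primary metric, the AUC equals the probability that a randomly chosen culture-positive visit receives a higher score than a randomly chosen culture-negative visit, with ties counted as one half. The sketch below computes it directly from that definition. It does not implement the DeLong comparison. The number of comparisons behind the 0.004 cutoff is not stated in the text, so the value 13 in the usage line is purely an assumption.

```python
def roc_auc(labels, scores):
    """AUC from its rank definition: share of (positive, negative) pairs in
    which the positive case has the higher score, ties counted as 0.5.
    Quadratic in sample size, which is fine for a small illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons


# Toy data: 1 = positive culture, 0 = negative culture.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Assumed example: 0.05 spread over 13 comparisons is roughly 0.004.
adjusted = bonferroni_alpha(0.05, 13)
```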
For comparison to the two scenarios of clinical judgement, confusion matrices (i.e. 2x2 contingency matrices) were constructed. Sensitivity, specificity, and accuracy with 95%CI were calculated. The sensitivity is defined as the proportion of positive results out of the number of samples which were actually positive and specificity as the proportion of negative results out of the number of samples which were actually negative. Diagnostic accuracy was defined as the proportion of all tests that give a correct result. Exact binomial confidence limits were calculated for test sensitivity and specificity. Confidence intervals for positive and negative likelihood ratios were based on formulae provided by Simel et al. To increase interpretability, when comparing the models to UTI diagnosis alone, we set the specificity of the best performing models to that of UTI diagnosis allowing assessment of the differences in sensitivity. Similarly, when comparing the best performing models to UTI diagnosis OR antibiotic administration we set the sensitivity of each model to that of UTI diagnosis OR antibiotic administration allowing assessment of the differences in specificity. Differences in sensitivity and specificity between the models and proxies for provider judgement were analyzed using the adjusted Wald method and displayed with 95%CI.
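The quantities defined above follow directly from the four cells of a 2x2 table. The sketch below computes them, together with likelihood-ratio confidence intervals from the standard log-scale standard error, which is the form we understand Simel et al. to describe. It also shows one way to pick a score threshold that matches a target specificity. The function names and the tie-handling rule are assumptions. The exact binomial limits and the adjusted Wald differences are not reproduced here.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, accuracy, and likelihood ratios with CIs.

    Assumes all four cells are non-zero."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + fn + tn)
    lr_pos = sens / (1 - spec)
    lr_neg = (1 - sens) / spec
    # Standard errors of ln(LR) from the delta method.
    se_pos = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    se_neg = math.sqrt(1 / fn - 1 / (tp + fn) + 1 / tn - 1 / (fp + tn))
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": acc,
        "lr_pos": lr_pos,
        "lr_pos_ci": (lr_pos * math.exp(-z * se_pos), lr_pos * math.exp(z * se_pos)),
        "lr_neg": lr_neg,
        "lr_neg_ci": (lr_neg * math.exp(-z * se_neg), lr_neg * math.exp(z * se_neg)),
    }


def threshold_for_specificity(labels, scores, target_spec):
    """Smallest threshold t such that calling 'positive' when score > t
    yields a specificity of at least target_spec on the given data."""
    neg = sorted(s for y, s in zip(labels, scores) if y == 0)
    # Number of negatives that must fall at or below the threshold.
    need = max(1, math.ceil(target_spec * len(neg) - 1e-9))
    return neg[need - 1]


def sensitivity_at(labels, scores, threshold):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    return sum(s > threshold for s in pos) / len(pos)
```

Fixing the model's specificity at the provider's value in this way leaves sensitivity as the only quantity that differs. The mirror-image procedure (fixing sensitivity and comparing specificity) works the same way on the positive cases.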
During the study time period, there were 560,515 ED visits (410,173 patients). A total of 80,387 ED visits (55,365 patients) had urine culture results and symptoms potentially attributable to a UTI, and were ultimately included in the final analyses. There were 18,284 (23%) positive urine cultures, 14,335 (35%) in females, and 3,755 (18%) in males. The training/validation cohorts and processing steps are shown in Fig 1. The median age for the visits was 53 [IQR 34–72] and 68% were female. Additional basic demographic information and selected patient characteristics stratified by urine culture result are shown in Table 3.
Classification results for the machine learning models are presented in Fig 2 and Table 4. The top classifier for the full models was XGBoost with an AUC of .904 (95%CI .898-.910) and was statistically better than all other models except Random Forest. The top classifier for the reduced models was XGBoost (AUC .877, 95%CI .871-.884). All full models were statistically better than the reduced models except for the reduced XGBoost model.
In the validation cohort, 1616 (22.1%) admitted visits and 1712 (20.1%) discharge visits were diagnosed with UTI. Within this cohort, the number of admit and discharge visits with a documented diagnosis of UTI receiving antibiotics was 1610 (99.6%) and 1693 (98.9%), respectively. Comparison of the top-performing (XGBoost) model with provider diagnosis and antibiotic prescribing are presented in the form of confusion matrices with associated sensitivities, specificities, accuracies, and differences (Tables 5 and 6). While setting the sensitivity of the best-performing models to the same value as the combination of antibiotics OR documentation of UTI diagnosis, the best performing full and reduced model demonstrated far superior specificity with a 33.3 (31.3–34.3) and 29.6 (28.5–30.6) difference, respectively. Framed within a more clinical perspective, in applying the model to the overall validation admitted/discharge cohort approximately 1 in 4 patients (4109/15855) would be re-categorized from a false positive to a true negative when compared to provider judgement as determined by UTI diagnosis and antibiotic prescribing. Comparing only UTI diagnosis to the best performing models set at the same specificity, the best performing full and reduced model also demonstrated far superior sensitivity with a 38.7 (38.1–39.4) and 33.2 (32.5–33.9) difference, respectively. In the overall validation admitted/discharge cohort approximately 1 in 11 patients (1372/15855) would be re-categorized from a false negative to a true positive when compared to provider judgement as determined by UTI diagnosis alone. Among admit visits receiving antibiotics, there were 156 visits (13.2%) with clear alternative infectious diagnoses in those with positive urine cultures and 529 (21.3%) in those with negative urine cultures. 
Among discharge visits receiving antibiotics, there were 52 visits (4.3%) with clear alternative infectious diagnoses in those with positive urine cultures and 200 (9.0%) in those with negative urine cultures.
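The "1 in N" figures quoted above follow from the reported counts by simple division. The short check below uses only numbers given in the text; the variable names are illustrative.

```python
validation_visits = 15855   # overall validation admitted/discharge cohort
fp_to_tn = 4109             # re-categorized from false positive to true negative
fn_to_tp = 1372             # re-categorized from false negative to true positive


def one_in_n(count, total):
    """Express count/total as '1 in N' by returning N."""
    return total / count


share_fp_to_tn = fp_to_tn / validation_visits   # about 0.259 of visits
share_fn_to_tp = fn_to_tp / validation_visits   # about 0.087 of visits

n_fp = one_in_n(fp_to_tn, validation_visits)    # about 3.9, i.e. roughly 1 in 4
n_fn = one_in_n(fn_to_tp, validation_visits)    # about 11.6, reported as about 1 in 11
```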
In this retrospective observational study of urinary tract infections, a common ED diagnosis with high rates of diagnostic error, we used machine learning algorithms and a large dataset to accurately diagnose positive urine culture results. The top-performing algorithm, XGBoost, achieved an AUC of .904 (95%CI .898-.910) and an overall accuracy of 87.5% (95%CI 87.0–88.0), almost ten percentage points higher accuracy than the best performing model in the literature. Even for models trained on a more limited set of variables, the best models achieved excellent results with an AUC of .877 (95%CI .871-.884) and an accuracy of 85.9% (95%CI 85.3–86.4). In comparison to proxies of provider judgment, the best performing models were far more specific than a combination of antibiotics OR documentation of UTI diagnosis and far more sensitive than documentation of UTI diagnosis alone.
Previous studies developing predictive models for UTI are limited by small data sets, poor generalizability to the ED, and diagnostic performance. [12–17] The idea that a predictive model would be useful for UTI diagnosis in the ED has been around for some time. Wigton et al. in 1985 developed a scoring model (derived from discriminant analysis) based on history, physical, and laboratory findings in 248 female patients in the ED with validation on 298 patients. In this study the prevalence of UTI was 61% and the reported AUC was 0.78, accuracy 74%, sensitivity 93%, and specificity 44%. This is the only model developed on ED patients of which we are aware. Subsequent models, almost all some form of clinical decision rule on a few variables, were developed predominantly in outpatient settings on several hundred patients with prevalence values of 53–62% and generally did not have separate validation data sets. Accuracy for these studies was 67–76% with sensitivity values of 64.9–82.0% and specificity values of 53.7–94.8%. The best performing model we found in the literature was by Heckerling et al. and used neural networks with a genetic algorithm for variable selection. The model by Heckerling et al. was developed in an outpatient setting on 212 female patients and had an AUC of 0.78 and an accuracy of 78%, but lacked testing on a separate validation data set. Our models, in contrast, were developed on a data set approximately 100 times larger, utilizing hundreds of variables and machine learning algorithms on a diverse set of ED patients. We achieved a top-performing AUC 0.12 points higher than Wigton et al. and Heckerling et al. with 9–12% greater accuracy. The reduced models, while generally not performing as well as the full models, still achieved much higher results than previously reported models and decision aids.
A model that fails to indicate an ability to improve current care has little value, regardless of its predictive ability, and recent evidence suggests that most clinical decision rules fail to outperform clinical judgement. In examining the literature, only one of the prior models for UTI prediction demonstrated its potential clinical impact. McIsaac et al. showed that with implementation of their simple decision aid unnecessary antibiotics would be reduced by 40.2%. Recognizing the limitations of EHR data and retrospective analysis, we chose to compare the models to two proxies for provider judgment: 1) the provider was considered to have diagnosed the patient with a UTI if, and only if, the diagnosis was documented, optimizing specificity, and 2) if the provider gave antibiotics or diagnosed the patient with UTI the provider was given credit for a UTI diagnosis, thus optimizing sensitivity. These scenarios are “optimal” from the provider standpoint: in a portion of visits with an eventual positive urine culture, patients were likely given antibiotics for some other suspected cause, and in a portion of visits with an eventual negative urine culture, patients did not have a documented UTI but the provider nevertheless likely had that diagnosis in mind (e.g. patient diagnosed with dysuria and given antibiotics but eventual urine culture is negative). In comparison to these proxies of provider judgment, the best performing models were far more specific than a combination of antibiotics OR documentation of UTI diagnosis and far more sensitive than documentation of UTI diagnosis alone. This was true in both discharge and admit visits, with the larger difference in admit visits possibly a consequence of a lower threshold for antibiotic administration, complexity of presentation, and higher acuity visits.
Moreover, even in a theoretical scenario where provider judgement is assigned both optimal bounds (sensitivity assigned from UTI or antibiotics scenario– 73.8% and specificity assigned from the UTI diagnosis only scenario– 84.7%), both the full and reduced models still demonstrate overall superior performance. Viewed from another perspective, our findings suggest that implementation of the algorithm has the potential to greatly reduce the number of false positives and false negatives for UTI diagnosis. For example, in the overall cohort (both discharged and admitted patients) approximately 1 in 4 patients (4111/15855) were re-categorized from a false positive to a true negative when comparing XGBoost to antibiotics OR documentation of UTI diagnosis.
Advances in machine learning, coupled with training on large EHR datasets, have the ability to disrupt the areas of diagnosis and prognosis in emergency medicine. Already in other fields, expert level, or above expert level, performance has been achieved in areas as diverse as the diagnosis of diabetic retinopathy and heart failure prediction. UTI diagnosis is an area particularly ripe for improvement through machine learning based clinical decision support. UTI diagnosis has a high error rate, the primary information used for diagnosis consists of abstract lab values with multiple categories, and there is a lack of reinforcement learning (ED providers rarely see the final culture results). Incorporation of machine learning algorithms into existing workflows, however, is not without difficulty. Models that use hundreds of variables make manual entry unfeasible and are currently difficult to “hard” code within EHR platforms/databases or to export to 3rd party applications. Progress is being made in this area with tools incorporating the Predictive Model Markup Language (PMML), facilitating interoperable exchange of models. Importantly, for UTI diagnosis, our results suggest that using a reduced model in, for example, an online app would result in only a small performance loss compared to the full model and still significantly improve diagnostic accuracy. The app could incorporate pretest probabilities of disease, facilitating personalized decisions for each patient based on patient/doctor determined testing and treatment thresholds. Future implementation studies could then examine the effect of a clinical decision support system app on diagnostic error and outcomes.
The current study has several limitations. First, we recognize that without prospectively collecting data on clinical diagnosis, uncertainty exists regarding the performance of clinical judgement in our study. We, however, believe that the scenarios examined serve to minimize this risk. Second, there is currently no clear accepted level for a positive urine culture, with a range in the literature from 10^2 cfu/mL to 10^5 cfu/mL. [12–17] Conceivably, different thresholds would result in different test performances. Our choice of 10^4 cfu/mL is a middle ground and could not be adjusted due to standardized laboratory reporting within the EHR. Third, our model was built on data from a single healthcare institution within a confined geographic region and would require further validation at other institutions prior to implementation at those sites. Alternatively, institutions could take the methods and variables used here and build their own models. Fourth, our data only included visits with urine culture results, limiting its extension to patients who may have only had urinalysis or urine dipstick testing. Last, our approach was limited to data elements available during each ED visit and does not include unstructured data elements, such as features in clinical notes, that may further improve the predictive accuracy.
In this study developing and validating models for prediction of urinary tract infections in emergency department visits on a large EHR dataset, the best performing machine learning algorithm, XGBoost, accurately diagnosed positive urine culture results, and outperformed previously developed models in the literature and several proxies for provider judgment. Future implementation studies should prospectively examine the impact of the model on outcomes and diagnostic error.
S1 File. Minimal data set.
Minimal Data set necessary for analyses.
S2 File. Code for analysis.
R code for analyses. Please see code for further description.
S1 Table. Variable list.
Full variable list for machine learning models.
- 1. National Center for Health Statistics. National hospital ambulatory medical care survey (NHAMCS), 2010. Hyattsville (MD). Public-use data file and documentation. Available at: ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHAMCS/. Accessed May 18, 2017.
- 2. Gordon LB, Waxman MJ, Ragsdale L, Mermel LA. Overtreatment of presumed urinary tract infection in older women presenting to the emergency department. Journal of the American Geriatrics Society. 2013;61(5):788–92. pmid:23590846.
- 3. Foxman B. Urinary tract infection syndromes: occurrence, recurrence, bacteriology, risk factors, and disease burden. Infect Dis Clin North Am. 2014;28(1):1–13. pmid:24484571.
- 4. Brown P, Ki M, Foxman B. Acute pyelonephritis among adults: cost of illness and considerations for the economic evaluation of therapy. Pharmacoeconomics. 2005;23(11):1123–42. pmid:16277548.
- 5. Schito GC, Naber KG, Botto H, Palou J, Mazzei T, Gualco L, et al. The ARESC study: an international survey on the antimicrobial resistance of pathogens involved in uncomplicated urinary tract infections. Int J Antimicrob Agents. 2009;34(5):407–13. pmid:19505803.
- 6. Tomas ME, Getman D, Donskey CJ, Hecker MT. Overdiagnosis of Urinary Tract Infection and Underdiagnosis of Sexually Transmitted Infection in Adult Women Presenting to an Emergency Department. J Clin Microbiol. 2015;53(8):2686–92. pmid:26063863; PubMed Central PMCID: PMCPMC4508438.
- 7. Schmiemann G, Kniehl E, Gebhardt K, Matejczyk MM, Hummers-Pradier E. The diagnosis of urinary tract infection: a systematic review. Dtsch Arztebl Int. 2010;107(21):361–7. pmid:20539810; PubMed Central PMCID: PMCPMC2883276.
- 8. McIsaac WJ, Hunchak CL. Overestimation error and unnecessary antibiotic prescriptions for acute cystitis in adult women. Med Decis Making. 2011;31(3):405–11. pmid:21191120.
- 9. Aubin C. Does this woman have an acute uncomplicated urinary tract infection? Ann Emerg Med. 2007;49(1):106–8. WOS:000243448300023. pmid:17203544
- 10. Meister L, Morley EJ, Scheer D, Sinert R. History and physical examination plus laboratory testing for the diagnosis of adult female urinary tract infection. Acad Emerg Med. 2013;20(7):631–45. pmid:23859578.
- 11. Lachs MS, Nachamkin I, Edelstein PH, Goldman J, Feinstein AR, Schwartz JS. Spectrum Bias in the Evaluation of Diagnostic-Tests—Lessons from the Rapid Dipstick Test for Urinary-Tract Infection. Annals of internal medicine. 1992;117(2):135–40. WOS:A1992JD12300008. pmid:1605428
- 12. Wigton RS, Hoellerich VL, Ornato JP, Leu V, Mazzotta LA, Cheng IH. Use of clinical findings in the diagnosis of urinary tract infection in women. Arch Intern Med. 1985;145(12):2222–7. pmid:2934038.
- 13. Little P, Turner S, Rumsby K, Warner G, Moore M, Lowes JA, et al. Developing clinical rules to predict urinary tract infection in primary care settings: sensitivity and specificity of near patient tests (dipsticks) and clinical scores. Br J Gen Pract. 2006;56(529):606–12. pmid:16882379; PubMed Central PMCID: PMCPMC1874525.
- 14. McIsaac WJ, Moineddin R, Ross S. Validation of a decision aid to assist physicians in reducing unnecessary antibiotic drug use for acute cystitis. Arch Intern Med. 2007;167(20):2201–6. pmid:17998492.
- 15. Winkens R, Nelissen-Arets H, Stobberingh E. Validity of the urine dipslide under daily practice conditions. Fam Pract. 2003;20(4):410–2. pmid:12876111.
- 16. Heckerling PS, Canaris GJ, Flach SD, Tape TG, Wigton RS, Gerber BS. Predictors of urinary tract infection based on artificial neural networks and genetic algorithms. Int J Med Inform. 2007;76(4):289–96. pmid:16469531.
- 17. Papageorgiou EI. Fuzzy cognitive map software tool for treatment management of uncomplicated urinary tract infection. Comput Methods Programs Biomed. 2012;105(3):233–45. pmid:22001398.
- 18. Obermeyer Z, Emanuel EJ. Predicting the Future—Big Data, Machine Learning, and Clinical Medicine. New Engl J Med. 2016;375(13):1216–9. WOS:000384265000003. pmid:27682033
- 19. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. pmid:25569120.
- 20. Taylor RA, Pare JR, Venkatesh AK, Mowafi H, Melnick ER, Fleischman W, et al. Prediction of In-hospital Mortality in Emergency Department Patients With Sepsis: A Local Big Data-Driven, Machine Learning Approach. Acad Emerg Med. 2016;23(3):269–78. pmid:26679719.
- 21. Nicolle LE, Bradley S, Colgan R, Rice JC, Schaeffer A, Hooton TM, et al. Infectious Diseases Society of America guidelines for the diagnosis and treatment of asymptomatic bacteriuria in adults. Clin Infect Dis. 2005;40(5):643–54. pmid:15714408.
- 22. Maslove DM, Podchiyska T, Lowe HJ. Discretization of continuous features in clinical datasets. J Am Med Inform Assoc. 2013;20(3):544–53. Epub 2012/10/13. pmid:23059731; PubMed Central PMCID: PMCPMC3628044.
- 23. Chen L, Zeng WM, Cai YD, Feng KY, Chou KC. Predicting Anatomical Therapeutic Chemical (ATC) classification of drugs by integrating chemical-chemical interactions and similarities. PLoS One. 2012;7(4):e35254. pmid:22514724; PubMed Central PMCID: PMCPMC3325992.
- 24. HCUP Clinical Classification Software (CCS) for ICD-9-CM. Healthcare Cost and Utilization Project (HCUP). 2006–2009. Agency for Healthcare Research and Quality, Rockville, MD. Available at: http://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp. Accessed July 11, 2016.
- 25. Hooton TM, Roberts PL, Cox ME, Stapleton AE. Voided midstream urine culture and acute cystitis in premenopausal women. N Engl J Med. 2013;369(20):1883–91. pmid:24224622; PubMed Central PMCID: PMCPMC4041367.
- 26. Rea S, Pathak J, Savova G, Oniki TA, Westberg L, Beebe CE, et al. Building a robust, scalable and standards-driven infrastructure for secondary use of EHR data: The SHARPn project. J Biomed Inform. 2012;45(4):763–71. WOS:000308258200019. pmid:22326800
- 27. Mazurowski MA, Habas PA, Zurada JA, Lo JY, Baker JA, Tourassi GD. Training neural network classifiers for medical decision making: The effects of imbalanced datasets on classification performance. Neural Networks. 2008;21(2–3):427–36. WOS:000255238800034. pmid:18272329
- 28. Delong ER, Delong DM, Clarkepearson DI. Comparing the Areas under 2 or More Correlated Receiver Operating Characteristic Curves—a Nonparametric Approach. Biometrics. 1988;44(3):837–45. WOS:A1988Q069100016. pmid:3203132
- 29. Collett D. Modelling Binary Data. Chapman & Hall/CRC, Boca Raton, Florida; 1999. p. 24.
- 30. Simel DL, Samsa GP, Matchar DB. Likelihood ratios with confidence: sample size estimation for diagnostic test studies. J Clin Epidemiol. 1991;44(8):763–70. pmid:1941027.
- 31. Wenzel D, Zapf A. Difference of two dependent sensitivities and specificities: Comparison of various approaches. Biometrical J. 2013;55(5):705–18. WOS:000327816900005. pmid:23828661
- 32. Wigton RS, Hoellerich VL, Ornato JP, Leu V, Mazzotta LA, Cheng IHC. Use of Clinical Findings in the Diagnosis of Urinary-Tract Infection in Women. Arch Intern Med. 1985;145(12):2222–7. WOS:A1985AVM5000014. pmid:2934038
- 33. Schriger DL, Elder JW, Cooper RJ. Structured Clinical Decision Aids Are Seldom Compared With Subjective Physician Judgment, and Are Seldom Superior. Ann Emerg Med. 2017;70(3):338–44 e3. pmid:28238497.
- 34. Janke AT, Overbeek DL, Kocher KE, Levy PD. Exploring the Potential of Predictive Analytics and Big Data in Emergency Care. Ann Emerg Med. 2016;67(2):227–36. WOS:000369124400013. pmid:26215667
- 35. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA. 2016;316(22):2402–10. pmid:27898976.
- 36. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? Plos One. 2017;12(4). ARTN e0174944 10.1371/journal.pone.0174944. WOS:000399352000025.
- 37. Zhang YY, Jiao YQ. Design and Implementation of Predictive Model Markup Language Interpretation Engine. 2015 International Conference on Network and Information Systems for Computers (ICNISC). 2015:527–31. 10.1109/Icnisc.2015.105. WOS:000380542600064.