There is a growing interest in the application of artificial intelligence (AI) to orthopaedic surgery. This review aims to identify and characterise research in this field, in order to understand the extent, range and nature of this work, and act as a springboard to stimulate future studies. A scoping review, a form of structured evidence synthesis, was conducted to summarise the use of AI in orthopaedics. A literature search (1946–2019) identified 222 studies eligible for inclusion. These studies were predominantly small and retrospective. There has been significant growth in the number of papers published in the last three years, mainly from the USA (37%). The majority of research used AI for image interpretation (45%) or as a clinical decision tool (25%). Spine (43%), knee (23%) and hip (14%) were the regions of the body most commonly studied. The application of artificial intelligence to orthopaedics is growing. However, the scope of its use so far remains limited, both in terms of its possible clinical applications, and the sub-specialty areas of the body which have been studied. A standardised method of reporting AI studies would allow direct assessment and comparison. Prospective studies are required to validate AI tools for clinical use.
Citation: Federer SJ, Jones GG (2021) Artificial intelligence in orthopaedics: A scoping review. PLoS ONE 16(11): e0260471. https://doi.org/10.1371/journal.pone.0260471
Editor: Thippa Reddy Gadekallu, Vellore Institute of Technology: VIT University, INDIA
Received: May 19, 2021; Accepted: November 11, 2021; Published: November 23, 2021
Copyright: © 2021 Federer, Jones. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The dataset may be found at this URL: https://data.mendeley.com/datasets/xvkr6t263v/1.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Interest in the application of artificial intelligence (AI) in healthcare has surged in recent years. Computer systems are increasingly able to perform tasks that normally require human intelligence, facilitated by improvements in data storage and computer processing. Despite the interest, incorporation of AI into clinical practice is in its infancy. AI tools are currently in use, for example, in the segmentation of three-dimensional optical coherence tomography scans to aid referrals in ophthalmology, and in the detection of atrial fibrillation by a smartphone algorithm and a single-lead electrocardiography device in primary care. The increase in digital medical imaging and in information collected in databases and orthopaedic registries provides large datasets ideal for the development of AI algorithms. These have the potential to improve patient care at a number of levels, including diagnosis, management, research and systems analysis.
The volume and variety of data collected from individuals has facilitated the advancement of AI across multiple industries. Concerns regarding how personal data is stored and utilised prompted legislation to protect this information. The General Data Protection Regulation (GDPR) was introduced in the European Union (EU) in 2018, and some medical registries have struggled to gather data in the same volume since. However, registries where patient consent has been a priority, such as the National Joint Registry (NJR) in the UK, have not seen a sharp decrease. The NJR holds information on over 3 million arthroplasty procedures since 2003. Orthopaedic registries are some of the largest in healthcare and are primed for the application of AI.
Artificial intelligence remains a relatively new field for most orthopaedic surgeons, and understanding the extent, range and nature of work conducted so far is useful as a springboard to identify potential new applications and areas for research. With this goal in mind, we conducted a scoping review, which is a form of structured evidence synthesis suited to this task. The aims were to: 1) identify the number of research studies using AI in orthopaedics and 2) summarize how and where these studies have applied AI to the field of orthopaedics.
A scoping review was chosen due to the breadth of the research topic and the expected variation in study design, and was conducted using the Arksey and O’Malley framework. The PRISMA-ScR checklist was utilised to ensure completeness (S1 Table).
Literature search and eligible studies
A literature search of studies in English was conducted (1946–2019) using Ovid (Embase & Medline) and Scopus. The search timeframe was chosen to ensure early studies were not missed. The literature search was performed on 30/8/19. The search strategy is shown in Fig 1. The search terms used are shown in S2 and S3 Tables.
Fig 1. PRISMA flow diagram showing the search strategy and number of included and excluded studies.
The review focused on summarising the use of AI in applications relevant to clinical practice rather than related basic science. Hence, the following inclusion criteria were used: (a) studies which directly applied artificial intelligence to orthopaedic clinical practice or (b) studies whose outcomes had the potential to be directly applied to orthopaedic clinical practice.
Abstracts, conference proceedings, articles not in English and review, commentary or editorial articles were not eligible for inclusion. Articles relating to the following were also excluded: cancer/oncology, biomechanics, gait analysis without clinical application, image segmentation alone without a direct clinical application, basic science, neuromuscular disorders, rehabilitation, prosthetics, natural language processing of radiology reports and wearable sensors. These articles were excluded to ensure the review maintained a clinical focus and was applicable to a general orthopaedic audience.
The literature search was performed by one investigator (SF). Abstract screening and full text reviews were performed independently by two investigators (SF and GJ). There was full agreement on the studies selected for inclusion. References from the literature search were imported into Mendeley (v1.19.6, Elsevier, Amsterdam, Netherlands) where duplicates were removed. Covidence systematic review software (Veritas Health Innovation Ltd, Melbourne, Australia. Available at www.covidence.org) was used to synthesize and extract eligible studies.
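The duplicate-removal step can be illustrated with a minimal sketch. This is not a description of how Mendeley matches records, which the text does not specify; it assumes a simple rule (match on DOI where present, otherwise on a normalised title), and the record fields, titles and DOIs are hypothetical placeholders.

```python
import re

def normalise_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace so near-identical exports match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def remove_duplicates(records):
    """Keep the first record seen for each DOI, or for each normalised title when no DOI exists."""
    seen = set()
    unique = []
    for record in records:
        doi = (record.get("doi") or "").lower()
        key = ("doi", doi) if doi else ("title", normalise_title(record["title"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical records, as might be exported from two overlapping databases
records = [
    {"title": "Example study A: a review", "doi": "10.0000/example-a"},
    {"title": "Example Study A: A Review.", "doi": "10.0000/EXAMPLE-A"},
    {"title": "Example study B", "doi": None},
    {"title": "Example study B.", "doi": None},
    {"title": "Example study C", "doi": None},
]

unique = remove_duplicates(records)
print([record["title"] for record in unique])
```

A rule this simple would miss a duplicate pair in which only one copy carries a DOI, so manual checking of the remaining records would still be needed.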
Data extraction and collation
Data was extracted from eligible studies into an evidence table to summarize the following: year of publication, country, area of body, procedure, health condition, orthopaedic care function, study design and number of patients. A formal quality appraisal of eligible studies was not performed as this is beyond the remit of a scoping review. The data collected in the evidence table was used to define the main themes of research and the summarised data represented below.
After removal of duplicates, the search retrieved 3649 documents for title and abstract screening. Of those, 512 met the eligibility criteria for full text screening and 222 met the final inclusion criteria. A reference list of included studies can be found in S4 Table. The study with the earliest publication date, 1989, used a machine learning method (inductive learning) to predict operative findings of disc prolapse or nerve entrapment. 139 studies used one AI technique and 83 used more than one. Machine learning techniques were used 236 times and deep learning techniques 162 times. The most used machine learning techniques were Support Vector Machines, 55 times, and Random Forests, 38 times. Of the studies that used deep learning techniques, 26 implemented convolutional layers in their neural networks. Characteristics of all the studies are summarised below in categories of data extraction.
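The screening numbers above can be checked with a short worked calculation. The exclusion counts below are derived by subtraction from the figures in the text rather than taken from Fig 1, and the variable names are illustrative.

```python
# Counts reported in the text
screened = 3649      # documents screened by title and abstract
full_text = 512      # documents that went on to full text screening
included = 222       # studies meeting the final inclusion criteria

# Derived by subtraction (not reported directly in the text)
excluded_at_screening = screened - full_text
excluded_at_full_text = full_text - included

# Share of documents surviving each stage, as percentages to one decimal place
included_of_screened = round(100 * included / screened, 1)
included_of_full_text = round(100 * included / full_text, 1)

# Consistency check: studies using one technique plus studies using several
single_technique, multiple_techniques = 139, 83
assert single_technique + multiple_techniques == included

print(excluded_at_screening, excluded_at_full_text)
print(included_of_screened, included_of_full_text)
```

On these figures, roughly 6% of screened documents and 43% of full text articles were included.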
101 studies used AI to interpret an imaging modality to establish a diagnosis. A number of early papers assessed and quantified the curvature of the spine in scoliosis [10–12], and developed algorithms capable of calculating the Cobb angle using surface topography before using radiographs and three-dimensional imaging. Subsequently, AI was applied to the detection of other spinal pathologies, e.g. disc herniation or vertebral fractures [13–16]. More recently the scope of AI to aid diagnostic imaging has expanded outside of the spine, with uses ranging from the identification of hip fractures to soft tissue meniscal tears in the knee [17–19]. There has also been a shift to algorithms providing a more nuanced grading of disease, rather than binary outputs.
Orthopaedic care function
106 studies used AI to aid diagnostic decision support and 95 studies used AI to predict an aspect of a patient’s care. The first paper to use AI in orthopaedics predicted operative findings during low back surgery. The data comprised preoperative clinical features and were analysed using an inductive learning method. More recently, research has focused on algorithms predicting patient outcomes post-surgery, utilizing the large orthopaedic data sets collected at local and national level. In particular, two centres in the USA have developed algorithms using local hospital data across different patient groups and procedures [21–25].
Area of body
96 studies focused on the spine, 51 on the knee, 31 on the hip and 24 involved multiple areas. Other areas had 5 publications or fewer (Fig 2).
68 publications related to spinal pathologies, 64 to trauma and 62 to arthritis. Other conditions were reported in 5 studies or fewer.
141 publications did not relate to a specific orthopaedic procedure. 34 related to arthroplasty and 26 to spinal procedures. Other procedures were reported in 5 publications or fewer.
Size of dataset used
There was a large range in the size of dataset used in the studies. The largest dataset used 1,106,234 patients, the smallest only 4. The median number of patients used was 250. 68 studies had a dataset of fewer than 100 patients. Arthroplasty registries were the sources of some of the larger datasets with information from over 1 million patients being used to build AI models [23, 26, 28–31].
Year of publication
The number of studies has increased over the last five years, from 14 publications in 2016 to 70 in 2019 (Fig 3). Between 1989 and 2010 the maximum number of publications per year was 6.
83 studies (37%) were published from the USA, 24 from Canada, 23 from China, 11 from South Korea and 10 from India. Other countries had fewer than 10 published studies (Fig 4). Several papers from the USA emanate from the same institution, which has applied similar AI models to a range of applications [24, 32, 33].
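The percentages quoted in this review follow directly from the study counts and the total of 222 included studies. The sketch below recomputes them, assuming rounding to the nearest whole percent; the labels are shorthand introduced here for illustration.

```python
total_studies = 222

# Study counts as reported in the text
counts = {
    "imaging interpretation": 101,
    "spine": 96,
    "knee": 51,
    "hip": 31,
    "fewer than 100 patients": 68,
    "published from the USA": 83,
}

# Percentage of all included studies, rounded to the nearest whole number
percentages = {label: round(100 * n / total_studies) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]}/{total_studies} = {pct}%")
```

The results (45%, 43%, 23%, 14%, 31% and 37% respectively) agree with the percentages given in the text.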
We have reviewed and summarised the characteristics of 222 publications that included AI and orthopaedics. This scoping review was conducted to establish where and how AI has been used in orthopaedics. We have described the overarching features of these publications to highlight where the research has been focused and guide future avenues of research. The predominant findings were 1) Nearly half of the publications related to imaging interpretation to establish a diagnosis; 2) The spine was the most studied musculoskeletal region; and 3) Predicting patient outcomes is an emerging area of interest. Overall, research in AI and orthopaedics is at an early stage when compared to radiology, for example, but entering a phase of significant growth.
AI was used in 101 publications (45%) to interpret an imaging modality to establish a diagnosis. This focus can be explained by the large volume of organized data acquired during imaging and the relative ease with which AI models can be built to interpret this data. Radiology, accordingly, has seen one of the biggest increases in the use of AI to interpret scans. The overlap between radiology and orthopaedics, for example, in fracture detection or Cobb angle measurement from radiographs, could also explain the predominance of imaging related studies.
The initial search identified many publications relating to image segmentation, whereby an algorithm is used to automatically segment a specific structure(s), such as an intervertebral disc, from an imaging modality. Papers that described segmentation of normal scans or were unable to detect pathology were not felt to be of direct clinical relevance and hence were excluded. Segmentation is, however, an important step in the process of establishing a diagnosis from imaging and it is relevant to mention the volume of research to date in this area. Real-time image segmentation with augmented reality is now being used as a navigation tool in spinal surgery, and this technique could be applied elsewhere in orthopaedics.
The spine, hip and knee were the regions most studied. The joint management of spinal pathology with neurosurgery could explain the greater proportion of papers on the spine. The availability of large arthroplasty registries could explain why hip and knee have seen more interest than the sub-specialty areas of foot & ankle and hand. More research should be focused on sub-specialty areas other than spine, hip and knee.
A significant volume of research found through the literature search related to translational engineering. A number of studies were published in engineering journals and so may not have reached readers from a clinical background [38–40]. Comparatively few papers from rheumatology were found in this study [41, 42]. This may be due to the inclusion criteria used and the health conditions of interest. The interplay between different specialties and industries presents an opportunity to promote interdisciplinary research. Specialists in data science are needed to progress AI in healthcare, and joint projects between specialties will make future research more efficient.
AI works best with high quality, large datasets. It was noted that the size of dataset in the published literature was highly variable. Sixty-eight (31%) of the studies had fewer than 100 patients. Whilst there is no set minimum dataset size for AI algorithms, the reliability of studies performed using small numbers may be questioned. Registries provided the largest sources of data in publications identified in this study [23, 26, 28–31]. They will continue to be a valuable resource for further studies predicting personalised patient outcomes. However, there is concern that population-based data may be unable to solve clinical problems at a patient level. Data sharing is needed for ongoing training and improvement of AI algorithms. Legislation, such as GDPR, ensures that consent for data sharing is obtained and appropriate security measures are in place for the storage of data. Data privacy and protection are of utmost importance going forward.
There is scope for AI tools to assist in decision making regarding the management of patients. AI models that have been developed to retrospectively look at registry data could be used to design prospective studies. A decision-making aid would be a useful adjunct, for example, in understanding which patients will have favourable outcomes after arthroplasty. Predictive models will also provide insights into cost savings and efficiencies that will be of interest to healthcare providers.
AI is a rapidly advancing discipline with new algorithmic models constantly in development, often described using new and different terminology. Machine learning, deep learning and neural networks are some of the terms encountered in the literature that come under the umbrella term of AI. This variation in terminology has led to differences in how the papers are keyworded and recorded in databases. A PubMed (PubMed.gov, National Center for Biotechnology Information, Bethesda, MD, USA) search of “Artificial Intelligence Orthopaedics” in August 2019 yielded a mere 120 results. It was clear that many appropriate papers were missed, which led to refinement of the search strategy for this study. A standardised method of reporting AI studies is currently lacking and would allow direct assessment and comparison of studies. Similarly, consistency in terminology and keywords would allow researchers to search for relevant papers more easily. “Artificial Intelligence” is, perhaps, too broad and too loosely defined to be used as an umbrella term for keyword searches. We propose that the umbrella term “Machine learning” should be included on all papers for standardisation.
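The effect of terminology on search results can be illustrated with a minimal sketch of a Boolean query builder. It assumes the common convention of combining synonyms with OR within a concept and combining concepts with AND; the term lists are illustrative only and are not the search terms used in this review, which are given in S2 and S3 Tables. The function name `build_query` is hypothetical.

```python
def build_query(concept_groups):
    """Join synonyms within a concept with OR, and join the concepts with AND."""
    clauses = []
    for terms in concept_groups:
        # Quote multi-word terms so they are searched as phrases
        quoted = [f'"{term}"' if " " in term else term for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)

# Illustrative synonym lists (assumptions, not the review's actual search terms)
ai_terms = ["artificial intelligence", "machine learning", "deep learning", "neural networks"]
orthopaedic_terms = ["orthopaedics", "orthopedics"]

query = build_query([ai_terms, orthopaedic_terms])
print(query)
```

A query of this form retrieves papers keyworded under any of the AI-related terms and either spelling of the specialty, whereas a single phrase such as “Artificial Intelligence Orthopaedics” depends on every paper using the same keywords.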
There was a geographical split in the location of papers published. As represented in Fig 4, most papers (n = 83) originated from the USA, followed by Canada (n = 24) and China (n = 23). These results may have been skewed by our inclusion only of papers written in English, but they highlight the dominance of institutions from the USA. Additionally, it is important to note that the search terms, whilst broader than a previous literature review, were not exhaustive, and despite our best efforts valid publications may have been missed. Some time has passed since the literature search was performed, and progress has been made in AI in orthopaedics and more widely in healthcare. Efforts to quantify the diagnostic accuracy of deep learning in medical imaging and guidelines for reporting such studies are two examples of how the field has progressed [44, 45].
The use of AI in orthopaedics is increasing. Studies using large datasets exist and novel AI tools with the ability to have clinical impact are being developed. More research is needed before the potential of AI can translate to a significant change in the day-to-day clinical practice of orthopaedic surgeons.
S2 Table. Database search terms for Ovid—Embase and Medline.
- 1. Cabitza F, Locoro A, Banfi G. Machine learning in orthopedics: A literature review. Front Bioeng Biotechnol. 2018;6:75. pmid:29998104
- 2. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25(1):30–6. pmid:30617336
- 3. De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med. 2018;24(9):1342–50. pmid:30104768
- 4. Himmelreich JCL, Karregat EPM, Lucassen WAM, van Weert HCPM, de Groot JR, Louis Handoko M, et al. Diagnostic accuracy of a smartphone-operated, single-lead electrocardiography device for detection of rhythm and conduction abnormalities in primary care. Ann Fam Med. 2019;17(5):403–11. pmid:31501201
- 5. Panchmatia JR, Visenio MR, Panch T. The role of artificial intelligence in orthopaedic surgery. Br J Hosp Med. 2018;79(12):676–81. pmid:30526106
- 6. National Joint Registry. National Joint Registry - 17th Annual Report 2020. Natl Jt Regist. 2020;(December 2019):138. pmid:33439585
- 7. Arksey H, O’Malley L. Scoping studies: Towards a methodological framework. Int J Soc Res Methodol Theory Pract. 2005;8(1):19–32.
- 8. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): Checklist and explanation. Ann Intern Med. 2018;169(7):467–73. pmid:30178033
- 9. Mathew B, Norris D, Mackintosh I, Waddell G. Artificial intelligence in the prediction of operative findings in low back surgery. Br J Neurosurg. 1989;3(2):161–70. pmid:2679685
- 10. Jaremko JL, Poncet P, Ronsky J, Harder J, Dansereau J, Labelle H, et al. Estimation of spinal deformity in scoliosis from torso surface cross sections. Spine (Phila Pa 1976). 2001;26(14):1583–91. pmid:11462091
- 11. Ramirez L, Durdle NG, Raso VJ, Hill DL. A support vector machines classifier to assess the severity of idiopathic scoliosis from surface topography. IEEE Trans Inf Technol Biomed. 2006;10(1):84–91. pmid:16445253
- 12. Duong L, Cheriet F, Labelle H. Three-dimensional classification of spinal deformities using fuzzy clustering. Spine (Phila Pa 1976). 2006;31(8):923–30. pmid:16622383
- 13. Eller-Vainicher C, Chiodini I, Santi I, Massarotti M, Pietrogrande L, Cairoli E, et al. Recognition of morphometric vertebral fractures by artificial neural networks: Analysis from gismo Lombardia database. PLoS One. 2011;6(11):e27277. pmid:22076144
- 14. Koh J, Chaudhary V, Dhillon G. Disc herniation diagnosis in MRI using a CAD framework and a two-level classifier. Int J Comput Assist Radiol Surg. 2012;7(6):861–9. pmid:22392057
- 15. Oktay AB, Albayrak NB, Akgul YS. Computer aided diagnosis of degenerative intervertebral disc diseases from lumbar MR images. Comput Med Imaging Graph. 2014;38(7):613–9. pmid:24972858
- 16. Hao S, Jiang J, Guo Y, Li H. Active learning based intervertebral disk classification combining shape and texture similarities. Neurocomputing. 2013;101:252–7.
- 17. Saygılı A, Albayrak S. An efficient and fast computer-aided method for fully automated diagnosis of meniscal tears from magnetic resonance images. Artif Intell Med. 2019;97:118–30. pmid:30527276
- 18. Carballido-Gamio J, Yu A, Wang L, Su Y, Burghardt AJ, Lang TF, et al. Hip Fracture Discrimination Based on Statistical Multi-parametric Modeling (SMPM). Ann Biomed Eng. 2019;47(11):2199–212. pmid:31240508
- 19. Couteaux V, Si-Mohamed S, Nempont O, Lefevre T, Popoff A, Pizaine G, et al. Automatic knee meniscus tear detection and orientation classification with Mask-RCNN. Diagn Interv Imaging. 2019;100(4):235–42. pmid:30910620
- 20. Huber FA, Stutz S, Vittoria de Martini I, Mannil M, Becker AS, Winklhofer S, et al. Qualitative versus quantitative lumbar spinal stenosis grading by machine learning supported texture analysis—Experience from the LSOS study cohort. Eur J Radiol. 2019;114:45–50. pmid:31005175
- 21. Haeberle HS, Helm JM, Navarro SM, Karnuta JM, Schaffer JL, Callaghan JJ, et al. Artificial Intelligence and Machine Learning in Lower Extremity Arthroplasty: A Review. J Arthroplasty. 2019;34(10):2201–3. pmid:31253449
- 22. Lee HK, Jin R, Feng Y, Bain PA, Goffinet J, Baker C, et al. An Analytical Framework for TJR Readmission Prediction and Cost-Effective Intervention. IEEE J Biomed Heal Informatics. 2019;23(4):1760–72. pmid:30047916
- 23. Ramkumar PN, Karnuta JM, Navarro SM, Haeberle HS, Scuderi GR, Mont MA, et al. Deep Learning Preoperatively Predicts Value Metrics for Primary Total Knee Arthroplasty: Development and Validation of an Artificial Neural Network Model. J Arthroplasty. 2019;34(10):2220–2227.e1. pmid:31285089
- 24. Karhade AV, Ogink PT, Thio QCBS, Broekman MLD, Cha TD, Hershman SH, et al. Machine learning for prediction of sustained opioid prescription after anterior cervical discectomy and fusion. Spine J. 2019;19(6):976–83. pmid:30710731
- 25. Karhade AV, Ogink P, Thio Q, Broekman M, Cha T, Gormley WB, et al. Development of machine learning algorithms for prediction of discharge disposition after elective inpatient surgery for lumbar degenerative disc disorders. Neurosurg Focus. 2018;45(5):E6. pmid:30453463
- 26. Hyer JM, Ejaz A, Tsilimigras DI, Paredes AZ, Mehta R, Pawlik TM. Novel Machine Learning Approach to Identify Preoperative Risk Factors Associated with Super-Utilization of Medicare Expenditure Following Surgery. JAMA Surg. 2019;154(11):1014–21. pmid:31411664
- 27. D’Lima DD, Patil S, Steklov N, Colwell CW. ‘Lab’-in-a-Knee: In vivo knee forces, kinematics, and contact analysis. Clin Orthop Relat Res. 2011;469(10):2953–70. pmid:21598121
- 28. Huber M, Kurz C, Leidl R. Predicting patient-reported outcomes following hip and knee replacement surgery using supervised machine learning. BMC Med Inform Decis Mak. 2019;19(1):3. pmid:30621670
- 29. Karnuta JM, Navarro SM, Haeberle HS, Helm JM, Kamath AF, Schaffer JL, et al. Predicting Inpatient Payments Prior to Lower Extremity Arthroplasty Using Deep Learning: Which Model Architecture Is Best? J Arthroplasty. 2019;34(10):2235–2241.e1. pmid:31230954
- 30. Harris AHS, Kuo AC, Weng Y, Trickey AW, Bowe T, Giori NJ. Can Machine Learning Methods Produce Accurate and Easy-to-use Prediction Models of 30-day Complications and Mortality after Knee or Hip Arthroplasty? Clin Orthop Relat Res. 2019;477(2):452–60. pmid:30624314
- 31. Han SS, Azad TD, Suarez PA, Ratliff JK. A machine learning approach for predictive models of adverse events following spine surgery. Spine J. 2019;19(11):1772–81. pmid:31229662
- 32. Karhade AV, Schwab JH, Bedair HS. Development of Machine Learning Algorithms for Prediction of Sustained Postoperative Opioid Prescriptions After Total Hip Arthroplasty. J Arthroplasty. 2019;34(10):2272–2277.e1. pmid:31327647
- 33. Ogink PT, Karhade AV, Thio QCBS, Gormley WB, Oner FC, Verlaan JJ, et al. Predicting discharge placement after elective surgery for lumbar spinal stenosis using machine learning methods. Eur Spine J. 2019; pmid:30941521
- 34. Saba L, Biswas M, Kuppili V, Cuadrado Godia E, Suri HS, Edla DR, et al. The present and future of deep learning in radiology. Eur J Radiol. 2019;114(September 2018):14–24. pmid:31005165
- 35. Zhang J, Li H, Lv L, Zhang Y. Computer-Aided Cobb Measurement Based on Automatic Detection of Vertebral Slopes Using Deep Neural Network. Int J Biomed Imaging. 2017;2017:9083916. pmid:29118806
- 36. Zhu X, He X, Wang P, He Q, Gao D, Cheng J, et al. A method of localization and segmentation of intervertebral discs in spine MRI based on Gabor filter bank. Biomed Eng Online. 2016;15(1):32. pmid:27000749
- 37. Auloge P, Cazzato RL, Ramamurthy N, de Marini P, Rousseau C, Garnon J, et al. Augmented reality and artificial intelligence-based navigation during percutaneous vertebroplasty: a pilot randomised clinical trial. Eur Spine J. 2020;29(7):1580–9. pmid:31270676
- 38. Baka N, Leenstra S, Van Walsum T. Ultrasound Aided Vertebral Level Localization for Lumbar Surgery. IEEE Trans Med Imaging. 2017;36(10):2138–47. pmid:28809678
- 39. Chalmers E, Pedrycz W, Lou E. Human experts’ and a fuzzy model’s predictions of outcomes of scoliosis treatment: A comparative analysis. IEEE Trans Biomed Eng. 2015;62(3):1001–7. pmid:25494498
- 40. Duong L, Cheriet F, Labelle H. Automatic detection of scoliotic curves in posteroanterior radiographs. IEEE Trans Biomed Eng. 2010;57(5):1143–51. pmid:20142161
- 41. Hussain D, Han SM. Computer-aided osteoporosis detection from DXA imaging. Comput Methods Programs Biomed [Internet]. 2019;173:87–107. Available from: http://www.elsevier.com/locate/cmpb pmid:31046999
- 42. Zhang M, Gong H, Zhang K, Zhang M. Prediction of lumbar vertebral strength of elderly men based on quantitative computed tomography images using machine learning. Osteoporos Int. 2019;30(11):2271–82. pmid:31401661
- 43. Bayliss L, Jones LD. The role of artificial intelligence and machine learning in predicting orthopaedic outcomes. Bone Jt J. 2019;101-B(12):1476–8. pmid:31786999
- 44. Aggarwal R, Sounderajah V, Martin G, Ting DSW, Karthikesalingam A, King D, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. npj Digit Med. 2021;4(1). pmid:33828217
- 45. Sounderajah V, Ashrafian H, Golub RM, Shetty S, De Fauw J, Hooft L, et al. Developing a reporting guideline for artificial intelligence-centred diagnostic test accuracy studies: the STARD-AI protocol. BMJ Open. 2021;11(6):e047709. pmid:34183345