Deep Learning outperforms physicians in myopathy and neuropathy classification based on Needle Electromyography Signal

Ilhan Yoo; Jaesung Yoo; Dongmin Kim; Ina Youn; Hyodong Kim; Michelle Youn; Jun Hee Won; Woosup Cho; Youho Myong; Sehoon Kim; Ri Yu; Sung-Min Kim; Kwangsoo Kim; Seung-Bo Lee; Keewon Kim

doi:10.1371/journal.pone.0339691

Abstract

Needle electromyography (nEMG) is a valuable tool for diagnosing patients with neuromuscular diseases. However, it is labor-intensive and prone to diagnostic inaccuracies stemming from human biases. To address these challenges, we validated an nEMG diagnosis-aiding system that uses a deep learning (DL) model with minimal preprocessing to classify patients into three categories: normal, myopathy, and neuropathy. Using 376 nEMG signals from 57 patients in a tertiary university hospital database and nested k-fold cross validation, the DL model surpassed the classification performance of six electromyographers. The median patient classification accuracy, precision, sensitivity, and specificity of the DL model were 0.70, 0.70, 0.70, and 0.85, respectively, whereas those of the physicians were 0.55, 0.60, 0.54, and 0.78, respectively. Model interpretability and failure analyses showed that the DL model classifies based on relevant signal features. Despite the higher accuracy of the DL model, the number of unanimously misclassified cases was higher for the DL model than for the physicians. Our study validates that deep learning is a fast, accurate, and practical tool to aid physicians in diagnosing patients using nEMG signals.

Citation: Yoo I, Yoo J, Kim D, Youn I, Kim H, Youn M, et al. (2026) Deep Learning outperforms physicians in myopathy and neuropathy classification based on Needle Electromyography Signal. PLoS One 21(5): e0339691. https://doi.org/10.1371/journal.pone.0339691

Editor: Claudia Brogna, Fondazione Policlinico Universitario Gemelli IRCCS, ITALY

Received: May 3, 2025; Accepted: December 9, 2025; Published: May 19, 2026

Copyright: © 2026 Yoo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data can be accessed at: https://doi.org/10.34740/kaggle/ds/9946640.

Funding: This research was supported by the Patient-Centered Clinical Research Coordinating Center (PACEN) funded by the Ministry of Health & Welfare, Republic of Korea (Grant Number: HC21C0064, https://www.mohw.go.kr/eng/). Keewon Kim received the funding. The funders were not involved in any part of the study.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Needle electromyography (nEMG) is an essential electrophysiological test used to diagnose neuromuscular diseases. It is used alongside clinical evaluation, serum studies, tissue biopsy, and genetic testing, with converging evidence supporting the diagnosis. In nEMG, a needle electrode is inserted into a muscle and records motor unit action potentials (MUAPs), which reflect the combined function of the nerves, muscles, and neuromuscular junctions, in the volitional state [1–6]. The nEMG signal, composed of MUAPs, reflects the anatomical and physiological states of peripheral nerves and muscles, and its abnormalities are used to diagnose neuropathy and myopathy [1–6]. Patients with neuropathy commonly exhibit MUAPs with large amplitudes, long durations, and reduced recruitment, whereas those with myopathy commonly exhibit MUAPs with small amplitudes, short durations, and early recruitment [1,5–12]. In clinical settings, MUAP analysis plays a critical role in distinguishing between neuropathy and myopathy, as their symptoms are sometimes similar [1,5–12].

Despite its important role in distinguishing neuromuscular diseases, nEMG evaluation has some limitations. First, its accuracy relies on the examiner’s proficiency, and reliability across examiners has been reported to range from 62% to 81% [13]. Second, manual evaluation of nEMG signal abnormalities is labor-intensive, and the rising incidence of neuromuscular diseases increases pressure on physicians as more patients require nEMG evaluations [14–17]. Thus, a precise and efficient method for interpreting nEMG data could help physicians make rapid and accurate diagnoses.

In recent years, deep learning (DL) [18–21] has been used to develop fast and efficient prediction models by leveraging large clinical datasets, including electrocardiographic and electroencephalographic data [22–25]. Studies have shown that the clinical performance of DL models is comparable to or surpasses that of humans [26–28]. While there have been DL studies with EMG data [29–34], the performance of DL in analyzing MUAP signals and its clinical practicality remain to be validated.

To address the practical challenges of nEMG diagnosis, we validated an nEMG diagnosis-aiding system with a DL model that classifies patients as having myopathy, neuropathy, or a normal state based on nEMG signals recorded in the volitional state. After training the DL model, we also used an interpretability tool to investigate how it makes decisions.

Materials and methods

Study design and preparation

We retrospectively reviewed the electronic medical records of individuals who visited Seoul National University Hospital and underwent nEMG between June 10th 2015 and August 10th 2020. The queried data contained identifiable patient IDs, which were excluded from all downstream analyses. Patients who showed chronic symptoms for at least three months were selected. All myopathic patients underwent biopsy, and those with congenital myopathy also received genetic testing. Polyneuropathy in neuropathic patients was diagnosed based on clinical symptoms, nerve conduction studies, and MRI scans. This study was approved by the Institutional Review Board (IRB) of Seoul National University Hospital (No. 2008-055-1147) and informed consent was not obtained because the study was retrospective. The data was accessed throughout the study period, from February 1st 2021 to August 31st, 2024.

nEMG measurement was performed using the Nicolet EDX EMG system and a monopolar needle electrode (Natus, Middleton, WI, USA). The filter was set at 20 Hz (low cut) and 10 kHz (high cut), and the nEMG signals were recorded at a sampling rate of 48 kHz. Target muscles were selected by certified neurologists or physiatrists. The nEMG electrophysiological diagnosis was performed as follows. The signal at rest was inspected for several seconds after the needle was first inserted. Subsequently, signals during minimal, moderate, and maximal muscle contractions were inspected. The diagnosis was determined comprehensively by considering the duration and amplitude of individual MUAPs and the recruitment and interference patterns of MUAPs. The nEMG signals recorded during muscle contraction were saved to the system, and the resting potentials were discarded. Signal artifacts from needle insertion or patient movement at the beginning and end of each signal were removed.
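The recordings are stored at 48 kHz, and the training setup described below down-samples them to 10 kHz. The rate conversion can be sketched with polyphase resampling. This is an illustrative sketch only: it assumes SciPy’s `scipy.signal.resample_poly`, which applies its own anti-aliasing low-pass filter, whereas the study does not state which resampling method was used; `downsample_emg` is a hypothetical helper name.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly


def downsample_emg(x, fs_in=48_000, fs_out=10_000):
    """Resample a 1-D nEMG trace from fs_in to fs_out Hz (hypothetical helper).

    resample_poly upsamples by `up`, applies an anti-aliasing FIR low-pass
    filter, then downsamples by `down`; 48 kHz -> 10 kHz gives up=5, down=24.
    """
    g = gcd(fs_out, fs_in)
    up, down = fs_out // g, fs_in // g
    return resample_poly(np.asarray(x, dtype=float), up, down)


# One second of a synthetic 200 Hz sine at 48 kHz (a stand-in, not study data).
t = np.arange(48_000) / 48_000
raw = np.sin(2 * np.pi * 200 * t)
y = downsample_emg(raw)
print(y.shape)  # (10000,)
```

One consequence worth noting: at 10 kHz the Nyquist frequency is 5 kHz, so content between 5 and 10 kHz that passes the 10 kHz high-cut recording filter is removed by the anti-aliasing step.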

Initially, 20 patients each with a diagnosis of myopathy, neuropathy, or normal were randomly selected from the database, resulting in 60 patients in total. After manual curation by certified neurologists and physiatrists, one myopathic and one neuropathic patient were excluded due to atypical disease characteristics and poor signal quality. Additionally, one normal-class patient was removed because all of their signal measurements were shorter than the 0.4-second minimum window required for DL model classification. Certified neurologists and physiatrists reviewed the retrospective nEMG and patient information data and confirmed the diagnoses of all patients to ensure reliable DL training and reliable evaluation of both the DL model and the physicians. In total, 57 patients were included, 19 in each of the myopathy, neuropathy, and normal classes (Fig 1). All 57 patients were used for DL training and inference.
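The 0.4-second minimum window translates into a sample count that depends on the sampling rate (4,000 samples at 10 kHz). The eligibility rule can be sketched as follows; the helper names are hypothetical, and the sketch assumes that a signal is usable if it contains at least one full window and that a patient is usable if at least one of their signals is.

```python
from typing import List

WINDOW_SEC = 0.4  # minimum window length stated in the text


def window_samples(fs_hz, window_sec=WINDOW_SEC):
    """Number of samples in one model window at sampling rate fs_hz."""
    return int(round(fs_hz * window_sec))


def usable_signals(signal_lengths: List[int], fs_hz: int) -> List[int]:
    """Indices of signals that contain at least one full window (assumed rule)."""
    n = window_samples(fs_hz)
    return [i for i, length in enumerate(signal_lengths) if length >= n]


def patient_is_eligible(signal_lengths: List[int], fs_hz: int) -> bool:
    """A patient is classifiable only if at least one signal fills a window."""
    return len(usable_signals(signal_lengths, fs_hz)) > 0


# Hypothetical signal lengths in samples at 10 kHz.
print(window_samples(10_000))                          # 4000
print(usable_signals([3_500, 12_000, 4_000], 10_000))  # [1, 2]
print(patient_is_eligible([2_000, 3_999], 10_000))     # False
```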

Fig 1. Summary of needle electromyography dataset.

The whole dataset was divided using a nested k-fold split to train the deep learning model and evaluate it on the test set. Patients and muscle signals without physician classification labels were excluded when evaluating physician and deep learning performance. Abbreviations: M (Myopathy), N (Neuropathy), NL (Normal).

https://doi.org/10.1371/journal.pone.0339691.g001

The diagnosis classification task was divided into signal classification and patient classification, with accurate patient classification as the goal. The diagnosis label of each signal follows that of the patient. Patients and signals with any missing classification labels from the physicians’ assessment were removed from the evaluation of both the DL model and the physicians. A total of 50 patients and 373 signals were used to evaluate the performance of the DL model and the physicians. A summary of the dataset is shown in Fig 1, and the demographic characteristics are presented in Table 1.

Table 1. Demographic characteristics of the patients. Abbreviations: SD (Standard deviation), NA (Not applicable).

https://doi.org/10.1371/journal.pone.0339691.t001

Classification by the deep learning model

The nEMGNet model and the divide-and-vote (DiVote) algorithm were used to classify the diagnosis of each patient [34]. nEMGNet is a fast DL model optimized for diagnosis classification based on raw nEMG signals with minimal preprocessing, and the DiVote algorithm enables patient-level classification given heterogeneous types, numbers, and durations of muscle signals per patient. The method has shown stable performance and explainability [34] and achieved the highest score among existing nEMG classification methods on the Checklist for Artificial Intelligence in Medical Imaging (CLAIM), a guideline for developing artificial intelligence for medicine [35,36].
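The divide-and-vote idea, turning a variable number of variable-length signals into one patient label, can be illustrated as follows. This is a simplified sketch, not the published implementation (see the repository linked in the Acknowledgments): it assumes non-overlapping 0.4 s segments (4,000 samples at 10 kHz) and soft voting by averaging class probabilities at both the signal and patient level, and `toy_classifier` is a hypothetical amplitude rule standing in for a trained nEMGNet.

```python
import numpy as np

CLASSES = ("myopathy", "neuropathy", "normal")


def split_into_segments(signal, seg_len):
    """Divide a 1-D signal into non-overlapping segments; the tail is dropped."""
    n_seg = len(signal) // seg_len
    return [signal[i * seg_len:(i + 1) * seg_len] for i in range(n_seg)]


def classify_patient(signals, classify_segment, seg_len=4_000):
    """Soft-vote segment probabilities up to a patient-level label.

    signals: list of 1-D arrays (any number, any length) for one patient.
    classify_segment: callable mapping a segment to probabilities over CLASSES.
    """
    signal_probs = []
    for sig in signals:
        segments = split_into_segments(sig, seg_len)
        if not segments:  # shorter than one window: skipped
            continue
        seg_probs = np.stack([classify_segment(s) for s in segments])
        signal_probs.append(seg_probs.mean(axis=0))  # vote within a signal
    if not signal_probs:
        raise ValueError("no signal is long enough to classify")
    patient_probs = np.mean(signal_probs, axis=0)  # vote across signals
    return CLASSES[int(np.argmax(patient_probs))], patient_probs


def toy_classifier(segment):
    """Hypothetical rule of thumb: larger peak-to-peak amplitude -> neuropathy."""
    amplitude = segment.max() - segment.min()
    if amplitude < 0.5:
        return np.array([0.6, 0.1, 0.3])
    if amplitude > 2.0:
        return np.array([0.1, 0.7, 0.2])
    return np.array([0.2, 0.2, 0.6])


# Toy patient: one long high-amplitude signal and one signal shorter than a window.
sig_a = 3.0 * np.sin(np.linspace(0, 100, 9_000))
sig_b = np.zeros(1_000)
label, probs = classify_patient([sig_a, sig_b], toy_classifier)
print(label)  # neuropathy
```

Averaging at each level gives every signal equal weight in the patient decision regardless of its length; majority (hard) voting would be an equally plausible variant.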

The following hyperparameters were used for the nEMGNet and DiVote classifier. The nEMG data were down-sampled to 10 kHz for computational efficiency. nEMGNet-B with 2 residual blocks was used as the DL model. Early stopping [37] was performed by evaluating validation-set accuracy every 30 updates, with a patience value of 100. Cross-entropy loss [38] was used as the loss function, with class weights inversely proportional to the number of training segments per class. A learning rate of 1e-3 and a batch size of 32 were used in training. Performance was measured through 5 × 3-fold nested cross validation, in which each of the 5 outer folds served as the test set and the remaining data were divided into 3 inner folds for training and validation. Nested k-fold cross validation validates the DL model more rigorously than conventional k-fold cross validation. For each fold combination, 3 DL models with different random seeds were trained, yielding 9 DL classification results for each test set (3 random seeds × 3 validation folds).
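The fold bookkeeping of the 5 × 3 nested cross validation and the inverse-frequency class weights can be sketched as below. This is a hypothetical illustration rather than the authors’ code: it assumes that splits are made at the patient level (so a patient’s signals never appear in both training and test data) and that folds are formed by random shuffling without stratification; the function names are invented for illustration.

```python
import itertools

import numpy as np


def kfold_indices(n_items, k, rng):
    """Shuffle item positions and split them into k near-equal folds."""
    return np.array_split(rng.permutation(n_items), k)


def nested_cv_runs(patient_ids, n_outer=5, n_inner=3, seeds=(0, 1, 2), split_seed=0):
    """List (test_ids, train_ids, val_ids, seed) for an n_outer x n_inner nested CV.

    Splits are made over patients (an assumption), so a patient's signals are
    never shared between the training, validation, and test sets of a run.
    """
    rng = np.random.default_rng(split_seed)
    ids = np.asarray(patient_ids)
    runs = []
    for outer in kfold_indices(len(ids), n_outer, rng):
        test = ids[outer]
        rest = np.setdiff1d(ids, test)
        inner = kfold_indices(len(rest), n_inner, rng)
        for v, seed in itertools.product(range(n_inner), seeds):
            val = rest[inner[v]]
            train = np.setdiff1d(rest, val)
            runs.append((test, train, val, seed))
    return runs


def inverse_frequency_weights(segment_labels, n_classes=3):
    """Class weights proportional to 1 / (training segments per class), mean 1."""
    counts = np.bincount(segment_labels, minlength=n_classes).astype(float)
    weights = 1.0 / counts
    return weights * n_classes / weights.sum()


runs = nested_cv_runs(range(57))
print(len(runs))  # 45 = 5 test folds x 3 validation folds x 3 seeds
print(inverse_frequency_weights(np.array([0, 0, 0, 1, 2, 2])))
```

With 57 patients this produces 9 runs per outer test fold, matching the 3 seeds × 3 validation folds described above. The weights could then be passed to a weighted loss, for example PyTorch’s `torch.nn.CrossEntropyLoss(weight=...)`; a class with no training segments would need special handling because its inverse count is undefined.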

After training and performance evaluation, the characteristic signal patterns that the DL model had learned for each disease type were inspected to ensure that the model classifies disease types based on relevant MUAP patterns. For this purpose, we used feature visualization [39], an explainable DL method.
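Optimization-based feature visualization (often called activation maximization) synthesizes an input that a trained network scores highly for a chosen class. The sketch below illustrates the idea on a tiny, randomly initialized two-layer network with a hand-derived gradient; it is not nEMGNet, and the L2 penalty, step size, and step count are assumptions. With a real model, an automatic-differentiation framework would compute the gradient through the trained network instead.

```python
import numpy as np


def make_toy_model(n_in=64, n_hidden=16, n_classes=3, seed=0):
    """Randomly initialized stand-in network: logits = W2 @ tanh(W1 @ x)."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(n_hidden, n_in)) / np.sqrt(n_in)
    w2 = rng.normal(size=(n_classes, n_hidden))
    return w1, w2


def objective(model, x, target, l2):
    """Class logit to maximize, minus an L2 penalty that keeps x bounded."""
    w1, w2 = model
    return w2[target] @ np.tanh(w1 @ x) - l2 * x @ x


def visualize_class(model, target, steps=500, lr=0.1, l2=0.01, seed=1):
    """Gradient ascent on the input signal x for one target class."""
    w1, w2 = model
    x = 0.01 * np.random.default_rng(seed).normal(size=w1.shape[1])
    for _ in range(steps):
        h = np.tanh(w1 @ x)
        # d/dx [w2[t] . tanh(W1 x)] = W1^T (w2[t] * (1 - h^2)); penalty term: -2 * l2 * x
        grad = w1.T @ (w2[target] * (1.0 - h ** 2)) - 2.0 * l2 * x
        x = x + lr * grad
    return x


model = make_toy_model()
x0 = 0.01 * np.random.default_rng(1).normal(size=64)  # same start as visualize_class
x_star = visualize_class(model, target=1)
print(objective(model, x0, 1, 0.01) < objective(model, x_star, 1, 0.01))  # True
```

The synthesized `x_star` plays the role of the generated signals: a pattern the network associates most strongly with the target class, which can then be compared with known MUAP morphology.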

Classification by physicians

A web-based nEMG signal labelling platform was developed for classification by physicians. Two neurologists and four physiatrists (6 physicians in total), each with more than five years of experience and more than 1000 cases of nEMG interpretation, were recruited to annotate each nEMG signal and patient without additional clinical information. The physicians were informed which nEMG muscle signals belonged to each anonymized patient, allowing them to classify both the individual muscle nEMG signals and the patients.

Evaluation

The signal and patient classification performance of the DL model and physicians was measured with the following metrics: accuracy, positive predictive value (precision), sensitivity (recall), specificity, and F1 score. All metrics except accuracy are binary classification metrics; they were computed for each class using the one-versus-rest method and then averaged over all classes. The area under the receiver operating characteristic curve (AUROC) was not computed for the one-versus-rest evaluation, because AUROC summarizes threshold-dependent binary classification performance, which was not the focus of this study. The metrics were calculated using the following formulas:

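$$\mathrm{Accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}\left[\hat{y}_i = y_i\right] \qquad (1)$$

$$\mathrm{Precision} = \frac{1}{K}\sum_{k=1}^{K} \frac{TP_k}{TP_k + FP_k} \qquad (2)$$

$$\mathrm{Sensitivity} = \frac{1}{K}\sum_{k=1}^{K} \frac{TP_k}{TP_k + FN_k} \qquad (3)$$

$$\mathrm{Specificity} = \frac{1}{K}\sum_{k=1}^{K} \frac{TN_k}{TN_k + FP_k} \qquad (4)$$

$$F_1 = \frac{1}{K}\sum_{k=1}^{K} \frac{2\,\mathrm{Precision}_k \cdot \mathrm{Sensitivity}_k}{\mathrm{Precision}_k + \mathrm{Sensitivity}_k} \qquad (5)$$

where $N$ is the number of classified samples (signals or patients), $y_i$ and $\hat{y}_i$ are the true and predicted labels of sample $i$, $K = 3$ is the number of classes, $TP_k$, $FP_k$, $TN_k$, and $FN_k$ are the true-positive, false-positive, true-negative, and false-negative counts when class $k$ is treated as positive and all other classes as negative, and $\mathrm{Precision}_k$ and $\mathrm{Sensitivity}_k$ are the per-class terms inside the sums of (2) and (3).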
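The one-versus-rest, macro-averaged metrics described above can be computed directly from a 3 × 3 confusion matrix. In this sketch, precision, sensitivity, specificity, and F1 are computed per class and then averaged, which mirrors scikit-learn’s `average="macro"` convention; whether the authors averaged F1 this way or combined macro precision and recall is an assumption. The confusion matrix in the example is hypothetical.

```python
import numpy as np


def ovr_macro_metrics(cm):
    """Accuracy and macro-averaged one-vs-rest metrics from a confusion matrix.

    cm[i, j] counts samples whose true class is i and predicted class is j.
    A class that is never predicted (or never present) yields NaN terms.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class k, truly another class
    fn = cm.sum(axis=1) - tp  # truly class k, predicted as another class
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": float(tp.sum() / total),
        "precision": float(precision.mean()),
        "sensitivity": float(sensitivity.mean()),
        "specificity": float(specificity.mean()),
        "f1": float(f1.mean()),
    }


cm = [[5, 0, 0], [1, 3, 1], [0, 2, 3]]  # hypothetical counts (rows: true M, N, NL)
m = ovr_macro_metrics(cm)
print(round(m["accuracy"], 3))  # 0.733
```

Note that the per-class sensitivity equals the “per-class accuracy” reported in the Results, that is, the diagonal of the row-normalized confusion matrix.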

Failure analysis of the DL model and physicians was performed by inspecting signals that received highly consistent classification labels among the DL models and among the physicians. A signal was considered highly consistent when 8 out of 9 DL models or 5 out of 6 physicians assigned it the same label. Signals predicted as the correct class by the DL models but as an incorrect class by the physicians were inspected, and vice versa.
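The consistency rule used for failure analysis can be sketched as follows. This illustration rests on assumptions: it treats “8 out of 9” and “5 out of 6” as minimum agreement counts, requires both rater groups to reach their threshold before a signal is compared, and uses hypothetical function names and toy labels.

```python
from collections import Counter


def consensus_label(votes, min_agree):
    """Most common label if at least `min_agree` raters chose it, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None


def split_disagreements(true_labels, dl_votes, md_votes, dl_min=8, md_min=5):
    """Indices where one rater group is consistently right and the other consistently wrong.

    dl_votes[i]: labels from the 9 DL models for signal i.
    md_votes[i]: labels from the 6 physicians for signal i.
    """
    dl_right_md_wrong, md_right_dl_wrong = [], []
    for i, truth in enumerate(true_labels):
        dl = consensus_label(dl_votes[i], dl_min)
        md = consensus_label(md_votes[i], md_min)
        if dl is None or md is None:  # not highly consistent in one group
            continue
        if dl == truth and md != truth:
            dl_right_md_wrong.append(i)
        elif md == truth and dl != truth:
            md_right_dl_wrong.append(i)
    return dl_right_md_wrong, md_right_dl_wrong


# Toy example with hypothetical labels (M, N, NL as in the figures).
truth = ["M", "N", "NL"]
dl_votes = [["M"] * 9, ["M"] * 8 + ["N"], ["NL"] * 5 + ["N"] * 4]
md_votes = [["N"] * 5 + ["M"], ["N"] * 6, ["NL"] * 6]
print(split_disagreements(truth, dl_votes, md_votes))  # ([0], [1])
```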

Statistical analyses were performed using R statistical software (version 4.1.0; R Foundation for Statistical Computing, Vienna, Austria) and the Python programming language (version 3.6; Python Software Foundation, Delaware, United States) with the tableone library [40]. Normality of the continuous variables was assessed using the Shapiro–Wilk test. Differences in categorical and continuous variables across the myopathy, neuropathy, and normal groups were assessed using Pearson’s chi-square and Kruskal–Wallis tests, respectively. Statistics are expressed as mean ± standard deviation for continuous variables and as number (%) for categorical variables. A p-value < 0.05 was considered statistically significant.
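The tests named above are available in SciPy, and the sketch below shows how they fit together for a three-group comparison. The authors used the tableone library, so this is not their pipeline; the values are hypothetical toy numbers, not the study’s data, and `correction=False` is passed so that `chi2_contingency` always computes the uncorrected Pearson statistic (SciPy otherwise applies Yates’ correction to tables with one degree of freedom).

```python
import numpy as np
from scipy import stats

# Hypothetical toy values for three groups; these are NOT the study's data.
age = {
    "myopathy": np.array([34, 41, 29, 55, 47, 38, 62, 44]),
    "neuropathy": np.array([58, 63, 49, 71, 66, 54, 60, 69]),
    "normal": np.array([45, 39, 52, 48, 61, 36, 43, 57]),
}
# Rows: groups in the same order; columns: counts in two categories (e.g. sex).
sex_counts = np.array([[5, 3], [6, 2], [4, 4]])

# Normality check per group (Shapiro-Wilk).
shapiro_p = {g: stats.shapiro(v).pvalue for g, v in age.items()}

# Continuous variable across three groups: Kruskal-Wallis H test.
kw = stats.kruskal(*age.values())

# Categorical variable across groups: Pearson's chi-square test of independence.
chi2, chi_p, dof, expected = stats.chi2_contingency(sex_counts, correction=False)

# Summaries in the reporting format of the text: mean ± SD and n (%).
for g, v in age.items():
    print(f"{g}: {v.mean():.1f} ± {v.std(ddof=1):.1f}")
pct_first = 100 * sex_counts[:, 0] / sex_counts.sum(axis=1)
print([f"{n} ({p:.1f}%)" for n, p in zip(sex_counts[:, 0], pct_first)])
print(f"Kruskal-Wallis p = {kw.pvalue:.3f}; chi-square p = {chi_p:.3f} (dof = {dof})")
```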

Results

Signal classification

The signal classification scores of the DL model and physicians are shown in Table 2. The median accuracy of the DL model (0.61) was significantly higher than that of the physicians (0.54; p = 0.045). The ROC curves show that the per-class classification results of the DL model and physicians are similar for myopathy and neuropathy, while the DL model performs better for normal signals (S1 Fig). The precision-recall curves show that the physicians outperform the DL model for myopathy signals, whereas the DL model outperforms the physicians for normal signals.

Table 2. Signal classification scores of the deep learning model and physicians. Deep learning model scores are based on 9 scores from 3 validation folds and 3 random seeds. Physicians’ scores are based on 6 scores from 6 physicians. Full scores are listed in S1 File.

https://doi.org/10.1371/journal.pone.0339691.t002

The per-class accuracies for myopathy, neuropathy, and normal signals were 68.04±4.99%, 63.85±4.18%, and 47.28±8.69%, respectively, for the DL model and 40.91±12.23%, 68.99±9.86%, and 46.45±15.64%, respectively, for the physicians (S2 Fig). The DL model’s per-class accuracy was highest for myopathy, followed by neuropathy and normal signals, whereas the order for physicians was neuropathy, normal, and myopathy. A notably high proportion of normal signals (50.71±16.02%) was misclassified as neuropathy by physicians, reflecting a bias among physicians toward classifying signals as neuropathy. The full signal classification results and scores can be found in S3 File and S1 File, respectively.
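Per-class accuracies such as those above, and off-diagonal rates such as the proportion of normal signals labelled as neuropathy, can be read from row-normalized confusion matrices averaged across raters or model runs. A minimal sketch, assuming the reported ± values are sample standard deviations (ddof=1) across the 6 physicians or 9 model runs, which the text does not state; the matrices in the example are hypothetical.

```python
import numpy as np


def row_normalize(cm):
    """Convert counts to rates: each row (true class) sums to 1."""
    cm = np.asarray(cm, dtype=float)
    return cm / cm.sum(axis=1, keepdims=True)


def summarize_raters(confusion_matrices):
    """Cell-wise mean and sample SD of row-normalized matrices across raters.

    The diagonal of the mean is the mean per-class accuracy; off-diagonal
    cells are mean misclassification rates (e.g. true normal -> neuropathy).
    """
    rates = np.stack([row_normalize(cm) for cm in confusion_matrices])
    return rates.mean(axis=0), rates.std(axis=0, ddof=1)


# Two hypothetical raters; rows/columns ordered M, N, NL.
cms = [
    [[8, 2, 0], [1, 9, 0], [0, 5, 5]],
    [[6, 4, 0], [0, 10, 0], [0, 7, 3]],
]
mean, sd = summarize_raters(cms)
print(np.round(np.diag(mean) * 100, 1))  # per-class accuracy in %
```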

Patient classification

The patient classification scores of the DL model and physicians are shown in Table 3. The median accuracy of the DL model (0.70) was significantly higher than that of the physicians (0.55; p = 0.001). The ROC and precision-recall curves show that the per-class classification results of the DL model and physicians are similar for myopathy and neuropathy, while the DL model performs better for normal-label patients (Fig 2).

Table 3. Patient classification scores of the deep learning model and physicians. Deep learning model scores are based on 9 scores from 3 validation folds and 3 random seeds. Physicians’ scores are based on 6 scores from 6 physicians. Full scores are listed in S2 File.

https://doi.org/10.1371/journal.pone.0339691.t003

Fig 2. Per-class receiver operating characteristic and precision-recall curves of the physicians and the deep learning model in patient classification.

(a) Receiver-operating characteristic curve. (b) Precision-recall curve.

https://doi.org/10.1371/journal.pone.0339691.g002

The per-class accuracies for myopathy, neuropathy, and normal patients were 80.70±6.08%, 60.78±5.55%, and 69.84±7.36%, respectively, for the DL model and 46.49±13.04%, 79.41±2.94%, and 32.14±12.20%, respectively, for the physicians (Fig 3). The DL model’s per-class accuracy was highest for myopathy, followed by normal and neuropathy patients, whereas the order for physicians was neuropathy, myopathy, and normal. A notably high proportion of normal-label patients (64.29±14.87%) was misclassified as neuropathy by physicians, reflecting the same bias toward neuropathy observed in signal classification. The full patient classification results and scores can be found in S4 File and S2 File, respectively.

Fig 3. Confusion matrices of the deep learning model and physicians in patient classification.

Entries indicate mean ± standard deviation. (a) Classification by deep learning models. (b) Classification by physicians.

https://doi.org/10.1371/journal.pone.0339691.g003

The median accuracy of the DL model was significantly higher for patient classification (0.70) than for signal classification (0.61; p < 0.001), an improvement also observed in previous work [34]. In contrast, the median accuracy of the physicians was not significantly higher for patient classification (0.55) than for signal classification (0.54; p = 1.00).

Learned features of the deep learning model

The signals generated by feature visualization represent the signals that the DL model perceives as most likely to be myopathy, neuropathy, or normal (Fig 4). The signal characteristics that the DL model learned to capture were visually similar to the typical nEMG signals of neuropathy, myopathy, and normal states. Waveforms that the DL model interpreted as myopathy showed small amplitudes and short durations with an early recruitment pattern (Fig 4a), whereas those interpreted as neuropathy showed high amplitudes and long durations with a delayed recruitment pattern (Fig 4b). Thus, the feature visualization results support that the DL model’s predictions were based on relevant features rather than artifacts.

Fig 4. Example signals that the deep learning model is most likely to predict as belonging to each class.

The signals are generated artificially using feature visualization.

https://doi.org/10.1371/journal.pone.0339691.g004

Failure analysis of the deep learning model and physicians

To examine the vulnerabilities of the DL model and the physicians in recognizing signal patterns, we investigated signals that the majority of DL models classified correctly but the majority of physicians misclassified, and vice versa (Fig 5). Although the signal classification accuracy of the DL model was higher than that of the physicians (Table 2), the majority of DL models misclassified 10 signals, more than the 3 signals misclassified by the majority of physicians. This may be explained by inter-rater variability among physicians [13], whereas the DL models tend to be unanimous in their predictions.

Fig 5. Number of misclassified signals.

The entries indicate the number of signals. (a) Signals that the majority of deep learning models classified correctly and the majority of physicians misclassified. (b) Signals that the majority of physicians classified correctly and the majority of deep learning models misclassified. Majority is defined as 8 out of 9 deep learning models or 5 out of 6 physicians.

https://doi.org/10.1371/journal.pone.0339691.g005

Among the signals that the majority of DL models classified correctly but the majority of physicians misclassified (Fig 5a), the myopathic signal misclassified as normal shows many small-amplitude motor units with early recruitment, indicating myopathic characteristics (Fig 6a). The normal signal misclassified as neuropathy might be interpreted as neuropathic because of its large amplitude and needle-insertion noise that resembles neuropathic MUAPs, but its recruitment and interference patterns indicate normal characteristics, which the DL model captured and the physicians did not (Fig 6b).

Fig 6. Misclassified signals.

(a)-(b): Signals that the majority of deep learning models classified correctly and the majority of physicians misclassified. (c)-(d): Signals that the majority of physicians classified correctly and the majority of deep learning models misclassified. Abbreviations: M (Myopathy), N (Neuropathy), NL (Normal).

https://doi.org/10.1371/journal.pone.0339691.g006

Among the signals that the majority of physicians classified correctly but the majority of DL models misclassified (Fig 5b), the neuropathic signal misclassified as myopathy shows reduced recruitment, a characteristic of neuropathy, along with small amplitude (Fig 6c). The physicians considered both the recruitment pattern and the amplitude, whereas the DL model may have focused on the small amplitude. This signal appears to have been recorded with poor volitional effort, indicative of a patient with severe neuropathy, in contrast to the other signals in the DL training data, which were recorded with adequate volition. The neuropathic signal misclassified as normal may have some characteristics of a normal signal caused by needle-insertion noise, but its large amplitude and reduced recruitment indicate neuropathic characteristics (Fig 6d).

Discussion

The results of our study indicate that the DL model can classify myopathy, neuropathy, and normal patients with higher performance than physicians based on nEMG signals recorded in the volitional state. The algorithm accommodates the heterogeneity of patients’ signal types and quantities [34], an important step in classifying neuromuscular disease in patients [36]. The algorithm uses raw data with minimal preprocessing, which makes it practical for clinical assistance. The DL model classifies signals in under 1 second [34], supporting clinical applicability and potential real-time prediction. Additionally, the algorithm’s high CLAIM score underscores its rigor among recent nEMG disease classification algorithms [36]. The post-hoc analysis with feature visualization indicates that the DL model captures signal characteristics that resemble each disease type (Fig 4), and the signals misclassified by the DL model show relatively ambiguous features that the model could learn to capture better (Fig 6). Therefore, our findings suggest that the DL model is an accurate, fast, and relatively stable method for aiding EMG diagnosis.

The classification pattern of the physicians was clearly different from that of the DL model. First, the physicians’ patient classification accuracy of 0.55 was lower than expected. One possible reason lies in the study cohort, which included 19 patients from each of the myopathy, neuropathy, and normal classes. The disease prevalence in the cohort was considerably higher than the real-world prevalence of approximately 200 per 100,000 individuals [41]. Additionally, the numbers of patients diagnosed with myopathy, neuropathy, and normal at Seoul National University Hospital were 61 (0.8%), 3544 (47.9%), and 3791 (51.3%), respectively. This distribution differs from that of our study cohort even within a hospital setting, and physicians may be less familiar with classifying less prevalent neuromuscular diseases such as myopathy. This study highlights this bias, which could be addressed to enhance diagnostic accuracy. Second, the classification by physicians differed from the real-world diagnostic process, in which physicians usually consider the nEMG signals together with additional patient information, such as history, demographics, and symptoms, which was unavailable in this study. Third, the physicians showed no significant performance improvement from signal classification to patient classification (Tables 2 and 3). The physicians classified patients based on their signal classifications, potentially biasing them to assign the same class to both signals and patients. In contrast, the algorithm classified each patient by integrating independently obtained signal classification results. Finally, the physicians had lower consensus in their predictions than the DL models, resulting in fewer signals misclassified by the majority of raters (Fig 5). This suggests that inter-rater variability can reduce misclassification by a majority of raters, acting as a buffer when new or ambiguous signals are presented; conversely, the high consensus of the DL models is a caveat, because their errors tend to be shared.

The study has several limitations that suggest important directions for future research. First, external validation could provide stronger evidence for the effectiveness of the nEMG diagnosis-aiding system. Evaluating both physicians and the DL model on highly curated data is more critical than relying solely on large datasets, as performance assessment becomes difficult if the test data are unreliable. While our data are highly curated and provide a robust foundation for training a DL model and evaluating both the DL model and the physicians, including data from multiple institutions or prospective data could enhance the model’s applicability as a diagnostic aid for neuromuscular diseases.

Second, a larger nEMG dataset could help validate the stability and performance of the DL model. We employed nested k-fold cross validation, a more rigorous approach than conventional k-fold cross validation, to address the small sample size and support the model’s generalizability. Nevertheless, a larger curated nEMG cohort could further confirm the DL model’s performance. Additionally, a larger cohort would likely include more diverse subtypes of neuromuscular diseases, such as inclusion body myositis, which shows both neuropathic and myopathic features in nEMG. This would enable the development of a classifier that distinguishes detailed disease subtypes [42].

In conclusion, we demonstrated that a simple and fast DL method can analyze nEMG signals with higher signal and patient classification accuracy than physicians. Although the DL model’s misclassifications involved ambiguous but relevant signal features, it also produced more unanimously misclassified signals than the physicians. Therefore, under physician supervision to ensure safety, the DL model can serve as an effective tool to accelerate diagnosis [43].

Supporting information

S1 Fig. Per-class receiver operating characteristic and precision-recall curves of the physicians and the deep learning model in signal classification.

(a) Receiver-operating characteristic curve. (b) Precision-recall curve.

https://doi.org/10.1371/journal.pone.0339691.s001

(PNG)

S2 Fig. Confusion matrices of the deep learning model and physicians in signal classification.

Entries indicate mean ± standard deviation.

https://doi.org/10.1371/journal.pone.0339691.s002

(PNG)

S1 File. Full signal classification scores.

https://doi.org/10.1371/journal.pone.0339691.s003

(XLSX)

S2 File. Full patient classification scores.

https://doi.org/10.1371/journal.pone.0339691.s004

(XLSX)

S3 File. Signal classification results.

https://doi.org/10.1371/journal.pone.0339691.s005

(XLSX)

S4 File. Patient classification results.

https://doi.org/10.1371/journal.pone.0339691.s006

(XLSX)

Acknowledgments

The code for the nEMGNet and divide-and-vote (DiVote) algorithm is available at the following link (https://github.com/jsyoo61/nEMGNet_DiVote). The dataset is available online (https://doi.org/10.34740/kaggle/ds/9946640).

References

1. Daube JR, Rubin DI. Needle electromyography. Muscle Nerve. 2009;39(2):244–70. pmid:19145648
2. Kimura J. Electrodiagnosis in diseases of nerve and muscle: principles and practice. Oxford University Press. 2013.
3. Mills KR. The basics of electromyography. J Neurol Neurosurg Psychiatry. 2005;76 Suppl 2(Suppl 2):ii32-5. pmid:15961866
4. Oh SJ. Clinical electromyography: nerve conduction studies. Lippincott Williams & Wilkins. 2003.
5. Rubin DI. Needle electromyography: Basic concepts. Handb Clin Neurol. 2019;160:243–56. pmid:31277852
6. Whittaker RG. The fundamentals of electromyography. Pract Neurol. 2012;12(3):187–94. pmid:22661353
7. Aminoff MJ, Goodin DS, Parry GJ, Barbaro NM, Weinstein PR, Rosenblum ML. Electrophysiologic evaluation of lumbosacral radiculopathies: electromyography, late responses, and somatosensory evoked potentials. Neurology. 1985;35(10):1514–8. pmid:2993952
8. Bromberg MB. The motor unit and quantitative electromyography. Muscle Nerve. 2020;61(2):131–42. pmid:31579956
9. Gutiérrez Gutiérrez G, Barbosa López C, Navacerrada F, Miralles Martínez A. Use of electromyography in the diagnosis of inflammatory myopathies. Reumatología Clínica (English Edition). 2012;8(4):195–200.
10. Leblhuber F, Reisecker F, Boehm-Jurkovic H, Witzmann A, Deisenhammer E. Diagnostic value of different electrophysiologic tests in cervical disk prolapse. Neurology. 1988;38(12):1879–81. pmid:3194066
11. Sawada K, Horii M, Imoto D, Ozaki K, Toyama S, Saitoh E, et al. Usefulness of Electromyography to Predict Future Muscle Weakness in Clinically Unaffected Muscles of Polio Survivors. PM R. 2020;12(7):692–8. pmid:31702870
12. Tonzola RF, Ackil AA, Shahani BT, Young RR. Usefulness of electrophysiological studies in the diagnosis of lumbosacral root disease. Ann Neurol. 1981;9(3):305–8. pmid:6261675
13. Kendall R, Werner RA. Interrater reliability of the needle examination in lumbosacral radiculopathy. Muscle Nerve. 2006;34(2):238–41. pmid:16609977
14. Arthur KC, Calvo A, Price TR, Geiger JT, Chiò A, Traynor BJ. Projected increase in amyotrophic lateral sclerosis from 2015 to 2040. Nat Commun. 2016;7:12408. pmid:27510634
15. Longinetti E, Fang F. Epidemiology of amyotrophic lateral sclerosis: an update of recent literature. Curr Opin Neurol. 2019;32(5):771–6. pmid:31361627
16. Parker MJS, Oldroyd A, Roberts ME, Ollier WE, New RP, Cooper RG, et al. Increasing incidence of adult idiopathic inflammatory myopathies in the City of Salford, UK: a 10-year epidemiological study. Rheumatol Adv Pract. 2018;2(2):rky035. pmid:31431976
17. Rose L, McKim D, Leasa D, Nonoyama M, Tandon A, Bai YQ, et al. Trends in incidence, prevalence, and mortality of neuromuscular disease in Ontario, Canada: A population-based retrospective cohort study (2003-2014). PLoS One. 2019;14(3):e0210574. pmid:30913206
18. Goodfellow I, Bengio Y, Courville A. Deep learning. MIT Press. 2016.
19. Cahan EM, Hernandez-Boussard T, Thadaney-Israni S, Rubin DL. Putting the data before the algorithm in big data addressing personalized healthcare. NPJ Digit Med. 2019;2:78. pmid:31453373
20. Yoo J, Choi S, Yang YS, Kim S, Choi J, Lim D, et al. Review learning: Real world validation of privacy preserving continual learning across medical institutions. Comput Biol Med. 2025;192(Pt B):110239. pmid:40339524
21. Yoo J, Torre F de la, Yang GR. Dual Policy as Self-Model for Planning. JKIIS. 2024;34(1):15–24.
22. Alfaras M, Soriano MC, Ortín S. A fast machine learning model for ECG-based heartbeat classification and arrhythmia detection. Frontiers in Physics. 2019;7.
23. Lu X, Wu Y, Yan R, Cao S, Wang K, Mou S, et al. Pulse waveform analysis for pregnancy diagnosis based on machine learning. In: 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), 2018. 1075–9. https://doi.org/10.1109/iaeac.2018.8577535
24. Gemein LAW, Schirrmeister RT, Chrabąszcz P, Wilson D, Boedecker J, Schulze-Bonhage A, et al. Machine-learning-based diagnostics of EEG pathology. Neuroimage. 2020;220:117021. pmid:32534126
25. Roy Y, Banville H, Albuquerque I, Gramfort A, Falk TH, Faubert J. Deep learning-based electroencephalography analysis: a systematic review. J Neural Eng. 2019;16(5):051001. pmid:31151119
26. Bien N, Rajpurkar P, Ball RL, Irvin J, Park A, Jones E, et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLoS Med. 2018;15(11):e1002699. pmid:30481176
27. Hannun AY, Rajpurkar P, Haghpanahi M, Tison GH, Bourn C, Turakhia MP, et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat Med. 2019;25(1):65–9. pmid:30617320
28. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, et al. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018;15(11):e1002686. pmid:30457988
29. Akef Khowailed I, Abotabl A. Neural muscle activation detection: A deep learning approach using surface electromyography. J Biomech. 2019;95:109322. pmid:31466716
30. Atzori M, Cognolato M, Müller H. Deep Learning with Convolutional Neural Networks Applied to Electromyography Data: A Resource for the Classification of Movements for Prosthetic Hands. Front Neurorobot. 2016;10:9. pmid:27656140
31. Nam S, Sohn MK, Kim HA, Kong H-J, Jung I-Y. Development of Artificial Intelligence to Support Needle Electromyography Diagnostic Analysis. Healthc Inform Res. 2019;25(2):131–8. pmid:31131148
32. Nodera H, Osaki Y, Yamazaki H, Mori A, Izumi Y, Kaji R. Deep learning for waveform identification of resting needle electromyography signals. Clin Neurophysiol. 2019;130(5):617–23. pmid:30870796
33. Wei W, Dai Q, Wong Y, Hu Y, Kankanhalli M, Geng W. Surface-Electromyography-Based Gesture Recognition by Multi-View Deep Learning. IEEE Trans Biomed Eng. 2019;66(10):2964–73. pmid:30762526
34. Yoo J, Yoo I, Youn I, Kim S-M, Yu R, Kim K, et al. Residual one-dimensional convolutional neural network for neuromuscular disorder classification from needle electromyography signals with explainability. Comput Methods Programs Biomed. 2022;226:107079. pmid:36191354
35. Mongan J, Moy L, Kahn CE Jr. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers. Radiol Artif Intell. 2020;2(2):e200029. pmid:33937821
36. de Jonge S, Potters WV, Verhamme C. Artificial intelligence for automatic classification of needle EMG signals: A scoping review. Clin Neurophysiol. 2024;159:41–55. pmid:38246117
37. Prechelt L. Early stopping - but when? Berlin, Heidelberg: Springer Berlin Heidelberg. 1998. p. 55–69. https://doi.org/10.1007/3-540-49430-8_3
38. Zhang Z, Sabuncu MR. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. Adv Neural Inf Process Syst. 2018;32:8792–802. pmid:39839708
39. Olah C, Mordvintsev A, Schubert L. Feature Visualization. Distill. 2017;2(11).
40. Pollard TJ, Johnson AEW, Raffa JD, Mark RG. tableone: An open source Python package for producing summary statistics for research papers. JAMIA Open. 2018;1(1):26–31. pmid:31984317
41. Carey IM, Banchoff E, Nirmalananthan N, Harris T, DeWilde S, Chaudhry UAR, et al. Prevalence and incidence of neuromuscular conditions in the UK between 2000 and 2019: A retrospective study using primary care data. PLoS One. 2021;16(12):e0261983. pmid:34972157
42. Mano T, Iguchi N, Eura N, Iwasa N, Yamada N, Horikawa H, et al. Electromyography varies by stage in inclusion body myositis. Front Neurol. 2024;14:1295396. pmid:38249752
43. Taha MA, Morren JA. The role of artificial intelligence in electrodiagnostic and neuromuscular medicine: Current state and future directions. Muscle Nerve. 2024;69(3):260–72. pmid:38151482


Supporting information

S1 Fig. Per-class receiver operating characteristic and precision-recall curves of the physicians and the deep learning model in signal classification.

S2 Fig. Confusion matrices of the deep learning model and physicians in signal classification.

S1 File. Full signal classification scores.

S2 File. Full patient classification scores.

S3 File. Signal classification results.

S4 File. Patient classification results.
