Abstract
Background
Degenerative meniscus tears are often accompanied by varying degrees of osteoarthritis, making the prognostic outcome of arthroscopic partial meniscectomy (APM) difficult to predict. Our research objective is to develop and validate a multimodal deep learning radiology (MDLR) model based on the integration of multimodal data using deep learning radiology (DLR) scores from preoperative magnetic resonance imaging (MRI) images and clinical variables.
Materials and methods
From February 2020 to February 2022, 452 eligible patients with degenerative meniscus tears who underwent APM were retrospectively enrolled. DLR features were extracted from MRI of each patient’s knee. An MDLR model was then used to predict patient prognosis after arthroscopy. The MDLR model for prognostic risk stratification incorporated DLR signatures and clinical variables.
Results
The standalone DLR model performed poorly, with micro-average and macro-average areas under the receiver operating characteristic (ROC) curve of 0.780 and 0.765 in the training set, 0.747 and 0.747 in the validation set, and 0.720 and 0.732 in the test set, respectively, for predicting postoperative outcomes in degenerative meniscus tears. Multivariate analysis identified gender, height, weight, duration of pain, ESR, and VAS as indicators of poor prognosis. After these clinical features were combined with the DLR features, the performance of the MDLR model improved significantly, with the best performance achieved by the Light Gradient Boosting Machine (Light GBM) algorithm. The micro-average and macro-average areas under the ROC curve of this model for predicting the postoperative outcome of degenerative meniscus tear were 0.917 and 0.919 in the training set, 0.874 and 0.882 in the validation set, and 0.921 and 0.951 in the test set, respectively. With these variables, the MDLR model provides four levels of prognosis for arthroscopic partial meniscectomy: Poor (pain relief 0–25%), Average (pain relief 25–50%), Good (pain relief 50–75%), and Excellent (pain relief 75–100%).
Conclusion
An MDLR-based tool was developed, and it indicates that pain exacerbation time is an important prognostic factor for arthroscopic partial meniscectomy in patients with degenerative meniscus tears. The MDLR model showed outstanding performance in the prognostic stratification of patients with degenerative meniscus tears who underwent arthroscopic partial meniscectomy and may help physicians with therapeutic decision making and surveillance strategy selection in clinical practice.
Citation: He Y, Wei J, Sun Y, Bao W, Huang D, Fan Y, et al. (2025) A multimodal deep learning radiomics model for predicting degenerative meniscus tear after arthroscopy. PLoS One 20(8): e0328299. https://doi.org/10.1371/journal.pone.0328299
Editor: Wencai Liu, Shanghai Jiao Tong University, CHINA
Received: April 29, 2025; Accepted: June 28, 2025; Published: August 13, 2025
Copyright: © 2025 He et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and Supporting Information files.
Funding: The study was supported by the Medical Science and Technology Research Program of Chongqing Banan Science and Technology Bureau and Chongqing Banan Health Commission (Grant No. BNWJ202500130).
Competing interests: No authors have competing interests.
Abbreviations: MDLR, multimodal deep learning radiology; DLR, deep learning radiology; DL, deep learning; APM, arthroscopic partial meniscectomy; MRI, magnetic resonance imaging; ROC, receiver operating characteristic; ACC, Accuracy; AUC, Area Under ROC Curve; NPV, Negative Predictive Value; PPV, Positive Predictive Value; F1, F1-Score; GBM, Gradient Boosting Machine; OA, Osteoarthritis; PFP, patellofemoral pain; RCTs, randomized controlled trials; BMI, body mass index; WBC, white blood cell; ESR, erythrocyte sedimentation rate; LASSO, Least Absolute Shrinkage and Selection Operator; IKDC, International Knee Documentation Committee
Introduction
Knee joint pain is the most common type of joint pain, accounting for 5% of all outpatient visits [1]. Among the various causes of knee pain, meniscus tears rank among the top three worldwide, along with osteoarthritis (OA), which affects 654 million people and 23% of adults over age 40 [2], and patellofemoral pain (PFP), which has a lifetime incidence of about 25% [3]. Meniscus tears affect 620 million adults, representing 12% of the general adult population [4].
The meniscus is a fibrocartilaginous structure in the knee joint, made up of two crescent-shaped parts: the medial and lateral menisci. These structures help distribute weight and contribute to joint stability [5]. Meniscal tears, which refer to the separation of this fibrous tissue, can be classified as either traumatic, caused by excessive shear forces, or degenerative, resulting from repeated stress on a worn-out meniscus. According to one report, the annual incidence of clinically diagnosed meniscal tears is 79 cases per 100,000 people (95% CI, 63–94) [6]. Acute traumatic tears are most commonly seen in young, active individuals aged 18–40 who regularly participate in sports and often experience accompanying anterior cruciate ligament injuries [7].
For traumatic meniscal tears, several randomized controlled trials have established more standardized treatment protocols. Depending on the type of tear, either conservative treatment or arthroscopic surgery may be utilized, and the postoperative outcomes are generally satisfactory [8–13].
Degenerative tears typically occur in older adults, particularly those aged 40 and above, and are often associated with knee osteoarthritis. A population-based study in the United States found that 63% of older adults with symptomatic osteoarthritis had meniscal tears detected by MRI [14]. Additionally, meniscal tears are frequently discovered incidentally during MRI examinations. A meta-analysis revealed that 19% (95% CI, 13%–26%) of individuals aged 40 and over who had no prior history of knee pain or injury had asymptomatic meniscal tears detected by MRI [15]. Degenerative meniscal tears are generally non-traumatic and a normal consequence of aging, and they are often linked with knee osteoarthritis and accompanying degenerative changes [16].
There is no consensus on the best treatment for degenerative meniscal tears, and various randomized controlled trials (RCTs) have produced differing results [17–20]. One systematic review found that certain factors, such as a prolonged symptom duration (over one year), radiographic evidence of osteoarthritis, and a meniscectomy rate exceeding 50%, were linked to poorer clinical outcomes following partial meniscal resection [21]. Many experts and medical associations recommend partial meniscectomy as a suitable option for treating patients with mild to moderate osteoarthritis or those with meniscal tears who have not benefitted from physical therapy or other non-surgical treatments. They suggest that the prognosis for degenerative meniscal tears largely depends on the associated osteoarthritis [22,23].
One of the main reasons for these differing outcomes is that we cannot directly assess the nature of osteoarthritis in patients with degenerative meniscus tears using MRI, nor can we conduct a thorough evaluation of the patients’ overall condition. This creates significant uncertainty in predicting the prognosis for patients following arthroscopic surgery.
Despite the high accuracy of MRI features in predicting meniscal tears and osteoarthritis injuries, previous studies have shown that potential selection bias from inter-observer variation is challenging to eliminate. However, deep learning (DL), a data-driven approach, is increasingly being utilized to automatically develop and organize predictive capabilities based on specific features rather than relying on human assessment [24–26].
In recent years, deep learning has become prevalent in medical image analysis, primarily due to its unique advantages in handling multidimensional and large-scale data. Its applications in medical imaging mainly include: image and examination classification; object and lesion classification; organ, region, and landmark localization; target or lesion detection; segmentation of organs and substructures; lesion segmentation; medical image registration; content-based image retrieval; image generation and enhancement; and automatic generation of image reports [27]. Among these, image and examination classification is one of the first areas where deep learning has made significant contributions to medical image analysis. In examination classification, one or more images (an examination) are typically used as input to produce a single diagnostic variable as output (for example, the presence or absence of disease). In this context, each diagnostic examination is a sample, and compared to the dataset sizes in computer vision, the dataset sizes are generally smaller (for example, hundreds or thousands versus millions of samples) [28]. Therefore, the popularity of deep learning in such applications is not surprising.
In the past decade, DL, as a data-driven approach, has been increasingly applied to automatic design and organization based on the predictive ability of specific features rather than human performance [29,30]. An increasing number of deep learning methods have been proposed in orthopedic and sports medicine practice to diagnose meniscus tears and osteoarthritis [24,31]. However, research using deep learning to predict the postoperative prognosis of patients with degenerative meniscus tears undergoing arthroscopy is sparse.
Therefore, further research is necessary to support the accuracy of DLR methods in predicting postoperative prognosis for degenerative meniscus tear patients. The objective of our study is to develop and validate a multimodal DLR model based on preoperative MRI and laboratory tests to predict the prognosis of patients with degenerative meniscus tears undergoing arthroscopic partial meniscectomy, utilizing multimodal data that integrates clinical variables and DLR scoring.
Materials and methods
This retrospective study protocol received approval from the Institutional Review Board at the participating hospital and was conducted in accordance with the principles of the 1975 Helsinki Declaration. Because the study is retrospective, the requirement for written informed consent was waived. The study was approved by the Banan Hospital of Chongqing Medical University review board and was registered in the Chinese Clinical Trial Registry (registration ID: ChiCTR2500098922).
Patient enrolment
Data retrieval was conducted over the three-month period from October 15, 2024 to January 15, 2025. A total of 452 cases of degenerative meniscus tear were diagnosed during APM performed in our hospital from February 2020 to December 2022. A degenerative tear was defined as a slowly developing lesion of insidious onset without any history of trauma [32,33] that appeared as a horizontal cleavage (intrameniscal linear signal often communicating with the articular surface), radial, or complex tear pattern on MRI [34] and was confirmed intraoperatively. The inclusion criteria were: 1) age ≥ 40 years; 2) diagnosis of meniscus tear; 3) degenerative meniscus tear diagnosed by arthroscopic surgery during hospitalization; 4) partial meniscectomy performed to treat the meniscus. The exclusion criteria were: 1) traumatic meniscus tear diagnosed on the basis of medical history and intraoperative findings; 2) concomitant ligament injuries around the joint, such as anterior or posterior cruciate ligament, medial or lateral collateral ligament, or patellar ligament injuries; 3) coexisting arthritis of other types, such as gouty arthritis, Charcot’s disease, tuberculous arthritis, or rheumatoid arthritis; 4) admission for conservative treatment without surgery.
Data set
Imaging data comprised all preoperative sagittal T1 and T2 sequence images from magnetic resonance imaging of the knee joint, with about 40 images per patient. The clinical characteristics of the included patients were extracted according to possible influencing factors identified in previous studies, including gender, height, weight, body mass index (BMI), white blood cell (WBC) count, neutrophil percentage, blood glucose, erythrocyte sedimentation rate (ESR), pain time and pain aggravation time. Fig 1 shows the numbers of patients and images included in this study. To protect patient confidentiality, patient data were fully anonymized before analysis.
Surgical procedure, rehabilitation and follow-up protocol
All arthroscopic procedures were carried out without a tourniquet using Laryngeal Mask Anesthesia. The standard anterolateral and anteromedial portals, along with a 5 mm arthroscope, were employed. We conducted APM in every patient to restore the meniscus’s regular structure. Damaged and loose segments of the meniscus were excised utilizing a mechanical shaver and meniscal basket. The approach aimed to retain as much healthy meniscal tissue as possible. Damaged cartilage was not treated during surgery and no medications were administered into the knee during or after the arthroscopic procedure.
The rehabilitation program consisted of progressive neuromuscular and strength training exercises spread over 12 weeks, occurring twice a week. These exercises were designed to maintain range of motion, improve flexibility in the hips and hamstrings, increase strength in the quadriceps and hips, and preserve knee proprioception.
The primary endpoint for this study was the 2-year follow-up, and the primary outcome was patient satisfaction with the current function of their knee. At 2 years after surgery, the patients underwent clinical and radiological evaluations, including answering a question regarding their perception of the clinical outcomes of APM. Satisfaction was elicited using a 5-point Likert scale (1, very dissatisfied; 2, somewhat dissatisfied; 3, neither satisfied nor dissatisfied; 4, somewhat satisfied; and 5, very satisfied). The results were then divided into four levels: a value of 5 was assigned to “excellent”, 4 to “good”, 3 to “average”, and values of 1 and 2 to “poor”.
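The mapping from Likert score to prognosis level described above can be written as a small lookup. This is an illustrative sketch only; the function name `satisfaction_to_level` is hypothetical and is not taken from the authors' code.

```python
# Hypothetical helper: map a 5-point Likert satisfaction score to the
# four prognosis levels used as the outcome variable in this study.
LIKERT_TO_LEVEL = {
    5: "excellent",
    4: "good",
    3: "average",
    2: "poor",
    1: "poor",
}


def satisfaction_to_level(score: int) -> str:
    """Return the prognosis level for a Likert score in 1..5."""
    if score not in LIKERT_TO_LEVEL:
        raise ValueError(f"Likert score must be an integer from 1 to 5, got {score!r}")
    return LIKERT_TO_LEVEL[score]


if __name__ == "__main__":
    for s in range(1, 6):
        print(s, satisfaction_to_level(s))
```

Scores 1 and 2 collapse into a single "poor" class, which is why a five-point scale yields a four-class prediction problem.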
Study design
Image preprocessing: each image was processed as follows before being input into the deep learning network: 1) resampling to 128 × 128 × 128 by linear interpolation; 2) standardization of image blocks with the Z-score; 3) augmentation by random rotation between −30° and 30°, without other methods such as panning and zooming, to combat uncertainty in clinical application scenarios.
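The three preprocessing steps can be illustrated on a one-dimensional intensity profile using only the Python standard library. This is a simplified sketch under stated assumptions, not the authors' pipeline: the helper names are hypothetical, the interpolation keeps the end points fixed (one of several possible conventions), and a real 3D volume would apply the same linear resampling along each of its three axes in turn and then rotate the volume by the sampled angle.

```python
import random
from statistics import fmean, pstdev


def resample_linear(values, new_len):
    """Linearly resample a 1D sequence to new_len samples (end points preserved)."""
    n = len(values)
    if n == 1 or new_len == 1:
        return [float(values[0])] * new_len
    scale = (n - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale              # fractional position in the source sequence
        lo = min(int(pos), n - 2)    # left neighbour, clamped so lo + 1 is valid
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def random_rotation_angle(rng, max_degrees=30.0):
    """Sample an augmentation angle uniformly from [-max_degrees, max_degrees]."""
    return rng.uniform(-max_degrees, max_degrees)


if __name__ == "__main__":
    profile = [10.0, 20.0, 15.0, 40.0]         # toy intensities, not real MRI data
    resampled = resample_linear(profile, 128)  # 128 samples, as in the target grid
    standardized = zscore(resampled)
    angle = random_rotation_angle(random.Random(0))
    print(len(standardized), round(angle, 2))
```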
Deep learning model development: This study designed a hybrid deep learning architecture, the MobileNet-based Hybrid Network (MobHy-Net), which uses MobileNetV2 as a powerful feature extractor with the last layer and classification layer removed, so that only the required feature map is obtained. Specifically, the features are segmented into fixed-size blocks, and multi-head self-attention is computed over these blocks to obtain blocks containing positional information. The processed blocks are classified by a classifier, and the fused information is decoded by a multi-layer perceptron. To avoid overfitting caused by data imbalance, a weighted oversampling method and a binary cross-entropy loss function were used. The MobileNetV2 architecture and formula are shown in File S1.
Patients (not individual images) were randomly assigned to training, validation, and test cohorts (7:2:1 ratio) using patient-level stratified splitting to ensure no data leakage across cohorts. The training and validation cohorts were used to update model parameters and optimize hyperparameters, respectively. The overall flowchart of the model is shown in Fig 2. MobHy-Net was used as the backbone network, and the DLR model was pre-trained on the dataset.
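A patient-level stratified 7:2:1 split can be sketched as follows. The function name, the patient identifiers and the rounding rule are assumptions made for illustration; the authors' exact procedure is not described in the text. The key property shown is that the split is made over patients, so all images of one patient fall into a single cohort.

```python
import random
from collections import defaultdict


def stratified_patient_split(labels, ratios=(0.7, 0.2, 0.1), seed=0):
    """Split patients into train/val/test while preserving class proportions.

    labels: dict mapping patient id -> outcome class.
    Returns a dict with keys "train", "val" and "test" holding patient ids.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, label in labels.items():
        by_class[label].append(pid)

    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        pids = sorted(by_class[label])
        rng.shuffle(pids)
        n_train = round(len(pids) * ratios[0])
        n_val = round(len(pids) * ratios[1])
        splits["train"] += pids[:n_train]
        splits["val"] += pids[n_train:n_train + n_val]
        splits["test"] += pids[n_train + n_val:]  # remainder goes to the test cohort
    return splits


if __name__ == "__main__":
    # Group sizes as reported in the Results; the ids themselves are made up.
    sizes = {"poor": 100, "average": 100, "good": 100, "excellent": 152}
    labels = {}
    for name, count in sizes.items():
        for i in range(count):
            labels[f"{name}-{i:03d}"] = name
    splits = stratified_patient_split(labels)
    print({k: len(v) for k, v in splits.items()})
```

Images are then assigned to a cohort by looking up the cohort of the patient they belong to, which is what prevents leakage between cohorts.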
Clinical information integration
An independent-samples t-test was conducted on the clinical features to remove features with p > 0.05. The Least Absolute Shrinkage and Selection Operator (LASSO) algorithm was then applied to the remaining features; its penalty term, weighted by λ, shrinks the regression coefficients and forces some of them to exactly 0. Three-fold cross-validation was performed, and the optimal λ value was determined by the maximum mean cross-validation score. Clinical features with non-zero coefficients in the model corresponding to the optimal λ were selected, yielding independent and stable clinical features.
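For reference, the LASSO criterion is commonly written as shown here. This is the textbook form rather than a formula taken from the paper: the 1/(2n) scaling follows a common software convention, and the symbols (x_i for the clinical feature vector of patient i, y_i for a numeric encoding of the outcome, β for the coefficients, p for the number of candidate features) are assumptions introduced for illustration.

```latex
\hat{\beta}(\lambda) \;=\; \arg\min_{\beta_0,\,\beta}\;
\frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^{2}
\;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of λ drive more coefficients β_j to exactly zero; the features with non-zero β_j at the cross-validated λ are the ones retained.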
A neural network was used to extract the combined image features to obtain the DL-sign. The screened clinical features were then combined with the DL-sign, and joint models for predicting postoperative outcomes were established using 14 algorithms, including logistic regression, Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, Extra Trees, XGBoost, AdaBoost, Multi-Layer Perceptron, and Gradient Boosting Machine. The above modeling was completed using the PixelmedAI platform.
Statistical analysis
Statistical analysis was performed using packages of Python version 3.12.7 (https://www.python.org). The dataset was divided into four groups with postoperative patient satisfaction as the reference standard. Differences in clinical and imaging features among the four groups were compared using the Chi-squared test, the Kruskal-Wallis test and one-way ANOVA. The training, validation and test cohorts were likewise compared using one-way ANOVA. Two-tailed tests were used throughout, and P < 0.05 was considered statistically significant. Cross-validation was described in terms of the accuracy of each fold and the mean accuracy.
The data set was split into training, validation and test groups (7:2:1). Because the total sample size in our study was less than 500, we set K to 3 to ensure sufficient training samples in each fold. The training group was used for stratified K-fold cross-validation of the models. Over 1200 images from each group were used randomly for each fold. The folds were made by preserving the percentage of samples for each class so that the class distribution did not bias the results [35]. After validating and fine-tuning the model, we evaluated it on the test set. The test set was split off before training and was not used for training or validation.
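Stratified K-fold assignment with K = 3 can be sketched as below. This is a minimal standard-library illustration, not the authors' implementation; the function name and the round-robin assignment are assumptions. Each class is shuffled and dealt out across the folds, so every fold keeps roughly the same class percentages.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=3, seed=0):
    """Return k lists of sample indices with near-identical class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal samples of this class round-robin
    return folds


if __name__ == "__main__":
    # Toy training-cohort labels; the counts are illustrative only.
    labels = ["poor"] * 70 + ["average"] * 70 + ["good"] * 70 + ["excellent"] * 106
    for i, fold in enumerate(stratified_kfold(labels, k=3)):
        held_out = Counter(labels[j] for j in fold)
        print(f"fold {i}: {len(fold)} samples, {dict(held_out)}")
```

In each round, one fold is held out for validation and the remaining two are used for fitting.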
The model performance on the test set was evaluated using the predictive accuracy (ACC), positive predictive value (PPV), negative predictive value (NPV), area under curve (AUC), F1-score, sensitivity, and specificity. The confusion matrix is also reported to reflect the sensitivity and specificity of the model, so as to evaluate the performance of the algorithm. The predictive accuracy and F1 score were calculated as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{F1} = \frac{2\,TP}{2\,TP + FP + FN}$$
where TP indicates a true positive, TN indicates a true negative, FP indicates a false positive, and FN indicates a false negative.
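The evaluation metrics listed in this section all follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this study. For a four-class problem such as this one, the counts would typically be computed per class in a one-vs-rest fashion.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute standard metrics from true/false positive/negative counts."""

    def ratio(num, den):
        # Return None rather than dividing by zero when a metric is undefined.
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),            # positive predictive value (precision)
        "npv": ratio(tn, tn + fn),            # negative predictive value
        "sensitivity": ratio(tp, tp + fn),    # recall
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    # Made-up counts for one class treated as "positive" against the rest.
    for name, value in binary_metrics(tp=8, tn=30, fp=4, fn=3).items():
        print(f"{name}: {value:.3f}")
```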
Results
Table 1 lists the characteristics of participants. A total of 16788 MRI images were obtained from 452 patients. The poor, average, good and excellent groups included 100, 100, 100 and 152 patients and 3762, 3708, 3730 and 5588 images, respectively. All indicators except BMI differed significantly among the groups.
Table 2 lists the performance of the deep learning model on the training, validation and test sets. The accuracy on the test set is slightly lower than that on the validation set, with an ACC of 0.75 and an AUC of 0.71. Fig 3 shows the variation of the coefficients and the cross-validation (CV) score in LASSO regression with the hyperparameter α; the best α value obtained is about 10⁻⁴ to 10⁻⁵. Fig 4 shows the deep learning ROC curve and confusion matrix. The areas under the micro-average and macro-average ROC curves are 0.720 and 0.732, respectively.
Fig 3. The dotted vertical line represents the optimal log(λ) value. Panel b: cross-validation score from the LASSO regression cross-validation procedure plotted against λ.
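The micro-average and macro-average ROC areas reported for the models differ in how the four classes are pooled. The following sketch illustrates the distinction using a rank-based AUC (the probability that a positive sample is scored above a negative one) in a one-vs-rest setting. The function names and toy probabilities are hypothetical, and this is not necessarily how the PixelmedAI platform computes these values.

```python
def auc(scores, positives):
    """Rank-based AUC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def micro_macro_auc(probs, labels, n_classes):
    """Return (micro-average AUC, macro-average AUC) for one-vs-rest scoring.

    probs: list of per-sample probability lists, one entry per class.
    labels: list of true class indices.
    """
    per_class = []
    flat_scores, flat_truth = [], []
    for c in range(n_classes):
        scores = [row[c] for row in probs]
        truth = [int(y == c) for y in labels]
        per_class.append(auc(scores, truth))  # macro: one AUC per class
        flat_scores += scores                 # micro: pool every (sample, class) pair
        flat_truth += truth
    return auc(flat_scores, flat_truth), sum(per_class) / n_classes


if __name__ == "__main__":
    # Toy predicted probabilities for a 3-class problem (not study data).
    probs = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1]]
    labels = [0, 1, 2, 1]
    micro, macro = micro_macro_auc(probs, labels, n_classes=3)
    print(round(micro, 3), round(macro, 3))
```

The macro average weights every class equally, whereas the micro average weights every sample-class decision equally, so the two diverge when classes are imbalanced.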
After the features obtained by deep learning were combined with the known clinical features, different machine learning algorithms were used to build joint models. Among them, Light GBM performed best in the training, validation and test sets, with AUCs above 0.85 in all three. Because of the large amount of data, the results are presented as figures below.
Fig 5 shows the SHAP visualization of the features used by each machine learning model, most of which are DL-sign features.
S1, S2 and S3 Fig show the ROC curves and AUC values during the validation of each model under the training set, test set and validation set.
S4, S5 and S6 Fig show the confusion matrix of each model under the training set, test set and validation set, reflecting the specificity and sensitivity of each model.
The Light GBM classifier has the highest accuracy in predicting the outcome of arthroscopic partial meniscectomy for degenerative meniscus tear (Fig 6).
To facilitate reproducibility and academic exchange, we have made our model code publicly available. The repository can be accessed at: https://github.com/410312774/PixelMedAI/tree/main/note2-%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E5%88%86%E7%B1%BB/MobHyNet. The minimal anonymized dataset is provided in S1 Data.
Discussion
The most significant finding of the current study is that combining deep learning image features with clinical features and using machine learning to predict the prognosis of patients with degenerative meniscus tears undergoing APM achieves high accuracy, enabling preoperative diagnosis and treatment strategies to be provided to clinicians and patients. Among the classifiers tested, the Light GBM classifier demonstrated the highest accuracy. The average accuracy on the training set reached 0.94. Because of potential overfitting and uneven data distribution, the average accuracies on the test set and the validation set are not identical, but both are high, at 0.93 and 0.92, respectively.
Previous studies have shown that due to the numerous influencing factors, particularly the inability to accurately assess the condition of cartilage in patients with degenerative meniscus tears before surgery, the outcomes of arthroscopic surgery for degenerative meniscus tears are uncertain [36]. As a result, it is often difficult to predict postoperative outcomes preoperatively, leading to situations where surgeons must inform patients and evaluate future results based on the condition of cartilage damage observed during the procedure. This poses certain challenges for both clinicians and patients before surgery. Our current study establishes a postoperative prediction model for degenerative meniscus tears, which utilizes deep learning to analyze MRI data to extract features, and then combines these features with clinical data for machine learning, allowing us to accurately predict patients’ postoperative outcomes preoperatively.
The Light GBM model showed improved ROC-AUC, but PPV remained low. This is mainly due to the small sample size of the poor-prognosis categories (‘poor’/‘average’) in the data (only 200 cases in total, accounting for 44.2%). Class imbalance leads to a conservative prediction tendency of the model for minority classes, and false positives (FP) increase, thus reducing PPV. This is consistent with the difficulty of minority-class recognition reflected by the low F1 score (0.53). Although PPV is low, the core value of the model lies in its high AUC (0.92) and stratification ability: 1) it can accurately distinguish prognosis grades (such as ‘excellent’ vs ‘poor’); 2) the high NPV (0.88) suggests that the model is reliable in excluding adverse prognosis, which helps to avoid unnecessary surgery. PPV only affects the ‘high-risk’ interpretation. In future research, we will mitigate this by adjusting the loss function weights, optimizing thresholds and other methods.
Deep learning is like a black box. We can obtain feature values from deep learning, but we cannot know how they are obtained or what each feature means. To examine the reliability of this deep learning model, we used Grad-CAM visualization. As shown in Fig 7, the regions highlighted by the model are located in the meniscus and cartilage areas of the knee joint, which supports the reliability of the features obtained by the deep learning model.
In the model training process of this study, a series of key hyperparameters and training strategies were explicitly set to ensure the reproducibility and scientific rigor of the model’s performance. The Adam optimizer was employed for model training with an initial learning rate of 1 × 10⁻³. When the validation set performance showed no improvement for 5 consecutive epochs, the learning rate was automatically reduced to 0.1 times its original value. The batch size was set to 32, the maximum number of training epochs was 200, and an early stopping strategy was adopted: training was terminated early if the validation set macro AUC did not improve for 30 consecutive epochs.
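The interaction between the learning-rate reduction and early stopping can be traced with a small simulation. This is a simplified sketch, not the authors' training loop: the function name is hypothetical, the validation macro AUC is assumed to drive both rules, and the counting convention (reduce as soon as 5 non-improving epochs accumulate) may differ by one epoch from a given framework's scheduler.

```python
def simulate_schedule(val_macro_auc, initial_lr=1e-3, factor=0.1,
                      lr_patience=5, stop_patience=30, max_epochs=200):
    """Trace the learning rate per epoch and stop early when progress stalls.

    val_macro_auc: validation macro AUC observed after each epoch.
    Returns a list of (epoch, learning_rate) tuples for the epochs that ran.
    """
    lr = initial_lr
    best = float("-inf")
    bad_for_lr = 0     # non-improving epochs since the last improvement or LR cut
    bad_for_stop = 0   # non-improving epochs since the last improvement
    history = []
    for epoch, score in enumerate(val_macro_auc[:max_epochs], start=1):
        if score > best:
            best = score
            bad_for_lr = 0
            bad_for_stop = 0
        else:
            bad_for_lr += 1
            bad_for_stop += 1
        if bad_for_lr >= lr_patience:
            lr *= factor
            bad_for_lr = 0
        history.append((epoch, lr))
        if bad_for_stop >= stop_patience:
            break      # early stopping
    return history


if __name__ == "__main__":
    # Made-up curve: two improving epochs, then a plateau.
    curve = [0.70, 0.72] + [0.71] * 40
    history = simulate_schedule(curve)
    print("stopped at epoch", history[-1][0], "with lr", history[-1][1])
```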
The loss function used was weighted categorical cross-entropy. To address the class imbalance problem, weighted oversampling was applied to balance the class distribution of the training samples. Simultaneously, during training, z-score standardization and random rotations (−30° to +30°) were applied to the three-dimensional image data to enhance robustness. Neural network parameters were initialized using a normal distribution, with the weights of conv3d and linear fully connected layers initialized according to the inverse square root of the number of input units. BatchNorm3d weights were initialized to 1 and biases to 0. During the training and validation process, the training loss, validation loss, and macro AUC were accurately recorded for each epoch, and corresponding curves were plotted in the final report to monitor overfitting and convergence trends. All experiments were conducted on an NVIDIA RTX 4060Ti graphics card with 24 GB of memory.
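One common way to implement weighted oversampling is to draw training samples with probability inversely proportional to the size of their class, so that each class is seen about equally often. The sketch below illustrates that idea with the standard library; it is an assumption about the general technique, not a description of the authors' exact sampler, and the function names and class counts are hypothetical.

```python
import random
from collections import Counter


def oversampling_weights(labels):
    """Give each sample a weight of 1 / (size of its class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


def oversample(labels, n_draws, seed=0):
    """Draw sample indices with replacement so classes are roughly balanced."""
    rng = random.Random(seed)
    weights = oversampling_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=n_draws)


if __name__ == "__main__":
    # Toy imbalanced training labels (counts are illustrative only).
    labels = ["poor"] * 70 + ["excellent"] * 106
    drawn = oversample(labels, n_draws=10000)
    print(Counter(labels[i] for i in drawn))
```

Because the weights within each class sum to the same total, minority-class samples are simply drawn more often than majority-class samples.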
Analyzing the training and validation curves, the training loss decreases rapidly in the early stages, with a significant reduction occurring within approximately 50 epochs. This indicates that the model can quickly learn and fit the training data. As training continues, the training loss decreases more slowly and eventually stabilizes, converging to a low level, reflecting the model’s strong fitting ability on the training set (S7 Fig).
The training set AUC shows a steady upward trend as training progresses and eventually converges between 0.75 and 0.78, suggesting that the model can distinguish between different classes reasonably well. The validation set AUC also increases steadily, ultimately stabilizing between 0.70 and 0.75. Both AUC curves exhibit good convergence. Overall, the training and validation curves indicate that the model has learned some features (S8 Fig).
We adapted the MobileNetV2 model to 3D (Fig 8). MobileNetV2 is a convolutional neural network designed for efficient computation that uses the concepts of the “inverted residual” and the “linear bottleneck” to balance accuracy and model size. This adaptation replaces 2D operations with their 3D counterparts, such as the nn.Conv3d, nn.BatchNorm3d and F.avg_pool3d modules, so that the network operates on 3D volumes rather than 2D images. The model first defines utility functions that create convolution layers with batch normalization and ReLU6 activation, customized for 3D data. The architecture uses a component called the “inverted residual” module. The module includes an optional expansion stage, followed by a depthwise separable convolution, and finally a linear convolution. If the expansion is adopted, the nonlinearity is not applied to the output features. The residual connection is conditional and is only applied when the input and output sizes are the same, which greatly alleviates the problem of vanishing gradients in deeper networks.
Although we optimized the deep learning network specifically for this task, its performance is still unsatisfactory. The reason is that many factors affect the postoperative outcome of degenerative meniscus tear. Therefore, the postoperative outcome cannot be accurately evaluated from imaging or clinical information alone, which is consistent with the findings of previous studies.
Multimodal deep learning networks compensate for the drawbacks of a single modality by combining information from magnetic resonance imaging with clinical information. Therefore, we established a multimodal model that combines deep learning and machine learning to improve performance. After the deep learning features were combined with clinical features through machine learning, the joint model showed significantly higher accuracy and better performance.
Given that this study focuses on middle-aged and elderly patients with degenerative meniscal tears, who generally have lower functional demands for their knees compared to younger individuals, we did not use the commonly utilized evaluation forms such as Lysholm Knee Scoring Scale and International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form [37,38]. Instead, we placed greater emphasis on the patients’ subjective satisfaction with their daily life, as this aligns with the goals of such surgeries. Therefore, we used a Likert scale as the outcome variable in this study.
One of the goals of artificial intelligence is to assist humans in image interpretation. Several studies have found that DL outperforms experts in medical image interpretation [39–43]. For meniscus tears, most studies have established diagnostic models [44–48]. CNN models with different network structures have been built using deep learning to diagnose the degree of meniscus tears on MRI, and their results have been compared with those of human experts. These algorithms have proven very useful in helping inexperienced and even experienced surgeons identify tears before surgery. Although humans can diagnose meniscus injuries in this situation, DL systems may be useful where humans cannot recognize radiological relationships or patterns. Computer vision sometimes outperforms humans in recognizing more complex and subtle patterns. Therefore, DL-based algorithms are expected to become more useful in interpreting various images.
To the best of our knowledge, this is the first study integrating the clinical and radiography information of degenerative meniscus tear patients to predict the effect of postoperative after arthroscopy. Although the MDLR model has shown impressive accuracy (AUC > 0.92), its clinical application faces significant challenges:. First, the imaging studies used in our research only extracted sagittal plane TI and T2 sequences and did not extract coronal and horizontal plane sequences. the exclusive reliance on sagittal-plane MRI sequences represents a notable methodological constraint. While sagittal images effectively capture anterior-posterior meniscal pathology, critical 3D spatial relationships observable in coronal and axial planes remain unexamined. Coronal sequences are essential for evaluating meniscal root integrity, extrusion magnitude, and compartment-specific cartilage wear patterns—features strongly correlated with post-APM outcomes. Concurrently, axial planes provide optimal visualization of meniscocapsular separation and popliteal hiatus abnormalities. The omission of these orthogonal planes (coronal/axial) may compromise the model’s ability to extract comprehensive biomechanically relevant features, potentially diminishing its predictive accuracy for complex degenerative tears. Furthermore, as this study did not include data on patient’s lower limb alignment prior to surgery, the accuracy of the resulting model is not perfect. Third, this study used a retrospective design and this study did not use an external validation set, the absence of external validation poses significant concerns regarding clinical deployability. Our single-center design inherently incorporates institution-specific biases in MRI protocols, surgical techniques, and demographic factors. Without multi-center validation across diverse populations and imaging environments, the model’s generalizability remains unproven. 
Future studies should prioritize prospective validation in geographically distributed cohorts using standardized MRI protocols that incorporate all three anatomical planes to establish real-world applicability. Fourth, prognostic stratification relied on patient satisfaction (Likert scale) rather than objective functional scores (such as Lysholm or IKDC). This may introduce bias and limit comparability with other studies. Moreover, imbalanced training data can bias model performance toward the majority classes. We hope to address these shortcomings in future research to establish a more accurate and comprehensive predictive model.
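One common way to reduce the majority-class bias described above is to weight each training sample inversely to the frequency of its class. The following is a minimal sketch of that idea, not the procedure used in this study: the function name `balanced_class_weights` and the label counts are hypothetical and chosen only for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return one weight per class: n_samples / (n_classes * class_count).

    Rare classes receive weights above 1 and common classes below 1,
    so every class contributes equally to a weighted training loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical, deliberately imbalanced label distribution
# (illustrative only; these are NOT the study's data).
labels = ["Poor"] * 10 + ["Average"] * 20 + ["Good"] * 50 + ["Excellent"] * 20

weights = balanced_class_weights(labels)

# Per-sample weights that a classifier accepting sample weights could consume.
sample_weight = [weights[y] for y in labels]

for cls in ["Poor", "Average", "Good", "Excellent"]:
    print(cls, weights[cls])
```

With these hypothetical counts, the rarest class ("Poor") is weighted five times more heavily than the most common class ("Good"), while the total weight still equals the number of samples. Resampling or collecting more cases from the minority classes are alternative strategies.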
Conclusion
We have developed a prognostic model based on MDLR. In addition to imaging features, we believe that the duration of pain is an important prognostic factor for patients with degenerative meniscus tear undergoing arthroscopic partial meniscectomy.
This model performs well in stratifying prognosis for patients with degenerative meniscus tear after arthroscopic partial meniscectomy and may help orthopedic surgeons preoperatively identify patients likely to have poor outcomes after surgery, supporting more informed patient counseling and potentially reducing unnecessary surgeries. It also highlights the importance of integrating imaging and clinical data for individualized treatment planning. However, further prospective validation is needed before clinical application.
Supporting information
S1 Fig. Performance of various machine learning models with deep learning features under training set.
https://doi.org/10.1371/journal.pone.0328299.s001
(TIF)
S2 Fig. Performance of various machine learning models with deep learning features under validation set.
https://doi.org/10.1371/journal.pone.0328299.s002
(TIF)
S3 Fig. Performance of various machine learning models with deep learning features under test set.
https://doi.org/10.1371/journal.pone.0328299.s003
(TIF)
S4 Fig. Confusion matrix of various machine learning models with deep learning features under training set.
https://doi.org/10.1371/journal.pone.0328299.s004
(TIF)
S5 Fig. Confusion matrix of various machine learning models with deep learning features under validation set.
https://doi.org/10.1371/journal.pone.0328299.s005
(TIF)
S6 Fig. Confusion matrix of various machine learning models with deep learning features under test set.
https://doi.org/10.1371/journal.pone.0328299.s006
(TIF)
S7 Fig. Training and validation loss of the model.
https://doi.org/10.1371/journal.pone.0328299.s007
(TIF)
S8 Fig. Training and validation macro AUC of the model.
https://doi.org/10.1371/journal.pone.0328299.s008
(TIF)
S1 File. MobileNetV2 architecture and formula derivations.
https://doi.org/10.1371/journal.pone.0328299.s009
(DOCX)
References
- 1. Frese T, Peyton L, Mahlmeister J, Sandholzer H. Knee pain as the reason for encounter in general practice. ISRN Family Med. 2013;2013:930825. pmid:24959577
- 2. Cui A, Li H, Wang D, Zhong J, Chen Y, Lu H. Global, regional prevalence, incidence and risk factors of knee osteoarthritis in population-based studies. EClinicalMedicine. 2020;29–30:100587. pmid:34505846
- 3. Smith BE, Selfe J, Thacker D, Hendrick P, Bateman M, Moffatt F, et al. Incidence and prevalence of patellofemoral pain: A systematic review and meta-analysis. PLoS One. 2018;13(1):e0190892. pmid:29324820
- 4. Logerstedt DS, Scalzitti DA, Bennell KL, Hinman RS, Silvers-Granelli H, Ebert J, et al. Knee Pain and Mobility Impairments: Meniscal and Articular Cartilage Lesions Revision 2018. J Orthop Sports Phys Ther. 2018;48(2):A1–50. pmid:29385940
- 5. Walker PS, Erkman MJ. The role of the menisci in force transmission across the knee. Clin Orthop Relat Res. 1975;109:184–92.
- 6. Peat G, Bergknut C, Frobell R, Jöud A, Englund M. Population-wide incidence estimates for soft tissue knee injuries presenting to healthcare in southern Sweden: data from the Skåne Healthcare Register. Arthritis Res Ther. 2014;16(4):R162. pmid:25082600
- 7. Ulstein S, Årøen A, Engebretsen L, Forssblad M, Røtterud JH. Effect of Concomitant Meniscal Lesions and Meniscal Surgery in ACL Reconstruction With 5-Year Follow-Up: A Nationwide Prospective Cohort Study From Norway and Sweden of 8408 Patients. Orthop J Sports Med. 2021;9(10):23259671211038375. pmid:34722785
- 8. Bleakley C, McDonough S, MacAuley D. The use of ice in the treatment of acute soft-tissue injury: a systematic review of randomized controlled trials. Am J Sports Med. 2004;32(1):251–61. pmid:14754753
- 9. Skou ST, Hölmich P, Lind M, Jensen HP, Jensen C, Garval M, et al. Early Surgery or Exercise and Education for Meniscal Tears in Young Adults. NEJM Evid. 2022;1(2):EVIDoa2100038. pmid:38319181
- 10. van der Graaff SJA, Eijgenraam SM, Meuffels DE, van Es EM, Verhaar JAN, Hofstee DJ, et al. Arthroscopic partial meniscectomy versus physical therapy for traumatic meniscal tears in a young study population: a randomised controlled trial. Br J Sports Med. 2022;56(15):870–6. http://doi.org/10.1136/bjsports-2021-105059
- 11. Gerritsen LM, van der Lelij TJN, van Schie P, Fiocco M, van Arkel ERA, Zuurmond RG, et al. Higher healing rate after meniscal repair with concomitant ACL reconstruction for tears located in vascular zone 1 compared to zone 2: a systematic review and meta-analysis. Knee Surg Sports Traumatol Arthrosc. 2022;30(6):1976–89. pmid:35072757
- 12. Xu C, Zhao J. A meta-analysis comparing meniscal repair with meniscectomy in the treatment of meniscal tears: the more meniscus, the better outcome?. Knee Surg Sports Traumatol Arthrosc. 2015;23(1):164–70. pmid:23670128
- 13. Costa GG, Grassi A, Zocco G, Graceffa A, Lauria M, Fanzone G, et al. What Is the Failure Rate After Arthroscopic Repair of Bucket-Handle Meniscal Tears? A Systematic Review and Meta-analysis. Am J Sports Med. 2022;50(6):1742–52. pmid:34161741
- 14. Englund M, Guermazi A, Gale D, Hunter DJ, Aliabadi P, Clancy M, et al. Incidental meniscal findings on knee MRI in middle-aged and elderly persons. N Engl J Med. 2008;359(11):1108–15. pmid:18784100
- 15. Culvenor AG, Øiestad BE, Hart HF, Stefanik JJ, Guermazi A, Crossley KM. Prevalence of knee osteoarthritis features on magnetic resonance imaging in asymptomatic uninjured adults: a systematic review and meta-analysis. Br J Sports Med. 2019;53(20):1268–78. pmid:29886437
- 16. Hohmann E. Treatment of Degenerative Meniscus Tears. Arthroscopy. 2023;39(4):911–2. pmid:36872031
- 17. Abram SGF, Hopewell S, Monk AP, Bayliss LE, Beard DJ, Price AJ. Arthroscopic partial meniscectomy for meniscal tears of the knee: a systematic review and meta-analysis. Br J Sports Med. 2020;54(11):652–63. pmid:30796103
- 18. Fernández-Matías R, García-Pérez F, Gavín-González C, Martínez-Martín J, Valencia-García H, Flórez-García MT. Effectiveness of exercise versus arthroscopic partial meniscectomy plus exercise in the management of degenerative meniscal tears at 5-year follow-up: a systematic review and meta-analysis. Arch Orthop Trauma Surg. 2023;143(5):2609–20. pmid:35996030
- 19. MacFarlane LA, Yang H, Collins JE, Brophy RH, Cole BJ, Spindler KP, et al. Association Between Baseline Meniscal Symptoms and Outcomes of Operative and Nonoperative Treatment of Meniscal Tear in Patients With Osteoarthritis. Arthritis Care Res (Hoboken). 2022;74(8):1384–90. pmid:33650303
- 20. Sihvonen R, Englund M, Turkiewicz A, Järvinen TLN, Finnish Degenerative Meniscal Lesion Study Group. Mechanical Symptoms and Arthroscopic Partial Meniscectomy in Patients With Degenerative Meniscus Tear: A Secondary Analysis of a Randomized Trial. Ann Intern Med. 2016;164(7):449–55. pmid:26856620
- 21. Eijgenraam SM, Reijman M, Bierma-Zeinstra SMA, van Yperen DT, Meuffels DE. Can we predict the clinical outcome of arthroscopic partial meniscectomy? A systematic review. Br J Sports Med. 2018;52(8):514–21. pmid:29183885
- 22. Brophy RH, Fillingham YA. AAOS Clinical Practice Guideline Summary: Management of Osteoarthritis of the Knee (Nonarthroplasty), Third Edition. J Am Acad Orthop Surg. 2022;30(9):e721–9. pmid:35383651
- 23. Giuffrida A, Di Bari A, Falzone E, Iacono F, Kon E, Marcacci M, et al. Conservative vs. surgical approach for degenerative meniscal injuries: a systematic review of clinical evidence. Eur Rev Med Pharmacol Sci. 2020;24(6):2874–85. pmid:32271405
- 24. Fritz B, Yi PH, Kijowski R, Fritz J. Radiomics and Deep Learning for Disease Detection in Musculoskeletal Radiology: An Overview of Novel MRI- and CT-Based Approaches. Invest Radiol. 2023;58(1):3–13. pmid:36070548
- 25. Li J, Qian K, Liu J, Huang Z, Zhang Y, Zhao G, et al. Identification and diagnosis of meniscus tear by magnetic resonance imaging using a deep learning model. J Orthop Translat. 2022;34:91–101. pmid:35847603
- 26. Li Y-Z, Wang Y, Fang K-B, Zheng H-Z, Lai Q-Q, Xia Y-F, et al. Automated meniscus segmentation and tear detection of knee MRI with a 3D mask-RCNN. Eur J Med Res. 2022;27(1):247. pmid:36372871
- 27. Brosch T, Tang LYW, Youngjin Yoo, Li DKB, Traboulsee A, Tam R. Deep 3D Convolutional Encoder Networks With Shortcuts for Multiscale Feature Integration Applied to Multiple Sclerosis Lesion Segmentation. IEEE Trans Med Imaging. 2016;35(5):1229–39. pmid:26886978
- 28. Plis SM, Hjelm DR, Salakhutdinov R, Allen EA, Bockholt HJ, Long JD, et al. Deep learning for neuroimaging: a validation study. Front Neurosci. 2014;8:229. pmid:25191215
- 29. Muñoz-Martínez S, Iserte G, Sanduzzi-Zamparelli M, Llarch N, Reig M. Current pharmacological treatment of hepatocellular carcinoma. Curr Opin Pharmacol. 2021;60:141–8. pmid:34418875
- 30. An C, Li D, Li S, Li W, Tong T, Liu L, et al. Deep learning radiomics of dual-energy computed tomography for predicting lymph node metastases of pancreatic ductal adenocarcinoma. Eur J Nucl Med Mol Imaging. 2022;49(4):1187–99. pmid:34651229
- 31. Bien N, Rajpurkar P, Ball RL, Irvin J, Park A, Jones E, et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: Development and retrospective validation of MRNet. PLoS Med. 2018;15(11):e1002699. pmid:30481176
- 32. Niu NN, Losina E, Martin SD, Wright J, Solomon DH, Katz JN. Development and preliminary validation of a meniscal symptom index. Arthritis Care Res (Hoboken). 2011;63(2):208–15. pmid:20862684
- 33. Duong V, Oo WM, Ding C, Culvenor AG, Hunter DJ. Evaluation and treatment of knee pain: a review. JAMA. 2023;330(16):1568–80. http://doi.org/10.1001/jama.2023.19675
- 34. Crues JV 3rd, Mink J, Levy TL, Lotysch M, Stoller DW. Meniscal tears of the knee: accuracy of MR imaging. Radiology. 1987;164(2):445–8. pmid:3602385
- 35. Bey R, Goussault R, Grolleau F, Benchoufi M, Porcher R. Fold-stratified cross-validation for unbiased and privacy-preserving federated learning. J Am Med Inform Assoc. 2020;27(8):1244–51. pmid:32620945
- 36. Noorduyn JCA, van de Graaf VA, Willigenburg NW, Scholten-Peeters GGM, Mol BW, Heymans MW, et al. An individualized decision between physical therapy or surgery for patients with degenerative meniscal tears cannot be based on continuous treatment selection markers: a marker-by-treatment analysis of the ESCAPE study. Knee Surg Sports Traumatol Arthrosc. 2022;30(6):1937–48. pmid:35122496
- 37. Smith HJ, Richardson JB, Tennant A. Modification and validation of the Lysholm Knee Scale to assess articular cartilage damage. Osteoarthritis Cartilage. 2009;17(1):53–8. pmid:18556222
- 38. Irrgang JJ, Anderson AF, Boland AL, Harner CD, Kurosaka M, Neyret P, et al. Development and validation of the international knee documentation committee subjective knee form. Am J Sports Med. 2001;29(5):600–13. pmid:11573919
- 39. He X, Li K, Wei R, Zuo M, Yao W, Zheng Z, et al. A multitask deep learning radiomics model for predicting the macrotrabecular-massive subtype and prognosis of hepatocellular carcinoma after hepatic arterial infusion chemotherapy. Radiol Med. 2023;128(12):1508–20. pmid:37801197
- 40. Zhao X, Zhou B, Luo Y, Chen L, Zhu L, Chang S, et al. CT-based deep learning model for predicting hospital discharge outcome in spontaneous intracerebral hemorrhage. Eur Radiol. 2024;34(7):4417–26. pmid:38127074
- 41. Miao S, Jia H, Cheng K, Hu X, Li J, Huang W, et al. Deep learning radiomics under multimodality explore association between muscle/fat and metastasis and survival in breast cancer patients. Brief Bioinform. 2022;23(6):bbac432. pmid:36198668
- 42. Feierabend M, Wolfgart JM, Praster M, Danalache M, Migliorini F, Hofmann UK. Applications of machine learning and deep learning in musculoskeletal medicine: a narrative review. Eur J Med Res. 2025;30(1):386. pmid:40375335
- 43. Mercurio M, Denami F, Melissaridou D, Corona K, Cerciello S, Laganà D, et al. Deep Learning Models to Detect Anterior Cruciate Ligament Injury on MRI: A Comprehensive Review. Diagnostics (Basel). 2025;15(6):776. pmid:40150118
- 44. Güngör E, Vehbi H, Cansın A, Ertan MB. Achieving high accuracy in meniscus tear detection using advanced deep learning models with a relatively small data set. Knee Surg Sports Traumatol Arthrosc. 2024;33(2):450–6.
- 45. Ying M, Wang Y, Yang K, Wang H, Liu X. A deep learning knowledge distillation framework using knee MRI and arthroscopy data for meniscus tear detection. Front Bioeng Biotechnol. 2024;11:1326706. pmid:38292305
- 46. Tack A, Shestakov A, Lüdke D, Zachow S. A Multi-Task Deep Learning Method for Detection of Meniscal Tears in MRI Data from the Osteoarthritis Initiative Database. Front Bioeng Biotechnol. 2021;9:747217. pmid:34926416
- 47. Hung TNK, Vy VPT, Tri NM, Hoang LN, Tuan LV, Ho QT, et al. Automatic Detection of Meniscus Tears Using Backbone Convolutional Neural Networks on Knee MRI. J Magn Reson Imaging. 2023;57(3):740–9. pmid:35648374
- 48. Botnari A, Kadar M, Patrascu JM. A Comprehensive Evaluation of Deep Learning Models on Knee MRIs for the Diagnosis and Classification of Meniscal Tears: A Systematic Review and Meta-Analysis. Diagnostics (Basel). 2024;14(11):1090. pmid:38893617