Abstract
Background
A high false-negative rate has been reported for the diagnosis of ossification of the posterior longitudinal ligament (OPLL) using plain radiography. We investigated whether deep learning (DL) can improve the diagnostic performance of radiologists for cervical OPLL using plain radiographs.
Materials and methods
The training set consisted of 915 radiographs from 207 patients diagnosed with OPLL. For the test set, we used 200 lateral cervical radiographs from 100 patients with cervical OPLL and 100 patients without OPLL. An observer performance study was conducted over two reading sessions. In the first session, we compared the diagnostic performance of the DL model and the six observers. Diagnostic performance was evaluated using the area under the receiver operating characteristic curve (AUC) at the vertebra and patient levels. The sensitivity and specificity of the DL model and the average observers were calculated in the per-patient analysis. Subgroup analysis was performed according to the morphologic classification of OPLL. In the second session, observers evaluated the radiographs by referring to the results of the DL model.
Results
In the vertebra-level analysis, the DL model showed an AUC of 0.854, higher than the average AUC of the observers (0.826), but the difference was not significant (p = 0.292). In the patient-level analysis, the DL model had an AUC of 0.851, and the average AUC of the observers was 0.841 (p = 0.739). The patient-level sensitivity and specificity were 91% and 69% for the DL model and 83% and 68% for the average observers, respectively. Both the DL model and the observers showed decreased overall performance for the segmental and circumscribed types. With knowledge of the results of the DL model, the average AUC of the observers increased to 0.893 (p = 0.001) at the vertebra level and 0.911 (p < 0.001) at the patient level. In the subgroup analysis, the improvement was largest for the segmental type (AUC difference 0.087; p = 0.002).
Citation: Chae H-D, Hong SH, Yeoh HJ, Kang YR, Lee SM, Kim M, et al. (2022) Improved diagnostic performance of plain radiography for cervical ossification of the posterior longitudinal ligament using deep learning. PLoS ONE 17(4): e0267643. https://doi.org/10.1371/journal.pone.0267643
Editor: Dean Chou, University of California San Francisco, UNITED STATES
Received: August 10, 2021; Accepted: April 12, 2022; Published: April 27, 2022
Copyright: © 2022 Chae et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data except for cervical X-ray images are within the paper and its Supporting Information files. Cervical X-ray image data cannot be shared publicly because of the private health information policies of participating institutions. Cervical X-ray image data can be shared with researchers who meet the criteria for access to confidential data upon request (contact: Seoul National University Hospital, irb@snuh.org).
Funding: This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI20C2092). Also, this study has received research funding from Deepnoid Inc. (grant number: 06-2018-0260). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: H.D.C received a research grant from Deepnoid, and two of the co-authors (Y.L. and M.S.P.) are employed by Deepnoid. We have checked all authors’ competing interests and confirm that this statement does not alter our adherence to PLOS ONE policies.
Introduction
Ossification of the posterior longitudinal ligament (OPLL) of the cervical spine is one of the most common causes of cervical myelopathy in East Asia, including Korea and Japan [1]. OPLL replaces ligamentous tissue with abnormal ectopic bone and can lead to persistent compression of the spinal cord. The resulting neurologic deficits and symptoms of spastic cervical myelopathy, of varying severity, can restrict activities of daily living and lower quality of life [2].
Cervical radiography is usually performed as the initial diagnostic test for the assessment of symptoms of OPLL. On conventional lateral radiographs, OPLL appears as continuous or segmental ossifications posterior to the vertebral bodies along the course of the posterior longitudinal ligament (PLL). However, many cases of cervical OPLL are overlooked on lateral plain radiographs because of the complex anatomic structures around the cervical spine and superimposition of the ossified ligaments on facet joints and osteophytes [3]. The overall false-negative rate for cervical OPLL on lateral radiographs has been reported to be approximately 48% [4], and interobserver agreement in the lateral radiograph-based classification of cervical OPLL was only fair [5]. Computed tomography (CT) delineates OPLL most accurately and is the modality of choice for preoperative planning: it can detect small lesions that are indistinct on plain radiographs and precisely assess the thickness and length of the OPLL. However, CT is not routinely performed for initial evaluation in daily practice, and its high radiation dose limits repeated use for follow-up. Therefore, even in the era of cross-sectional imaging, plain radiography plays a fundamental role in the diagnosis and assessment of cervical OPLL.
Recent advances in artificial intelligence have shown promising results in various fields of radiology for the detection and characterization of lesions [6, 7]. Miura et al. recently showed that deep learning can differentiate cervical spondylosis from OPLL on lateral cervical radiographs with diagnostic performance equal or superior to that of spine surgeons [8]. However, they did not explore whether deep learning could improve observer performance when used as a secondary reader. Therefore, the purpose of this study was to investigate whether a deep learning (DL) model can improve the diagnostic performance of radiologists for cervical OPLL on plain radiographs.
Materials and methods
The institutional review board of Seoul National University Hospital (IRB No. 1711-133-901) approved this retrospective study with a waiver of informed consent.
Patients
We retrospectively identified 5289 patients who underwent cervical spine CT between March 2008 and October 2017. Of these, 509 patients were diagnosed with cervical OPLL. We excluded patients with a previous history of spinal surgery (n = 133), fracture (n = 23), tumor (n = 35), or infectious spondylitis (n = 11) involving the cervical spine. We then split the remaining 307 patients with OPLL into a training-validation set (n = 207) and a test set (n = 100). For the test set, we additionally collected a convenience sample of 100 patients without OPLL (Fig 1). Finally, 915 radiographs from 207 patients with OPLL (157 men; mean age 59.7 ± 9.9 years; range 30–84 years) were used for the training-validation set, and 100 lateral cervical radiographs from 100 patients with cervical OPLL (68 men; mean age 59.4 ± 11.3 years; range 32–81 years) and 100 radiographs from 100 patients without OPLL (54 men; mean age 51.1 ± 16.6 years; range 19–83 years) were used for the test set (Table 1). Multiple radiographs per patient were used in the training-validation set to increase the amount of training data, but only one image per patient was used in the test set; no patient appeared in both sets. The presence or absence of OPLL in both sets was confirmed on cervical spine CT, and an ossification more than 2 mm thick measured in the axial plane was considered positive, based on a previous study [9]. When multiple CT scans were available, we referred to the scan closest to the date of the cervical radiograph. For the test set, we selected the cervical radiograph performed on the day closest to the preoperative CT. The mean intervals between CT and radiography were 205 days for the training-validation set and 10 days for the test set.
All radiographs were obtained using DigitalDiagnost VM, DigitalDiagnost VR, DigitalDiagnost VH (Philips Healthcare, Best, The Netherlands), or DGR-C22M2A/KR (Samsung Healthcare, Seoul, Korea).
Data labeling and development of the DL model
A board-certified musculoskeletal radiologist (H.D.C.) with nine years of experience in spine imaging manually segmented OPLL along the boundary of the lesion on radiographs using commercial software (DeepPhi, Deepnoid, Seoul, Korea). During manual segmentation, the corresponding cervical CT images of each patient were used as the ground-truth reference, and the presence of an OPLL lesion at each vertebral level was recorded. Then, according to the CT findings, the same radiologist classified the OPLL into four morphological subtypes using the classification system proposed by the Investigation Committee on OPLL of the Japanese Ministry of Public Health and Welfare [10]: continuous, segmental, mixed, and circumscribed.
A two-dimensional residual U-Net with atrous spatial pyramid pooling was implemented and trained to segment cervical OPLL on lateral radiographs [11–13]. Cervical radiographs were resized to 480 × 576 pixels, and 288 × 288 patches were extracted from each image with a stride of 32. Only patches containing at least one pixel of OPLL annotation were used for training. The initial filter size was 32 and was doubled after each convolutional layer. We used the Adam optimizer (β1 = 0.9, β2 = 0.999, ε = 1 × 10⁻⁸) and a Dice coefficient loss function. The initial learning rate was 0.0003 with a decay rate of 0.9 every 2000 steps, and the model was trained for 300 epochs with a batch size of 5 and a dropout rate of 0.15. Patch-level segmentation results were recombined to reconstruct the entire image; where patches overlapped, the result was determined by majority vote across the overlapping patches (Fig 2). We augmented the training data with rotation, blurring, sharpening, brightness changes, and additive Gaussian noise using the imgaug library (version 0.4.0) [14]. The model was implemented with TensorFlow (version 2.4.1, https://www.tensorflow.org) [15], and the entire code is available at https://github.com/moosungpark/opll.
OPLL, ossification of the posterior longitudinal ligament.
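The patch extraction and majority-vote recombination steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (the network, the training-patch filter, and augmentation are omitted, and the function names and small array sizes are hypothetical, not taken from the released code):

```python
import numpy as np

def extract_patches(img, patch=288, stride=32):
    """Slide a patch x patch window over img with the given stride,
    returning the stacked patches and their top-left coordinates."""
    h, w = img.shape
    coords = [(y, x)
              for y in range(0, h - patch + 1, stride)
              for x in range(0, w - patch + 1, stride)]
    patches = np.stack([img[y:y + patch, x:x + patch] for y, x in coords])
    return patches, coords

def recombine_majority(pred_patches, coords, shape, patch=288):
    """Recombine binary patch predictions into a full-size mask; each pixel
    covered by several patches is decided by strict majority vote (ties
    count as negative)."""
    votes = np.zeros(shape, dtype=np.int32)   # positive votes per pixel
    counts = np.zeros(shape, dtype=np.int32)  # patches covering each pixel
    for p, (y, x) in zip(pred_patches, coords):
        votes[y:y + patch, x:x + patch] += p.astype(np.int32)
        counts[y:y + patch, x:x + patch] += 1
    return votes * 2 > counts
```

At inference time the full radiograph is tiled, each tile is segmented by the network, and `recombine_majority` resolves the overlapping predictions into one mask.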
Evaluation of the diagnostic accuracy of DL model and radiologists
Per-vertebra and per-patient analyses were performed to evaluate the diagnostic performance of the model. In the per-vertebra analysis, the cervical spine was divided into six regions of interest (ROIs) from C2 to C7; for convenience, lesions located at an intervertebral disc level were assigned to the adjacent upper vertebral level. To assess the performance of the DL model, we increased the threshold value from 0.1 to 0.9 in increments of 0.1 and recorded, for each vertebral level, the lowest threshold at which an OPLL lesion first appeared; this threshold was used as the ROI-level probability for that vertebral level. For the per-patient analysis, the highest of the six ROI-level probabilities was used as the patient-level probability score.
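The ROI- and patient-level scoring can be sketched as follows. This is a hedged reading of the threshold sweep, not the authors' code: we score each ROI by the extreme grid threshold at which lesion pixels are predicted, which works out to the maximum pixel probability quantized to the 0.1 grid, and the helper names are our own.

```python
import numpy as np

# Grid of candidate thresholds, 0.1 to 0.9 in steps of 0.1.
THRESHOLDS = np.round(np.arange(0.1, 1.0, 0.1), 1)

def roi_probability(prob_map):
    """Score one vertebra-level ROI from the model's per-pixel probability
    map: the highest grid threshold at which at least one lesion pixel is
    still predicted (0.0 if no pixel reaches 0.1). Equivalent to the maximum
    pixel probability quantized down to the 0.1 grid."""
    passing = [t for t in THRESHOLDS if (prob_map >= t).any()]
    return float(passing[-1]) if passing else 0.0

def patient_probability(roi_maps):
    """Patient-level score: the highest of the six ROI-level scores (C2-C7)."""
    return max(roi_probability(m) for m in roi_maps)
```

The same max-over-ROIs rule is what the observer study applies to the Likert confidence grades at the patient level.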
The observer performance test was conducted over two sessions. In the first session, we compared the diagnostic performance of the DL model with that of the observers. The second session was performed after an interval of 8 weeks, and each observer evaluated the images while referring to the results of the DL model. The observers comprised three subgroups by level of experience in spine imaging: two radiology residents, two musculoskeletal radiology fellows, and two staff radiologists. All six observers independently reviewed the test set on a commercial PACS program (INFINITT PACS, INFINITT Healthcare, Seoul, Korea). Diagnostic confidence in the presence of OPLL was scored for each vertebra-level ROI on a 5-point Likert scale: grade 1, definitely not present; grade 2, probably not present; grade 3, possibly present; grade 4, probably present; and grade 5, definitely present. As with the DL model, the highest confidence score among the six ROIs was used as the patient-level confidence score.
Statistical analysis
Fisher’s exact test and one-way analysis of variance were used to compare categorical and continuous variables, respectively, between the training and test sets. We used Cohen’s kappa with linear weights and the Fleiss multi-rater kappa to evaluate interobserver variability in the confidence scores for the diagnosis of OPLL among radiologists. Cohen’s κ values < 0.40 signified poor agreement; 0.41–0.60, moderate agreement; 0.61–0.80, good agreement; and ≥ 0.81, excellent agreement.
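For readers reproducing the agreement statistics, linearly weighted Cohen's kappa is available off the shelf, for example in scikit-learn (the multi-rater Fleiss kappa can likewise be computed with `statsmodels`). A sketch with made-up reader scores, not the study data:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical 5-point Likert confidence scores from two readers over ten ROIs.
reader_a = [1, 2, 5, 4, 3, 1, 1, 5, 4, 2]
reader_b = [1, 3, 5, 4, 2, 1, 2, 5, 5, 2]

# Linear weights penalize disagreements in proportion to their distance on
# the ordinal scale, so a 4-vs-5 disagreement costs less than 1-vs-5.
kappa = cohen_kappa_score(reader_a, reader_b, weights="linear")
```

With `weights=None` the same call gives the unweighted kappa, which treats a one-grade and a four-grade disagreement identically and is usually too harsh for ordinal confidence scales.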
The observer performance test was analyzed using the R package RJafroc (version 2.0.1) [16]. The per-patient analysis used the area under the receiver operating characteristic (ROC) curve (AUC). The sensitivity and specificity of the DL model and the average observers were calculated on the test set, with operating thresholds of an activation value of 0.8 for the DL model and grade 3 for the human observers. In the per-vertebra analysis, where one patient contributes multiple ROIs, the ROI-based ROC approach with the Obuchowski-Rockette-Hillis method [17] was applied to account for within-patient correlations of the clustered data. For the multiple-reader, multiple-case analysis, the diagnostic performance of the DL model was compared against individual observers using the fixed-reader, random-case (FRRC) method, and against the average of the radiologists using the random-reader, random-case (RRRC) method. AUCs of individual observers were compared between the two sessions using the FRRC method, and average observer AUCs using the RRRC method. Finally, we performed subgroup analysis to evaluate the diagnostic performance of the human observers and the DL model at each vertebral level.
Statistical analyses were performed using R statistical software (ver. 4.0.5; R Foundation for Statistical Computing, Vienna, Austria). A p-value of < 0.05 was considered to indicate a significant difference.
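Although the confidence intervals and reader comparisons require RJafroc's Obuchowski-Rockette machinery, the basic per-patient metrics reduce to a standard ROC computation. A minimal sketch with toy scores (the 0.8 operating threshold mirrors the one used for the DL model; the data and helper are illustrative, not the study's):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def sensitivity_specificity(y_true, scores, threshold):
    """Binarize patient-level scores at an operating threshold and compute
    sensitivity (true-positive rate) and specificity (true-negative rate)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(scores) >= threshold
    sens = np.sum(y_pred & y_true) / np.sum(y_true)
    spec = np.sum(~y_pred & ~y_true) / np.sum(~y_true)
    return sens, spec

# Toy patient-level scores: label 1 = OPLL on CT, 0 = no OPLL.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]

auc = roc_auc_score(y_true, scores)                        # rank-based AUC
sens, spec = sensitivity_specificity(y_true, scores, 0.8)  # at DL threshold
```

The AUC is threshold-free, whereas sensitivity and specificity depend on the chosen operating point, which is why the paper reports both.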
Results
OPLL lesion characteristics
The results of interobserver variability for the confidence scores in the diagnosis of OPLL showed moderate agreement between two staff radiologists (κ = 0.56), good agreement between two radiology fellows (κ = 0.66), and poor agreement between two radiology residents (κ = 0.30) (S1 Table). The Fleiss κ for all raters was 0.28. When classified into subtypes according to OPLL morphology, there were 38 continuous-type (18%), 122 segmental-type (59%), 30 mixed-type (14%), and 17 circumscribed-type OPLL (8%) in the training set. For the test set, continuous-type was observed in 16 patients (16%), segmental-type in 62 patients (62%), mixed-type in 17 patients (17%), and circumscribed-type in 5 patients (5%).
Comparison of diagnostic performances of the DL model and observers
In the per-vertebra analysis (Table 2), the DL model showed an AUC of 0.854 (95% CI, 0.828–0.880), which was higher than the average AUC of the six observers (0.826; 95% CI, 0.772–0.881), but the difference was not significant (p = 0.292). The fixed-reader, random-case analysis revealed that the performance of the DL model was higher than that of one fellow and one resident (AUC, 0.854 vs. 0.788 and 0.754; p = 0.002 and < 0.001, respectively), and there was no significant difference between the DL model and the two staff radiologists (p = 0.163 and 0.683, respectively). In the subgroup analysis by morphologic subtype, both the DL model and the observers performed best for the continuous type (DL model: AUC, 0.897; average observers: AUC, 0.953) and worst for the circumscribed type (DL model: AUC, 0.819; average observers: AUC, 0.684). Both the DL model and the observers showed lower overall performance for the segmental and circumscribed types than for the continuous and mixed types, and this decrease tended to be more prominent for the human observers, especially the radiology residents. For the segmental type, the AUC of the DL model was significantly higher than those of three observers (AUC, 0.825 vs. 0.707, 0.763, and 0.633; p-value range, < 0.001 to 0.027).
In the per-patient analysis, the DL model had an AUC of 0.851 (95% CI, 0.799–0.903) and the average observers an AUC of 0.841 (95% CI, 0.781–0.901) (p = 0.739). The patient-level sensitivity and specificity were 91.0% (95% CI, 83.6–95.8%) and 69.0% (95% CI, 59.0–77.9%) for the DL model, and 83.3% (95% CI, 80.1–86.2%) and 67.7% (95% CI, 63.8–71.4%) for the average observers, respectively. Representative cases are shown in Figs 3 and 4.
(a) A lateral plain radiograph shows mixed-type OPLL along the posterior margins of the vertebral bodies; the lesion at the C7 level is obscured by the shoulder shadow. (b) A sagittal cervical spine CT image clearly demonstrates ossifications extending from the C2 to T1 level (window width = 2000 HU, window level = 500 HU). (c) OPLL lesions annotated by a radiologist on the plain radiograph. (d) In the image inferred by the deep-learning model, the OPLL lesions at the C2-6 levels are well predicted, but the lesion at the C7 level is not detected. OPLL, ossification of the posterior longitudinal ligament.
(a) A lateral plain radiograph shows segmental-type OPLL (arrow) at the C5-6 level. (b) A sagittal cervical spine CT image also demonstrates the segmental ossifications (arrow) at the C5-6 level (window width = 2000 HU, window level = 500 HU). (c) OPLL lesions annotated by a radiologist on the plain radiograph. (d) The deep-learning model correctly predicted the segmental-type OPLL, which was overlooked by two observers. OPLL, ossification of the posterior longitudinal ligament.
Comparison of observer performances with and without the DL model
When referring to the result images of the DL model, all observers showed significantly improved diagnostic performance regardless of their level of experience (Table 2). With knowledge of the DL model results, the average AUC of the observers increased to 0.893 (p = 0.001) in the per-vertebra analysis and 0.911 (p < 0.001) in the per-patient analysis (Fig 5). When the performance of each observer group referring to the DL model results was compared with that of the DL model alone, the average AUCs of the residents, fellows, and staff radiologists were 0.872 (p = 0.320), 0.884 (p = 0.190), and 0.911 (p = 0.004), respectively; the staff radiologists showed significantly higher performance than the DL model. In the subgroup analysis by morphologic subtype, the improvement in diagnostic performance was largest for the segmental type (AUC difference 0.087; p = 0.002) and smallest for the continuous type (AUC difference 0.026; p = 0.026).
(a) The AUC of the DL model alone was 0.851 (95% CI, 0.799–0.903), and that of the average observers was 0.841 (95% CI, 0.781–0.901); the AUC of the average observers improved to 0.911 (95% CI, 0.876–0.945) when referring to the results of the DL model. (b) Improved diagnostic performance of individual observers in per-patient analysis with the assistance of the DL model. ROC, receiver operating characteristic; DL, deep learning; AUC, area under the curve.
Subgroup analysis according to the vertebral level
The results of subgroup analysis at each vertebral level revealed that the diagnostic performance was best at the C2 level in both the DL model (AUC, 0.932) and average observers (AUC, 0.927) (Table 3). In both groups, performance tended to decrease toward the lower cervical level, and in particular, the DL model showed the lowest AUC value of 0.582 at the C7 level, which was significantly lower than that of the observers (AUC, 0.793) (p < 0.001).
Discussion
We developed a deep learning-based OPLL detection model and investigated its effect on the diagnostic performance of radiologists for OPLL on lateral cervical radiographs. When observers referred to the results of the DL model, their average AUC increased from 0.826 to 0.893 (p = 0.001) in the per-vertebra analysis and from 0.841 to 0.911 (p < 0.001) in the per-patient analysis. Both the DL model and the radiologists showed higher diagnostic performance for the continuous and mixed types than for the segmental and circumscribed types. In addition, the improvement in diagnostic performance was largest for the segmental type (AUC difference 0.087; p = 0.002).
According to a recent meta-analysis by Tetreault et al. [18], only three studies have investigated the diagnostic performance of plain radiography for OPLL. However, these studies did not include patients without the target condition as a control group and reported only percent agreement as a measure of diagnostic performance. In a study by Mizuno et al., lateral plain radiography revealed 15 (88.2%) of 17 OPLL cases, and the two missed cases were segmental-type. In contrast, Kang et al. reported a diagnostic accuracy of only 52.2% using plain lateral radiographs [4]. Furthermore, Jeon et al., in their analysis of 146 Korean patients with OPLL, showed that OPLL was overlooked in 29 patients (19.9%) on plain radiography [19]. The low diagnostic accuracy of plain radiography arises because the radiograph is a projection image: overlap of the lesion with the facet joints and pedicle shadows interferes with detection of the OPLL. In addition, segmental- and circumscribed-type ossification of the PLL can be mistaken for the posterior cortex of the vertebral body or disc calcification.
Our study demonstrated that both the DL model and the radiologists performed better for the continuous and mixed types than for the segmental and circumscribed types. This result is consistent with a previous study in which diagnostic accuracy on lateral radiographs was higher for the continuous (85.7%) and mixed (91.7%) types than for the segmental (27.3%) and circumscribed (20.0%) types [4]. Otake et al. measured the maximum thickness of ossified lesions on conventional lateral tomograms: the mean thickness of segmental-type OPLL was 4.3 mm, smaller than those of the continuous (9.1 mm) and mixed (8.2 mm) types [20]. The lower diagnostic performance for the segmental type may therefore reflect the relatively small thickness of the ossified lesions. Moreover, ossified lesions extending from the vertebral body across the adjacent intervertebral discs, as in the continuous type, may be easier for radiologists to recognize as abnormal.
A recent study by Miura et al. [8] evaluated the performance of a convolutional neural network (CNN) in diagnosing OPLL on cervical radiographs. They reported an accuracy, average recall, and average precision of 0.86, 0.86, and 0.87, respectively, comparable to the patient-level sensitivity of 91% in our study. The performance of their CNN was also higher for the continuous and mixed types than for the segmental and circumscribed types, consistent with our results. However, in contrast to their classification model, our study was based on direct segmentation of OPLL lesions, which may enhance the explainability and interpretability of the model.
In our study, diagnostic performance was highest at C2 and tended to decrease toward C7. The main reason is that the lower cervical spine can be obscured on lateral plain radiographs in patients with a short neck and substantially elevated shoulders [21]. It is therefore essential to ensure that the lower cervical spine is adequately visualized by inferior traction of the arms with full expiration [22].
With the rapid development of artificial intelligence and deep learning in recent years, many researchers are actively applying these technologies to spine imaging [23]. Applications range from the identification of vertebral fractures [24] to the diagnosis of osteoporosis [25, 26], detection of spinal bone metastases [27], and automatic measurement of spinal alignment [28, 29]. With direct access to pixel-level values and an exhaustive search of the entire image, DL algorithms can automatically discover and extract sophisticated features and use them for lesion classification. However, the DL model also has limitations: when training data for a specific disease pattern are scarce, tasks that are straightforward for radiologists can be difficult for the algorithm. In this study, the DL model outperformed the radiologists for the segmental type, while the radiologists performed better for the continuous type. These complementary strengths of the DL model and the human observers may explain the improvement in diagnostic performance when the DL model is used as a second reader.
This investigation had several limitations. First, although the diagnostic performance of the DL model alone was higher than that of the average observers, the difference was not significant. The training dataset was small, and larger-scale data are required to improve the performance of the DL model; statistical power also needs to be increased through larger numbers of observers and cases. Second, although we used CT images as the reference standard when manually segmenting OPLL lesions on plain radiographs, the boundaries of small lesions were not clearly demonstrated on radiographs and could only be delineated on CT. This inaccuracy in the training data may have affected the performance of the DL model. Third, we validated model performance only on data from the same institution as the development dataset; the model may therefore be overfitted to our institution. Future multicenter studies should recruit sufficient OPLL cases from various institutions and perform external validation to evaluate model performance more accurately. Finally, we did not measure the change in reading time with use of the DL model. Several recent studies [30] have reported reduced reading time as a strength of DL models in clinical practice, and a follow-up study is needed to measure the change in reading time for cervical radiographs.
In conclusion, a deep learning-based OPLL detection model can significantly improve the diagnostic performance of radiologists on lateral cervical radiographs. In particular, the DL model can help diagnose segmental-type OPLL, for which radiologists show low diagnostic performance.
Supporting information
S1 Table. Interobserver agreements between human observers.
https://doi.org/10.1371/journal.pone.0267643.s001
(DOCX)
References
- 1. Wu JC, Chen YC, Huang WC. Ossification of the Posterior Longitudinal Ligament in Cervical Spine: Prevalence, Management, and Prognosis. Neurospine. 2018;15(1):33–41. Epub 2018/04/16. pmid:29656627; PubMed Central PMCID: PMC5944629.
- 2. Kudo H, Yokoyama T, Tsushima E, Ono A, Numasawa T, Wada K, et al. Interobserver and intraobserver reliability of the classification and diagnosis for ossification of the posterior longitudinal ligament of the cervical spine. Eur Spine J. 2013;22(1):205–10. Epub 2012/11/28. pmid:23179977; PubMed Central PMCID: PMC3540306.
- 3. Yin M, Wang H, Ma J, Huang Q, Sun Z, Yan W, et al. Radiological Characteristics and Surgical Outcome of Patients with Long Ossification of the Posterior Longitudinal Ligament Resulting in Ossified Lesions in the Upper Cervical Spine. World Neurosurg. 2019;127:e299–e310. Epub 2019/04/08. pmid:30954753.
- 4. Kang MS, Lee JW, Zhang HY, Cho YE, Park YM. Diagnosis of Cervical OPLL in Lateral Radiograph and MRI: Is it Reliable? Korean J Spine. 2012;9(3):205–8. Epub 2012/09/01. pmid:25983816; PubMed Central PMCID: PMC4431003.
- 5. Chang H, Kong CG, Won HY, Kim JH, Park JB. Inter- and intra-observer variability of a cervical OPLL classification using reconstructed CT images. Clin Orthop Surg. 2010;2(1):8–12. Epub 2010/03/02. pmid:20190995; PubMed Central PMCID: PMC2824098.
- 6. Park JE, Kickingereder P, Kim HS. Radiomics and Deep Learning from Research to Clinical Workflow: Neuro-Oncologic Imaging. Korean J Radiol. 2020;21(10):1126–37. Epub 2020/07/31. pmid:32729271; PubMed Central PMCID: PMC7458866.
- 7. Hwang EJ, Park CM. Clinical Implementation of Deep Learning in Thoracic Radiology: Potential Applications and Challenges. Korean J Radiol. 2020;21(5):511–25. Epub 2020/04/24. pmid:32323497; PubMed Central PMCID: PMC7183830.
- 8. Miura M, Maki S, Miura K, Takahashi H, Miyagi M, Inoue G, et al. Automated detection of cervical ossification of the posterior longitudinal ligament in plain lateral radiographs of the cervical spine using a convolutional neural network. Sci Rep. 2021;11(1):12702. Epub 2021/06/18. pmid:34135404; PubMed Central PMCID: PMC8208978.
- 9. Fujimori T, Watabe T, Iwamoto Y, Hamada S, Iwasaki M, Oda T. Prevalence, Concomitance, and Distribution of Ossification of the Spinal Ligaments: Results of Whole Spine CT Scans in 1500 Japanese Patients. Spine (Phila Pa 1976). 2016;41(21):1668–76. Epub 2016/11/01. pmid:27120057.
- 10. Tsuyama N, Terayama K, Ohtani K. The ossification of the posterior longitudinal ligament of the spine (OPLL). Report of the investigation committee on OPLL of the Japanese Ministry of Public Health and Welfare. J Jpn Orthop Assoc. 1981;55:425–40.
- 11. Chen L-C, Papandreou G, Schroff F, Adam H. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:170605587. 2017.
- 12. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv preprint arXiv:150504597. 2015.
- 13. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. arXiv preprint arXiv:151203385. 2015.
- 14. Jung AB, Wada K, Crall J, Tanaka S, Graving J, Reinders C, et al. imgaug. GitHub. https://github.com/aleju/imgaug. Published June 1, 2020. Accessed January 31, 2022.
- 15. Abadi M, Barham P, Chen JM, Chen ZF, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. Proceedings of OSDI ’16: 12th USENIX Symposium on Operating Systems Design and Implementation. 2016:265–83.
- 16. Chakraborty D, Philips P, Zhai X. RJafroc: Analyzing Diagnostic Observer Performance Studies. R package version 1.3.2 2020 [cited 2020 17 April]. Available from: https://CRAN.R-project.org/package=RJafroc.
- 17. Hillis SL, Obuchowski NA, Schartz KM, Berbaum KS. A comparison of the Dorfman–Berbaum–Metz and Obuchowski–Rockette methods for receiver operating characteristic (ROC) data. Statistics in medicine. 2005;24(10):1579–607. pmid:15685718
- 18. Tetreault L, Nakashima H, Kato S, Kryshtalskyj M, Nagoshi N, Nouri A, et al. A Systematic Review of Classification Systems for Cervical Ossification of the Posterior Longitudinal Ligament. Global Spine J. 2019;9(1):85–103. Epub 2019/02/19. pmid:30775213; PubMed Central PMCID: PMC6362555.
- 19. Jeon TS, Chang H, Choi BW. Analysis of demographics, clinical, and radiographical findings of ossification of posterior longitudinal ligament of the cervical spine in 146 Korean patients. Spine (Phila Pa 1976). 2012;37(24):E1498–503. Epub 2012/08/24. pmid:22914701.
- 20. Otake S, Matsuo M, Nishizawa S, Sano A, Kuroda Y. Ossification of the posterior longitudinal ligament: MR evaluation. AJNR Am J Neuroradiol. 1992;13(4):1059–67; discussion 68–70. Epub 1992/07/01. pmid:1636514.
- 21. Abbasi A, Malhotra G. The "swimmer’s view" as alternative when lateral view is inadequate during interlaminar cervical epidural steroid injections. Pain Med. 2010;11(5):709–12. Epub 2010/04/01. pmid:20353409.
- 22. Lampignano J, Kendrick LE. Bontrager’s Textbook of Radiographic Positioning and Related Anatomy. 9th ed. Amsterdam: Elsevier; 2017.
- 23. Galbusera F, Casaroli G, Bassani T. Artificial intelligence and machine learning in spine research. JOR Spine. 2019;2(1):e1044. Epub 2019/08/30. pmid:31463458; PubMed Central PMCID: PMC6686793.
- 24. Derkatch S, Kirby C, Kimelman D, Jozani MJ, Davidson JM, Leslie WD. Identification of Vertebral Fractures by Convolutional Neural Networks to Predict Nonvertebral and Hip Fractures: A Registry-based Cohort Study of Dual X-ray Absorptiometry. Radiology. 2019;293(2):405–11. Epub 2019/09/19. pmid:31526255.
- 25. Pan Y, Shi D, Wang H, Chen T, Cui D, Cheng X, et al. Automatic opportunistic osteoporosis screening using low-dose chest computed tomography scans obtained for lung cancer screening. Eur Radiol. 2020. Epub 2020/02/20. pmid:32072260.
- 26. Jang S, Graffy PM, Ziemlewicz TJ, Lee SJ, Summers RM, Pickhardt PJ. Opportunistic Osteoporosis Screening at Routine Abdominal and Thoracic CT: Normative L1 Trabecular Attenuation Values in More than 20 000 Adults. Radiology. 2019;291(2):360–7. Epub 2019/03/27. pmid:30912719; PubMed Central PMCID: PMC6492986.
- 27. Zhao H, Yao G, Zhou Y, Wang Z. Application of deep learning for clinical predictive modeling: An artificial intelligence recognition in spinal metastases. Journal of Clinical Oncology. 2019;37(15_suppl):e18050-e.
- 28. Weng C-H, Wang C-L, Huang Y-J, Yeh Y-C, Fu C-J, Yeh C-Y, et al. Artificial Intelligence for Automatic Measurement of Sagittal Vertical Axis Using ResUNet Framework. J Clin Med. 2019;8(11):1826. pmid:31683913
- 29. Watanabe K, Aoki Y, Matsumoto M. An Application of Artificial Intelligence to Diagnostic Imaging of Spine Disease: Estimating Spinal Alignment From Moire Images. Neurospine. 2019;16(4):697–702. Epub 2020/01/07. pmid:31905459; PubMed Central PMCID: PMC6945007.
- 30. Sung J, Park S, Lee SM, Bae W, Park B, Jung E, et al. Added Value of Deep Learning-based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study. Radiology. 2021;299(2):450–9. Epub 2021/03/24. pmid:33754828.