
Deep learning-based fully automated grading system for dry eye disease severity

  • Seonghwan Kim ,

    Roles Data curation, Formal analysis, Investigation, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    ‡ SK and DP have contributed equally to this article as first authors.

    Affiliations Department of Ophthalmology, Seoul National University College of Medicine, Seoul, Korea, Department of Ophthalmology, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul, Korea, Laboratory of Ocular Regenerative Medicine and Immunology, Biomedical Research Institute, Seoul National University Hospital, Seoul, Korea

  • Daseul Park ,

    Roles Data curation, Formal analysis, Investigation, Validation, Visualization, Writing – original draft

    ‡ SK and DP have contributed equally to this article as first authors.

    Affiliations Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea, Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, Korea

  • Youmin Shin,

    Roles Data curation, Formal analysis, Writing – original draft

    Affiliations Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea, Interdisciplinary Program in Bioengineering, Graduate School, Seoul National University, Seoul, Korea

  • Mee Kum Kim,

    Roles Data curation, Validation, Writing – review & editing

    Affiliations Department of Ophthalmology, Seoul National University College of Medicine, Seoul, Korea, Laboratory of Ocular Regenerative Medicine and Immunology, Biomedical Research Institute, Seoul National University Hospital, Seoul, Korea, Department of Ophthalmology, Seoul National University Hospital, Seoul, Korea

  • Hyun Sun Jeon,

    Roles Data curation

    Affiliations Department of Ophthalmology, Seoul National University College of Medicine, Seoul, Korea, Department of Ophthalmology, Seoul National University Bundang Hospital, Seongnam-si, Gyeonggi-do, Korea

  • Young-Gon Kim ,

    Roles Conceptualization, Data curation, Supervision, Writing – review & editing

    ifree7@gmail.com (CHY); younggon2.kim@gmail.com (YGK)

    Affiliation Department of Transdisciplinary Medicine, Seoul National University Hospital, Seoul, Korea

  • Chang Ho Yoon

    Roles Conceptualization, Investigation, Supervision, Validation, Writing – review & editing

    ifree7@gmail.com (CHY); younggon2.kim@gmail.com (YGK)

    Affiliations Department of Ophthalmology, Seoul National University College of Medicine, Seoul, Korea, Laboratory of Ocular Regenerative Medicine and Immunology, Biomedical Research Institute, Seoul National University Hospital, Seoul, Korea, Department of Ophthalmology, Seoul National University Hospital, Seoul, Korea

Abstract

There is an increasing need for an objective grading system to evaluate the severity of dry eye disease (DED). In this study, a fully automated deep learning-based system for the assessment of DED severity was developed. Corneal fluorescein staining (CFS) images of DED patients were collected from one hospital for system development (n = 1400) and from another hospital for external validation (n = 94). Three experts graded the CFS images using the NEI scale, and the median value was used as the ground truth. The system was developed in three steps: (1) corneal segmentation, (2) CFS candidate region classification, and (3) estimation of NEI grades by CFS density map generation. In addition, two images taken on different days in each of 50 eyes (100 images) were compared to evaluate the detection of improvement or deterioration. The Dice coefficient of the segmentation model was 0.962. The correlation between the system and the ground truth data was 0.868 (p<0.001) and 0.863 (p<0.001) for the internal and external validation datasets, respectively. The agreement rate for improvement or deterioration was 88% (44/50). The fully automated deep learning-based grading system for DED severity can evaluate the CFS score with high accuracy and thus may have potential for clinical application.

Introduction

Machine learning has enormously impacted medicine in recent years. The technology has great potential for improving medical diagnosis by increasing the accuracy, speed, and reproducibility of diagnosis and by reducing clinician workload [1, 2]. Deep learning is a sub-branch of machine learning that uses neural networks with multiple layers to learn a function between a set of inputs and outputs [3]. In ophthalmology, deep learning has been applied to various types of imaging, from color fundus photography to optical coherence tomography and anterior segment photography [4–6].

Dry eye disease (DED) is a multifactorial disorder characterized by loss of tear film homeostasis accompanied by several ocular symptoms [7, 8]. DED is prevalent in 5.3–34.5% of the population, and its incidence seems to have increased over time [9, 10]. Punctate epithelial erosion (PEE), as evaluated according to the corneal fluorescein staining (CFS) score, is one of the critical diagnostic features of DED, in addition to tear break-up time, ocular surface disease index score, and Schirmer test score [11, 12]. Quantifying the degree of CFS is important in grading DED severity. Among various grading methods, the grading system recommended by the National Eye Institute (NEI) is one of the most commonly used scales in clinical trials owing to its refined methodology [13–15].

However, most CFS scales, including the NEI scale, are still subjective and observer dependent, showing inter- and intra-observer variability [16]. Therefore, a reproducible and reliable objective method to minimize subjective bias from human observers is warranted. To date, few studies have applied digitalized automated methods, including deep learning, to objectify the NEI scale and interpret CFS [16–20]. Previously reported systems have limitations: the system developed by Amparo et al. [16] is not fully automated, and it is unclear what process the system reported by Qu et al. [19] follows to evaluate the CFS score. Thus, this study aimed to develop a clinically applicable, fully automated deep learning-based system for the assessment of dry eye severity according to the NEI scale. Herein, we developed a system that can automatically segment the cornea and evaluate the severity of DED by directly inputting an original image file captured by a digital camera attached to a slit lamp biomicroscope.

Materials and methods

Study design and patients

Institutional Review Board (IRB) approval was obtained from Seoul National University Hospital (IRB No. 2205-162-1328). The study was conducted according to the tenets of the Declaration of Helsinki. Informed consent was waived by the IRB because the study was based on a retrospective review of data. The authors had access to information that could identify individual participants (i.e., full names) during or after data collection. The data were accessed for research purposes between August 2022 and February 2023 (S1–S3 Datasets).

A total of 1400 anterior segment images of DED patients, including patients with Sjögren syndrome and ocular graft-versus-host disease (GVHD), treated at Seoul National University Hospital (hospital 1) between January 2019 and December 2021 were retrospectively collected. In addition, 94 anterior segment images of DED patients at Seoul National University Bundang Hospital (hospital 2) were collected for external validation. The exclusion criteria were (1) a history of ocular surgery and (2) the presence of corneal diseases such as corneal opacity, corneal edema, and keratitis.

Anterior segment image capture technique

A fluorescein strip (FLUO 900 Strip®, Haag-Streit AG, Bern, Switzerland) was moistened with normal saline, shaken to remove excess fluid, and applied to the inferior fornix to stain the cornea. After 3–5 blinks, anterior segment images were obtained using the digital unit of a 5-megapixel camera (DC-4, Topcon, Tokyo, Japan) attached to a slit lamp biomicroscope (SL-D701, Topcon, Tokyo, Japan) under a cobalt filter. All image files were saved in JPEG format (2576 × 1934 pixels, 24-bit).

NEI scoring method

The NEI scale was used to evaluate the CFS score [13]. Because the grid of the NEI scale from the 1995 NEI workshop is arbitrary, the grid proportion of Amparo et al. [16] was adopted (Fig 1). A grid was added to each anterior segment photograph. Using the grid, the corneal area was divided into five zones, and the score in each zone was evaluated by three ophthalmologists (SK, MKK, and CHY) specializing in DED. The median score of each zone was chosen as the ground truth. If all three ophthalmologists gave different scores, or if any two scores differed by more than 1, the ground truth NEI score was determined through a consensus meeting. A total of 1294 anterior segment photographs (1100 for grading system establishment, 94 for external validation, and 100 for serial data analysis) were reviewed.
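The median-and-consensus rule above can be sketched in a few lines of Python. This is a minimal illustration only; the function name and the consensus flag are ours, not from the study's code.

```python
import statistics

def ground_truth_score(scores):
    """Derive the ground-truth NEI score for one zone from three graders.

    Returns (score, needs_consensus): the median of the three scores, plus
    a flag marking cases that the paper sends to a consensus meeting
    (all three scores differ, or any two scores differ by more than 1).
    """
    median = statistics.median(scores)
    all_different = len(set(scores)) == 3
    large_gap = max(scores) - min(scores) > 1
    return median, all_different or large_gap

# Graders agree closely: the median is used directly
print(ground_truth_score([2, 2, 3]))  # (2, False)
# All three scores differ: flagged for consensus review
print(ground_truth_score([1, 2, 3]))  # (2, True)
```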

Fig 1. Corneal segmentation and scoring method.

(A) Corneal segmentation grid and proportion (right eye). The horizontal and vertical ratios of each zone of the grid are 1:1.6:1. (B) Two examples of NEI scale evaluation. PEE of the five zones is assessed and scored using the NEI scale. (C) Corneal segmentation grid and proportion (left eye). NEI, National Eye Institute; PEE, punctate epithelial erosion.

https://doi.org/10.1371/journal.pone.0299776.g001

Development of the fully automated grading system

The fully automated grading system for DED severity was developed using three steps: (1) corneal segmentation, (2) classification of CFS candidate regions, and (3) estimation of NEI grades within the CFS candidate regions (Fig 2).

Fig 2. Diagram of the proposed deep learning system.

In step 1, the corneal region in the fluorescein-stained slit lamp image is segmented using the U-Net architecture trained with 1100 images and their labeled corneal-region masks. In step 2, a CNN-based classification model trained with 200 images and their PEE and non-PEE labels finds the PEE candidate regions within the corneal region. In step 3, PEE quantification is performed using a PEE density map and presented as the MDV. PEE, punctate epithelial erosion; MDV, maximum density value.

https://doi.org/10.1371/journal.pone.0299776.g002

The corneal segmentation model was established as the first step in the fully automated grading system. The outer circle of the grid used to evaluate the NEI score was used as the ground truth for the corneal contour. In cases where the margin of the grid was partially covered by the eyelid, the outer contour of the cornea was manually drawn and used as the ground truth. For cross-validation, 1100 fluorescein-stained corneal images from 1100 patients were divided into the training, development, and validation sets as stratified five folds (Fig 3). Before training, all images were pre-processed in two steps: (1) normalization was performed to change the range of image intensities from 0–255 to 0–1, and (2) contrast-limited adaptive histogram equalization was applied to improve the contrast of the images [21]. All input images were resized to 512 × 512 pixels using zero-padding. The U-Net architecture, which demonstrates outstanding performance in medical image segmentation, was used to establish the segmentation model [22]. Corneal regions in the 1100 anterior segment images were segmented using the U-Net architecture with corneal-region labeled masks. The sigmoid function was employed as the activation function, and the Dice coefficient was adopted with a stochastic gradient descent (SGD) optimizer to optimize the model's parameters [23–25]. The Dice coefficient is an index used to evaluate the similarity between two areas; it is calculated by doubling the size of the shared area and dividing by the sum of the sizes of the two areas. The performance of the model was evaluated using the average Dice coefficient of each fold. The predicted image was post-processed, and a grid was drawn to obtain a mask: a circle was detected in the predicted image using the Hough circle transform algorithm [26], the predicted region outside this circle was removed, and the shape was then refined using the convex hull algorithm.
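As a concrete illustration of the evaluation metric described above, the Dice coefficient of two binary masks can be computed as follows (a NumPy-based sketch, not the authors' implementation):

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2*|A intersect B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, bool)
    truth = np.asarray(truth, bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    # Convention: two empty masks are considered identical
    return 2.0 * intersection / total if total else 1.0

# Toy 4x4 masks: 3 foreground pixels each, 2 of them shared
a = np.zeros((4, 4), int); a[1, 1:4] = 1
b = np.zeros((4, 4), int); b[1, 2:4] = 1; b[2, 2] = 1
print(dice_coefficient(a, b))  # 2*2 / (3+3) ~ 0.667
```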
An inner grid was drawn to divide the corneal region into the five zones. As described above, the ratio of the inner grid circle was 1:1.6:1, with the central portion corresponding to the diameter of zone 1; this value was used to separate zone 1 by drawing a circle in the center. Lines were then drawn at 45° and 135° relative to the horizontal line passing through the origin to distinguish the remaining zones. The majority-voting ensemble method was used to integrate the results of the entire system.
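Under this grid geometry, pixel-to-zone assignment could be sketched as follows. This is an illustrative reconstruction: the 1.6/3.6 radius fraction follows from the stated 1:1.6:1 ratio, but the numbering of the four peripheral quadrants is our assumption, as the paper does not spell it out.

```python
import math

def nei_zone(x, y, cx, cy, radius):
    """Assign an image point to one of the five NEI grid zones.

    The 1:1.6:1 ratio means the central circle (zone 1) spans 1.6/3.6 of
    the corneal diameter; the periphery is split into four quadrants by
    lines at 45 and 135 degrees through the corneal centre. The numbering
    of zones 2-5 below is assumed, not taken from the paper.
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r > radius:
        return None                      # outside the cornea
    if r <= radius * 1.6 / 3.6:
        return 1                         # central circle (zone 1)
    angle = math.degrees(math.atan2(dy, dx)) % 360
    if 45 <= angle < 135:
        return 2                         # quadrant numbering assumed
    if 135 <= angle < 225:
        return 3
    if 225 <= angle < 315:
        return 4
    return 5
```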

Fig 3. Schematic diagram of dataset splitting for deep learning analysis.

For the PEE candidate region model, 200 cases were divided into the training, development, and validation sets in a 7:2:1 ratio. For the corneal region segmentation model and the PEE detection and quantification model, 1100 cases were divided into stratified five folds. In addition, data from 94 cases were used for external validation of the entire system, and data from another 100 cases were used for serial data analysis. PEE, punctate epithelial erosion.

https://doi.org/10.1371/journal.pone.0299776.g003

In the second step, to reduce false-positive regions that can mimic PEE, such as flashlights or filaments, a CNN-based classification model was used to find the areas where PEE may exist within the corneal region segmented in step 1. To train the classification model, 200 images from 200 patients were divided into the training, development, and validation sets in a 7:2:1 ratio. The 200 images were labeled in red (definite PEE regions) and yellow (definite non-PEE regions) (Fig 4A–4C). After extracting the coordinates of the red and yellow labels from the images, 100–150 coordinates were randomly selected as patch centers, and a patch measuring 192 × 192 pixels, spanning from (x−96, y−96) to (x+96, y+96), was extracted around each center (x, y) (Fig 4D). A total of 24,559 patches were used for learning. Among several classification models (VGG16 [27], VGG19 [27], and InceptionV3 [28]), the ImageNet [29] pre-trained VGG16 showed the best performance and was thus adopted. Although U-Net can identify PEE candidate regions, the fully convolutional network (FCN) model was chosen given the need for more detailed labeling of the PEE region and the resolution of the images. The FCN model was trained with the 200 images and their PEE and non-PEE labels to find the PEE candidate areas within the corneal region. During training, an SGD optimizer with a weight decay of 1e-6 and a batch size of 64 was used. For inference, 192 × 192 pixel-sized patches were overlapped by 25% using a sliding window to extract the results. Additionally, we verified the model using the class activation map (CAM) to evaluate whether the model effectively classifies PEE regions. In the CAM, the output of the final convolutional layer was utilized as feature maps; global average pooling was then employed to calculate a weighted average for each class, and softmax was applied to obtain the pixel probabilities of specific classes [30].
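The patch-extraction step above can be sketched as follows. This is an illustrative reconstruction: the function and parameter names are ours, and clipping patches at the image border is an assumption the paper does not describe.

```python
import random

PATCH = 192          # patch side length in pixels
HALF = PATCH // 2    # 96-pixel half-width around each centre

def extract_patches(height, width, coords, n_min=100, n_max=150):
    """Sample labelled coordinates and return 192x192 patch boxes.

    Mirrors the description above: 100-150 labelled (x, y) coordinates
    are randomly selected per image and a 192x192 patch is centred on
    each. Returns a list of (x0, y0, x1, y1) boxes clipped to the image
    (the clipping policy is our assumption).
    """
    k = min(len(coords), random.randint(n_min, n_max))
    boxes = []
    for x, y in random.sample(coords, k):
        x0, y0 = max(0, x - HALF), max(0, y - HALF)
        x1, y1 = min(width, x + HALF), min(height, y + HALF)
        boxes.append((x0, y0, x1, y1))
    return boxes
```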

Fig 4. Examples of PEE and non-PEE labeling generated using Microsoft Paint software.

(A–C). Labeling of definite PEE (red color) and definite non-PEE (yellow color). (D). Extraction of patches sized 192 × 192 pixels for learning. PEE, punctate epithelial erosion.

https://doi.org/10.1371/journal.pone.0299776.g004

In step 3, because PEE appears as blobs in the image, a blob detection algorithm was used to detect PEE in the CFS candidate regions extracted in step 2. The blob detection algorithm finds areas of a digital image whose characteristics, such as brightness or color, differ from those of the surrounding areas. Given that PEE appears green in the standard image, the algorithm was run exclusively on the green channel. After PEE detection, the density was calculated by sliding a 32 × 32 window with 50% overlap, in consideration of the PEE size. The resulting density map was then resized to match the original image size. The largest density value in each zone, named the maximum density value (MDV), was extracted to represent the characteristics of that zone. The CFS score on the NEI scale is evaluated based on the highest-density part of the PEEs in each zone; therefore, we evaluated the MDV of each zone to estimate the PEE grade. As in the corneal segmentation model, the 1100 fluorescein-stained corneal images were divided into the training, development, and validation sets as stratified five folds (Fig 3).
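The density-map and MDV computation of step 3 can be sketched as follows (a NumPy-based illustration; the paper resizes the density map back to the image size before extracting per-zone maxima, which is simplified here to a mask on the density-map grid):

```python
import numpy as np

def pee_density_map(pee_mask, win=32, stride=16):
    """Slide a 32x32 window with 50% overlap (stride 16) over a binary
    PEE mask and record each window's mean PEE coverage."""
    h, w = pee_mask.shape
    rows = (h - win) // stride + 1
    cols = (w - win) // stride + 1
    density = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            y, x = i * stride, j * stride
            density[i, j] = pee_mask[y:y + win, x:x + win].mean()
    return density

def maximum_density_value(density_map, zone_mask):
    """MDV of one zone: the largest window density inside the zone.
    zone_mask is a boolean mask on the density-map grid (a
    simplification of the paper's resize-to-image-size step)."""
    return float(density_map[zone_mask].max())
```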

External validation

External validation was performed using 94 images from hospital 2. All images were graded on the NEI scale by three ophthalmologists, as with the hospital 1 data, and the correlation between the predicted value and the ground truth was evaluated.

Serial data analysis

Two images of the same eye taken on different days were compared for each of 50 patients (100 images in total). To evaluate the accuracy in predicting aggravation or improvement of DED, changes in MDV on different days in the same eye were compared with those in the ground truth. Aggravation of DED was defined as an increase in the NEI score, and improvement of DED was defined as a decrease in the NEI score.
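The direction-of-change comparison can be sketched with two small helpers (illustrative code, not the study's implementation; it works for both the continuous MDV and the integer NEI score):

```python
def change_direction(before, after):
    """Direction of change between two visits: +1 for deterioration
    (score up), -1 for improvement (score down), 0 for no change."""
    return (after > before) - (after < before)

def agreement_rate(model_pairs, truth_pairs):
    """Fraction of eyes where the model's predicted direction of change
    matches the ground-truth direction (the paper reports 44/50 = 88%)."""
    hits = sum(
        change_direction(*m) == change_direction(*t)
        for m, t in zip(model_pairs, truth_pairs)
    )
    return hits / len(truth_pairs)
```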

Statistical analysis

Model performance was validated using the Spearman correlation coefficient (r), which was used to evaluate the correlation between the model output and the ground truth. All statistical analyses were conducted using GraphPad Prism 9.4.1 software (GraphPad Inc., San Diego, CA, USA).
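For reference, the Spearman coefficient is the Pearson correlation of the rank-transformed data. A dependency-free sketch (equivalent to `scipy.stats.spearmanr` for these inputs) is:

```python
def _ranks(v):
    """Average 1-based ranks, with ties sharing their mean rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's r: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```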

Results

Patient and image characteristics

Among the 1100 patients in the development set (1400 images from hospital 1), 113 (10.3%) and 15 (1.4%) patients had Sjögren syndrome and ocular GVHD, respectively (Table 1). Meanwhile, in the external validation set (94 images from hospital 2), 65 patients (69.1%) were previously diagnosed with Sjögren syndrome. The distribution of the ground truth NEI score in each zone and the ground truth total NEI score in hospitals 1 and 2 are shown in Fig 5.

Fig 5.

Ground truth NEI score of the development dataset (hospital 1 data) at each zone (A) and total NEI score (B), and ground truth NEI score of the external validation dataset (hospital 2 data) at each zone (C) and total NEI score (D). NEI, National Eye Institute.

https://doi.org/10.1371/journal.pone.0299776.g005

Table 1. Diagnosis of the patients in the development (hospital 1) and external validation (hospital 2) datasets.

https://doi.org/10.1371/journal.pone.0299776.t001

Table 2 shows the clinical score agreement among the three investigators before the consensus meeting for the ground truth. The Spearman correlation coefficients were 0.878, 0.885, and 0.920. After the consensus meeting, the Spearman correlation coefficients among the investigators rose to 0.905, 0.903, and 0.934 (S1 Table). The correlation between the maximum density value (MDV) and the median/mean NEI scores determined by the three investigators is shown in Fig 6. Compared with the median NEI score, the mean NEI score showed a higher correlation with the MDV in most zones (Fig 6). The correlation between the MDV and both the median (r = 0.854) and mean (r = 0.869) NEI scores was highest in zone 5, whereas the correlation between the MDV and the NEI score was lowest in zone 2.

Fig 6.

(A) Correlation between median NEI score and MDV, and (B) correlation between mean NEI score and MDV by zone. NEI, National Eye Institute; MDV, maximum density value.

https://doi.org/10.1371/journal.pone.0299776.g006

Table 2. Agreement in corneal fluorescein staining (CFS) scores based on the National Eye Institute (NEI) scale.

https://doi.org/10.1371/journal.pone.0299776.t002

Results of segmentation and classification model

Examples of corneal segmentation inputs, prediction results, and outputs are shown in Fig 7. The model generates an output of 512 × 512 pixels; after removal of the padding, this output was resized to the original image size of 2576 × 1934 pixels. The Dice coefficients for corneal segmentation were higher than 0.96 in all five folds, and the average Dice coefficient was 0.962 ± 0.001.

Fig 7. Examples of the training dataset, input, and final output used in the corneal segmentation step.

The input is a fluorescein-stained corneal image (first panel). The ground truth (second panel) is used to train the cornea segmentation model. The red lines in the second panel indicate corneal regions. The third panel displays the predicted image of corneal segmentation. The final output (last panel) is a grid mask on a predictive image obtained using computer vision algorithms.

https://doi.org/10.1371/journal.pone.0299776.g007

To reduce false positives mimicking PEE, the classification threshold was set at 0.98 to tune the model for high specificity. At this threshold, the classification model achieved an accuracy of 0.89, a sensitivity of 0.82, a specificity of 0.96, and an AUC of 0.97, indicating robust performance.
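The reported operating point can be reproduced from confusion-matrix counts that are consistent with the stated sensitivity and specificity. Note that the counts 82/4/96/18 below are hypothetical illustrations, not numbers from the paper; the point is how raising the threshold trades sensitivity for specificity.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix
    counts (tp/fp/tn/fn = true/false positives and negatives)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# Hypothetical counts consistent with the reported 0.89 / 0.82 / 0.96
acc, sens, spec = classification_metrics(tp=82, fp=4, tn=96, fn=18)
print(acc, sens, spec)  # 0.89 0.82 0.96
```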

The CAM is shown in Fig 8A. The red boxes indicate patches containing PEE (true positives). The yellow boxes indicate non-PEE patches (true negatives), such as flashlights, filaments, and light reflections from the tear film. As shown in the yellow box in Fig 8A, the CAM focused on a portion of the region but correctly classified it as a true negative. Fig 8B shows the density map overlapped on the original images; low severity is shown in green and high severity in red.

Fig 8. Illustration of classification model and density map results.

(A) CAM of PEE candidate region classification. The red and yellow boxes represent true positive and true negative of the PEE classification model, respectively. (B) Density map results. Blue indicates low PEE density, and red indicates high PEE density. CAM, class activation map; PEE, punctate epithelial erosion.

https://doi.org/10.1371/journal.pone.0299776.g008

Internal and external validations

The Spearman correlation between MDV and ground truth NEI score was 0.868 in the internal validation datasets (Fig 9A). The Spearman correlation between MDV and ground truth NEI score was 0.863 in the external validation dataset (Fig 9B).

Fig 9. Correlation between total NEI score and MDV.

(A) Spearman correlation between the total NEI score and MDV of the development dataset (hospital 1); the coefficient is 0.868 (p<0.001). (B) Spearman correlation between the total NEI score and MDV of the external validation dataset (hospital 2); the coefficient is 0.863 (p<0.001). NEI, National Eye Institute; MDV, maximum density value.

https://doi.org/10.1371/journal.pone.0299776.g009

Serial data analysis

The agreement between the proposed model and the ground truth for improvement or deterioration was consistent in 44 of 50 patients; the remaining six patients showed different directions of change in the model prediction and the ground truth (Table 3). Of these six discrepant cases, the change in the ground truth NEI score was 2 in four patients and 1 in two patients.

Table 3. Agreement between the entire model and ground truth data for the assessment of improvement or deterioration in 50 eyes (n = 100 images).

https://doi.org/10.1371/journal.pone.0299776.t003

Discussion

In this study, a fully automated artificial intelligence system that calculates NEI scores from fluorescein-stained corneal images was developed. The system automatically segmented the corneal region and identified the PEE candidate regions. The score of the densest region of each zone was calculated according to the NEI scoring system. The total corneal area score calculated by the automated AI system showed a high correlation with the ground-truth NEI score (r = 0.868), comparable to the correlation among the NEI scale scores determined by ophthalmologists (r = 0.878–0.920). The system correctly predicted disease improvement or deterioration in 88% of cases.

Unlike the Oxford scale or the Sjögren's International Collaborative Clinical Alliance ocular staining score, which evaluate the entire cornea without segmentation, the NEI scoring system divides the cornea into five regions using a grid [13, 31, 32]. The corneal segmenting grid from the 1995 NEI workshop is arbitrary [13], and subsequent studies using the NEI scoring system did not specify the ratio of the circles [18, 19]. Therefore, the grid of Amparo et al. [16] was used, because they reported that the NEI score may vary depending on the proportion of the inner grid and suggested that a fixed ratio of circles should be set.

Two studies previously applied deep learning models to predict the Oxford scale score [17, 33]. In one study, the region with PEE was extracted, its ratio within the entire cornea was calculated, and the correlation with the ground truth was 0.85 [33]. In the other study, the Oxford scale score was calculated using a formula after counting the PEEs, and the correlation with the ground truth was 0.981 [17]. Feng et al. reported an automated dry eye grading system based on topological features to predict the ocular staining score recommended by the Sjögren's International Collaborative Clinical Alliance [34]. The authors utilized image processing techniques to extract topological and morphological features from corneal images and subsequently analyzed and classified them using machine learning models [34]. However, these previous methods could not be used for calculating the NEI score because of the complexity of the NEI scale. Therefore, we divided the cornea into regions as in the NEI scoring system, scored the densest PEE region within each, and combined the scores across regions.

Qu et al. reported an automated NEI grading system using deep learning [19], in which a staining grading model predicting a score from 0 to 3 for each region was trained with images also scored from 0 to 3. In contrast, our system quantifies PEEs and obtains the MDV, which is closer to the method of the NEI scoring system and may thus serve as an objective value. Additionally, our proposed method closely aligns with actual clinical practice: as in CFS scoring in a clinical setting, it proceeds step by step, segmenting the corneal region into five zones, evaluating the PEE density in each zone, and then summing the evaluations to calculate a total score, which distinguishes it from existing end-to-end deep learning methods. PEE quantification involves two steps of false-positive reduction: corneal segmentation and PEE candidate region classification. Corneal segmentation restricts scoring to the segmented corneal regions, which may reduce false positives by eliminating detections outside the target scoring region. The NEI score can then be further explained by the final PEE density map.

Although the external dataset included a higher proportion of patients with Sjögren syndrome and patients with more severe disease, external validation showed a Spearman correlation of 0.863, similar to the 0.868 obtained in internal validation. This indicates that the developed system performs reliably in settings with different DED severities.

This study has some limitations. First, in some images the peripheral cornea was partially occluded by the upper eyelid, because Korean patients frequently have a low palpebral fissure height. This might explain why the correlation between the MDV and the NEI score was lowest in zone 2.

Second, although each eye was included only once, both eyes of some patients were included for grading system establishment. Third, several outlier cases showed a low correlation between the predicted value and the ground truth, owing to misdetections involving tear break-up, low contrast, dense PEE, the segmentation grid edge, and light reflex. The internal dataset included 16 cases of misdetection: 6 cases of tear break-up, 4 of low contrast, 4 of dense PEE, 1 of segmentation grid edge, and 1 of light reflex. The external dataset included 7 cases of misdetection: low contrast (2), tear break-up (2), dense PEE (1), segmentation grid edge (1), and light reflex (1). Fourth, the images used for model training were obtained from a single institution with the same camera; the training dataset should include images from multiple institutions to ensure model reproducibility and consistency of results, and a multicenter study is needed to further validate the grading system. In addition, a previous study [35] indicated that the use of yellow cut-off filters to remove external blue light is critical for optimal visualization of ocular surface staining and improves the sensitivity and specificity of diagnosis. Therefore, to improve the performance of the proposed system, training the corneal segmentation model and the PEE candidate region classifier on images obtained with a yellow cut-off filter should be further studied. Furthermore, extracting PEE candidate regions can exclude some areas where PEE might exist; previous studies on the evaluation of PEE [19, 20] share the same limitation. In clinical practice, it is difficult to calculate the NEI score by applying a consistent grid using only slit-lamp examination.
Despite these limitations, the current system has clinical value because it can automatically apply a grid to the cornea and calculate scores simply from input fluorescein-stained corneal images. In particular, the system can be applied to dry eye clinical trials, where objective measurement is important, or to multicenter studies that need to reduce interobserver variation.

In conclusion, a fully automated deep learning-based grading system for dry eye severity was developed that evaluates the CFS score with accuracy as high as that of expert ophthalmologists. With this system, a more reliable and reproducible method for DED severity grading can be achieved. The system may also reduce human error in future clinical trials and multicenter studies.

References

  1. 1. van der Laak J, Litjens G, Ciompi F. Deep learning in histopathology: the path to the clinic. Nat Med. 2021;27(5):775–84. Epub 20210514. pmid:33990804.
  2. 2. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. Epub 20190107. pmid:30617339.
  3. 3. Siontis KC, Noseworthy PA, Attia ZI, Friedman PA. Artificial intelligence-enhanced electrocardiography in cardiovascular disease management. Nat Rev Cardiol. 2021;18(7):465–78. Epub 20210201. pmid:33526938; PubMed Central PMCID: PMC7848866.
  4. 4. Grassmann F, Mengelkamp J, Brandl C, Harsch S, Zimmermann ME, Linkohr B, et al. A Deep Learning Algorithm for Prediction of Age-Related Eye Disease Study Severity Scale for Age-Related Macular Degeneration from Color Fundus Photography. Ophthalmology. 2018;125(9):1410–20. Epub 20180410. pmid:29653860.
  5. 5. Moraes G, Fu DJ, Wilson M, Khalid H, Wagner SK, Korot E, et al. Quantitative Analysis of OCT for Neovascular Age-Related Macular Degeneration Using Deep Learning. Ophthalmology. 2021;128(5):693–705. Epub 20200924. pmid:32980396; PubMed Central PMCID: PMC8528155.
  6. 6. Kuo MT, Hsu BW, Yin YK, Fang PC, Lai HY, Chen A, et al. A deep learning approach in diagnosing fungal keratitis based on corneal photographs. Sci Rep. 2020;10(1):14424. Epub 20200902. pmid:32879364; PubMed Central PMCID: PMC7468230.
  7. 7. Bron AJ, de Paiva CS, Chauhan SK, Bonini S, Gabison EE, Jain S, et al. TFOS DEWS II pathophysiology report. Ocul Surf. 2017;15(3):438–510. Epub 20170720. pmid:28736340.
  8. 8. Craig JP, Nichols KK, Akpek EK, Caffery B, Dua HS, Joo CK, et al. TFOS DEWS II Definition and Classification Report. Ocul Surf. 2017;15(3):276–83. Epub 20170720. pmid:28736335.
  9. 9. Dana R, Bradley JL, Guerin A, Pivneva I, Stillman I, Evans AM, et al. Estimated Prevalence and Incidence of Dry Eye Disease Based on Coding Analysis of a Large, All-age United States Health Care System. Am J Ophthalmol. 2019;202:47–54. Epub 2019/02/06. pmid:30721689.
  10. 10. Wu J, Wu X, Zhang H, Zhang X, Zhang J, Liu Y, et al. Dry Eye Disease Among Mongolian and Han Older Adults in Grasslands of Northern China: Prevalence, Associated Factors, and Vision-Related Quality of Life. Front Med (Lausanne). 2021;8:788545. Epub 2021/12/14. pmid:34901096; PubMed Central PMCID: PMC8655125.
  11. 11. Bron AJ, Argueso P, Irkec M, Bright FV. Clinical staining of the ocular surface: mechanisms and interpretations. Prog Retin Eye Res. 2015;44:36–61. Epub 20141023. pmid:25461622.
  12. 12. Nichols KK, Evans DG, Karpecki PM. A Comprehensive Review of the Clinical Trials Conducted for Dry Eye Disease and the Impact of the Vehicle Comparators in These Trials. Curr Eye Res. 2021;46(5):609–14. Epub 2020/11/27. pmid:33238774.
  13. 13. Lemp MA. Report of the National Eye Institute/Industry workshop on Clinical Trials in Dry Eyes. CLAO J. 1995;21(4):221–32. pmid:8565190.
  14. 14. Asbell PA, Maguire MG, Peskin E, Bunya VY, Kuklinski EJ. Dry Eye Assessment and Management (DREAM©) Study: Study design and baseline characteristics. Contemp Clin Trials. 2018;71:70–9. Epub 2018/06/09. pmid:29883769; PubMed Central PMCID: PMC7250048.
  15. 15. Yu K, Bunya V, Maguire M, Asbell P, Ying GS, Dry Eye A, et al. Systemic Conditions Associated with Severity of Dry Eye Signs and Symptoms in the Dry Eye Assessment and Management Study. Ophthalmology. 2021;128(10):1384–92. Epub 20210327. pmid:33785415; PubMed Central PMCID: PMC8463420.
  16. 16. Amparo F, Wang H, Yin J, Marmalidou A, Dana R. Evaluating Corneal Fluorescein Staining Using a Novel Automated Method. Invest Ophthalmol Vis Sci. 2017;58(6):BIO168–BIO73. pmid:28693042.
  17. 17. Bagbaba A, Sen B, Delen D, Uysal BS. An Automated Grading and Diagnosis System for Evaluation of Dry Eye Syndrome. J Med Syst. 2018;42(11):227. Epub 20181008. pmid:30298212.
  18. 18. Chun YS, Yoon WB, Kim KG, Park IK. Objective assessment of corneal staining using digital image analysis. Invest Ophthalmol Vis Sci. 2014;55(12):7896–903. Epub 20141118. pmid:25406292.
  19. 19. Qu JH, Qin XR, Li CD, Peng RM, Xiao GG, Cheng J, et al. Fully automated grading system for the evaluation of punctate epithelial erosions using deep neural networks. Br J Ophthalmol. 2021. Epub 20211020. pmid:34670751.
  20. 20. Su T-Y, Ting P-J, Chang S-W, Chen D-Y. Superficial Punctate Keratitis Grading for Dry Eye Screening Using Deep Convolutional Neural Networks. IEEE Sensors Journal. 2020;20(3):1672–8.
  21. 21. Pizer SM, Amburn EP, Austin JD, Cromartie R, Geselowitz A, Greer T, et al. Adaptive histogram equalization and its variations. Computer vision, graphics, and image processing. 1987;39(3):355–68.
  22. 22. Ronneberger O, Fischer P, Brox T, editors. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical image computing and computer-assisted intervention; 2015: Springer.
  23. 23. Dice LR. Measures of the amount of ecologic association between species. Ecology. 1945;26(3):297–302.
  24. 24. Robbins H, Monro S. A stochastic approximation method. The annals of mathematical statistics. 1951:400–7.
  25. 25. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–6.
  26. 26. Duda RO, Hart PE. Use of the Hough transformation to detect lines and curves in pictures. Communications of the ACM. 1972;15(1):11–5.
  27. 27. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:14091556. 2014.
  28. 28. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z, editors. Rethinking the inception architecture for computer vision. Proceedings of the IEEE conference on computer vision and pattern recognition; 2016.
  29. 29. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L, editors. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition; 2009: IEEE.
  30. 30. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D, editors. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision; 2017: IEEE.
  31. 31. Whitcher JP, Shiboski CH, Shiboski SC, Heidenreich AM, Kitagawa K, Zhang S, et al. A simplified quantitative method for assessing keratoconjunctivitis sicca from the Sjögren’s Syndrome International Registry. Am J Ophthalmol. 2010;149(3):405–15. Epub 20091229. pmid:20035924; PubMed Central PMCID: PMC3459675.
  32. 32. Bron AJ, Evans VE, Smith JA. Grading of corneal and conjunctival staining in the context of other dry eye tests. Cornea. 2003;22(7):640–50. pmid:14508260.
  33. 33. Su T-Y, Ting P-J, Chang S-W, Chen D-Y. Superficial punctate keratitis grading for dry eye screening using deep convolutional neural networks. IEEE Sensors Journal. 2019;20(3):1672–8.
  34. 34. Feng J, Ren Z-K, Wang K-N, Guo H, Hao Y-R, Shu Y-C, et al. An Automated Grading System Based on Topological Features for the Evaluation of Corneal Fluorescein Staining in Dry Eye Disease. Diagnostics. 2023;13(23):3533. pmid:38066774
  35. 35. Peterson RC, Wolffsohn JS, Fowler CW. Optimization of anterior eye fluorescein viewing. American journal of ophthalmology. 2006;142(4):572–5. e2. pmid:17011847