Comparison of Diagnostic Tests in Distinct Well-Defined Conditions Related to Dry Eye Disease

Purpose: This study compares signs, symptoms and predictive tools used to diagnose dry eye disease (DED) and ocular surface disorders in six systemic, well-defined and non-overlapping diseases. It is well known that these tests are problematic because they often disagree in identifying these conditions. Accordingly, we provide here a comparative clinical profile analysis of these different diseases.
Methods: A spontaneous, consecutive sample of patients with Sjögren's syndrome (SS) (n = 27), graft-versus-host disease (GVHD) (n = 28), Graves orbitopathy (n = 28), facial palsy (n = 8), diabetes mellitus without proliferative retinopathy (n = 14) and glaucoma chronically treated with topical drugs preserved with benzalkonium chloride (n = 20) was enrolled. Evaluation consisted of a comprehensive protocol encompassing: (1) a structured questionnaire, the Ocular Surface Disease Index (OSDI); (2) tear osmolarity (TearLab Osmolarity System - Ocusense); (3) tear film break-up time (TBUT); (4) fluorescein and lissamine green staining; (5) Schirmer test; and (6) severity grading.
Results: One hundred and twenty-five patients (mean age 48.8 ± 14.1 years, male:female ratio = 0.4) were enrolled in the study, along with 24 age- and gender-matched controls. Higher scores on DED tests were obtained in Sjögren's syndrome (P<0.05), except for tear film osmolarity, which was higher in diabetics (P<0.001), and fluorescein staining, which was higher in facial palsy (P<0.001). TBUT and OSDI correlated better with other tests. The best combination of diagnostic tests for DED was OSDI, TBUT and Schirmer test (sensitivity 100%, specificity 95% and accuracy 99.3%).
Conclusions: DED diagnostic test results vary widely among different conditions. Vital staining and TBUT correlated best with one another, whereas the best test combination to detect DED was OSDI/TBUT/Schirmer.


Introduction
Dry eye disease (DED) is a multifactorial condition involving changes in tear composition and volume as well as ocular surface (OS) integrity, with several different risk factors and symptoms. [5] It is well known that diagnostic test results for DED correlate poorly with one another and with symptoms. [6,7] A possible reason for this disparity is that the heterogeneity in causative DED factors induces different changes in the underlying mechanisms controlling lacrimal gland (LG) and OS physiology.
In Sjögren's syndrome (SS), the LG becomes a target of the immune system. Consequently, focal lymphocytic infiltrates lead to increased production of pro-inflammatory cytokines, acinar damage and impaired aqueous production. [8,9] Similar inflammatory responses may be noted in graft-versus-host disease (GVHD), which is also accompanied by conjunctival inflammation, meibomian gland dysfunction and severe DED features. [10] As LG secretion is under neural control, proper stimuli from the ocular surface afferent sensory nerves in the cornea and conjunctiva activate efferent responses that stimulate LG secretion. In this regard, conditions such as diabetes mellitus (DM) and facial palsy (FP) can be important causes of dysfunctional changes in tear volume and/or composition. [11][12][13][14] In addition, hormones, in particular insulin, thyroid and sex steroid hormones, are regulators of LG function [15][16][17][18], and the OS is constantly affected by their related diseases (e.g., Graves orbitopathy). Finally, environmental factors, aging and topical medications or their preservatives (e.g., benzalkonium chloride, or BAK) may also contribute to either improve or aggravate DED. [19][20][21]
Our hypothesis is that differences in the underlying disease mechanisms (i.e., SS, GVHD, DM, FP, Graves orbitopathy, BAK toxicity) affect tear secretion and DED clinical presentation in distinct ways and with distinct intensity. The present work compares the signs and symptoms of DED in six systemic, well-defined and non-overlapping diseases. By evaluating the performance of DED tests, we draw attention to the challenge of diagnosing this highly prevalent and sight-threatening condition.

Methods
A total of one hundred and twenty-five DED subjects were included in this study. Consecutive patients attending the outpatient DED clinic in a tertiary care university hospital were invited to participate. Patients presenting one of the following conditions associated with DED were included: SS (diagnosed following the American-European Criteria) [22], GVHD, Graves orbitopathy, facial palsy (based on clinical criteria), chronic glaucoma topical treatment with benzalkonium chloride (BAK) preserved drugs for at least one year, and diabetes mellitus without retinopathy (based on fasting glycemic levels and indirect fundoscopic evaluation).
All patients were under clinical treatment at the time of their evaluation. Due to a lack of agreement among the established DED diagnostic criteria described in different clinical studies [1][2][3][4], we adopted the following criteria: Ocular Surface Disease Index (OSDI) score > 20 and/or Schirmer test (ST) < 10 mm or tear film break-up time (TFBUT) ≤ 6 seconds and/or any vital staining score > 3 and/or tear film osmolarity > 310 mOsm. DED was diagnosed if the patient presented at least one positive test according to these pre-established criteria. Patients were separated into six subgroups based on their disease (i.e., SS, GVHD, Graves orbitopathy, facial palsy, diabetes mellitus without retinopathy, or chronic glaucoma treatment with BAK-preserved eye drops), which were compared throughout the study.
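As a minimal sketch of the inclusion rule described above, the Python snippet below flags a patient as DED-positive when at least one test crosses its cut-off. The cut-offs are read as OSDI > 20, Schirmer < 10 mm, TFBUT ≤ 6 s, vital staining > 3 and osmolarity > 310 mOsm. The `Exam` record, its field names and both helper functions are hypothetical names introduced for illustration; they are not part of the study protocol.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    """One patient's results (hypothetical record; field names are assumptions)."""
    osdi: float             # OSDI questionnaire score
    schirmer_mm: float      # Schirmer test, mm of wetting in 5 minutes
    tfbut_s: float          # tear film break-up time, seconds
    fluorescein: int        # corneal fluorescein staining score
    lissamine: int          # conjunctival lissamine green staining score
    osmolarity_mosm: float  # tear film osmolarity, mOsm


def positive_tests(exam):
    """Return the names of the tests that cross the cut-offs stated in the text."""
    checks = {
        "OSDI": exam.osdi > 20,
        "Schirmer": exam.schirmer_mm < 10,
        "TFBUT": exam.tfbut_s <= 6,
        # "any of the vital staining" is read as either dye exceeding 3
        "vital staining": max(exam.fluorescein, exam.lissamine) > 3,
        "osmolarity": exam.osmolarity_mosm > 310,
    }
    return [name for name, positive in checks.items() if positive]


def meets_ded_criteria(exam):
    """DED is considered present when at least one test is positive."""
    return len(positive_tests(exam)) >= 1
```

Because a single positive test is sufficient, this rule is deliberately inclusive; returning the list of positive tests, rather than only a yes/no flag, keeps visible which sign or symptom drove the classification.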
Twenty-four healthy volunteers, matched by age and gender, were enrolled as a control group. Exclusion criteria were: active ocular infection, ocular allergy, history of refractive surgery or contact lens wear, pregnancy and lactation, or conditions with clinical overlap of the aforementioned diseases.
The study was approved by the Faculty of Medicine Ethics Committee, University of São Paulo, and was conducted in accordance with the tenets of the Declaration of Helsinki and current legislation on clinical research. Written informed consent was obtained from all subjects after explanation of the procedures and study requirements. Evaluation of DED consisted of a protocol encompassing: OSDI questionnaire, tear film osmolarity measurement, tear film break-up time (TFBUT), corneal staining with fluorescein, Schirmer test (ST) and conjunctival staining with lissamine green, as described below and in the following sequence [6].

OSDI
The OSDI score is a subjective symptom questionnaire used as a DED outcome measure to estimate severity [6,23]. A validated Portuguese-language version was used. [24]

Tear film osmolarity
Tear film osmolarity was measured using a lab-on-a-chip system that simultaneously collects a tear sample and analyzes its electrical impedance (TearLab Corp., San Diego, CA, USA). A small tear sample (50 nanolitres) was collected from the lower meniscus by passive capillary action, using a disposable test chip. Osmolarity readings are given in milliosmoles per liter a few seconds after the transfer. [25] Quality control procedures were applied before starting patient testing each day, to confirm function and calibration according to the manufacturer's instructions.
Slit lamp examinations inspected the cornea and conjunctiva at a magnification of 10-16X and were used to perform some of the following tests, as previously described. [6,26]

Tear film break-up time
TFBUT was measured 10-30 seconds after instillation of 5 µl of a 2% sodium fluorescein solution (Allergan, Guarulhos, Brazil), as the average of three consecutive break-up times (in seconds) determined manually using a stopwatch.

Corneal fluorescein staining
Corneal fluorescein staining was evaluated using cobalt blue illumination following the 15-point NEI/Industry scale (grades of 0-3 for five regions of the ocular surface), after TFBUT measurements.
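The arithmetic behind the 15-point scale can be sketched as follows: five regions, each graded 0-3, are summed to a total between 0 and 15. The function name and the input validation are assumptions added for illustration and are not part of the NEI/Industry scheme itself.

```python
def nei_corneal_score(region_grades):
    """Sum five regional grades (each 0-3) into a 0-15 corneal staining score.

    Hypothetical helper: the name and the checks are illustrative only.
    """
    if len(region_grades) != 5:
        raise ValueError("the 15-point scale expects exactly five regional grades")
    for grade in region_grades:
        if grade not in (0, 1, 2, 3):
            raise ValueError("each regional grade must be 0, 1, 2 or 3")
    return sum(region_grades)
```

Under this reading, a cornea graded 3 in every region reaches the maximum of 15, while a single region graded 1 yields a total of 1.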

Schirmer test
Tear production was measured in both eyes simultaneously with Schirmer test strips for 5 minutes without anesthetic (Ophthalmos Ltd., São Paulo, Brazil).

Lissamine green conjunctival staining
Lissamine green conjunctival staining was evaluated after instilling 10 µl of a 1% lissamine green dye solution (Ophthalmos Ltd., São Paulo, Brazil). Conjunctival staining assessment used a grading scheme described by van Bijsterveld according to a modified NEI/Industry scale, where grades of 0-3 are assigned for three regions (temporal, central and nasal).
All measurements were performed by the same investigators, under similar testing conditions and at room temperature. For each sign, the more severe measurement in the two eyes was used in the analysis of disease severity.
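Selecting the more severe eye requires a direction of severity for each sign. The sketch below assumes that lower values are worse for the Schirmer test and TFBUT and that higher values are worse for staining scores and osmolarity, which is consistent with the cut-offs adopted in this study but is stated here as an assumption. The dictionary keys and function names are hypothetical.

```python
# Signs for which a LOWER value indicates more severe disease (assumption
# based on the direction of the study's cut-offs).
LOWER_IS_WORSE = {"schirmer_mm", "tfbut_s"}


def more_severe(sign, right_eye, left_eye):
    """Return the more severe of the two eye measurements for one sign."""
    if sign in LOWER_IS_WORSE:
        return min(right_eye, left_eye)
    return max(right_eye, left_eye)


def worst_eye_profile(right, left):
    """Build a per-patient profile from two per-eye dictionaries of signs."""
    return {sign: more_severe(sign, right[sign], left[sign]) for sign in right}
```

Note that the resulting profile may mix values from both eyes, since each sign is selected independently.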
In this study, patients were using recommended treatments for their diseases and artificial tears for DED. Patients were instructed to not use any eye drops on the day when they were examined in the clinic.
Dry eye severity was graded according to a modified severity score scheme from 1-4, as described previously [5].

Statistics
Descriptive statistics for continuous data are reported as mean ± SD. Continuous variables were compared using the Kruskal-Wallis test (with Dunn's post hoc test) when two or more than two groups were analyzed. Correlations between the variables under investigation were determined using the Spearman correlation coefficient. Differences were considered significant at P < 0.05. All analyses were performed using SPSS v.17.0 (IBM Corp., Armonk, NY, USA). Sensitivity, specificity, positive predictive value and accuracy were calculated for the following tests: OSDI > 20, tear film osmolarity > 310 mOsm, Schirmer test < 10 mm, TFBUT < 6 s, and vital staining ≥ 3, as standardized above to define DED for each condition. All calculations of true positives, true negatives, false positives and false negatives took into consideration all the exams except the one under evaluation. To find the best combination of tests to detect DED in this population, we applied binary multivariate logistic regression with a backward model, including all individuals and test results.
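For readers unfamiliar with the Spearman coefficient, the following pure-Python sketch illustrates the idea: each variable is converted to ranks (tied values share their average rank) and the Pearson correlation of the ranks is computed. The analyses in this study were run in SPSS; the code below is only an illustration, and all function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))
```

The coefficient is undefined when one variable is constant (the denominator is zero), so this sketch raises a `ZeroDivisionError` in that case rather than returning a value.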
We evaluated several statistical measures in order to better understand the performance of DED diagnostic tests. Sensitivity reflects a test's ability to identify positive results (i.e., cases of DED) and is calculated from the true positive results and the total of positive conditions, while specificity reflects the ability to identify negative results (i.e., non-DED) and is calculated from the true negative results and the total of negative conditions. Accuracy reflects the ability to correctly identify or exclude a condition, here DED; that is, accuracy is the proportion of true results (both true positives and true negatives) in the population. Positive predictive value (PPV) indicates the capacity of a positive test to identify a real positive case, whereas negative predictive value (NPV) does the same for real negative cases. For all these calculations, the study applied the cut-off values standardized above to define DED for each condition. All calculations of true positives, true negatives, false positives and false negatives took into consideration all the exams except the one under evaluation.
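The measures defined above follow the standard two-by-two table definitions, sketched below in Python. The function names are hypothetical, and the `reference_positive` input stands for whatever reference classification is used (in this study, the result of the remaining exams); the sketch does not reproduce the study's SPSS workflow.

```python
def confusion_counts(test_positive, reference_positive):
    """Count (tp, fp, tn, fn) from paired boolean sequences."""
    tp = fp = tn = fn = 0
    for test, reference in zip(test_positive, reference_positive):
        if test and reference:
            tp += 1
        elif test and not reference:
            fp += 1
        elif not test and not reference:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic performance measures from a 2x2 table."""
    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives / all reference positives
        "specificity": ratio(tn, tn + fp),  # true negatives / all reference negatives
        "ppv": ratio(tp, tp + fp),          # true positives / all test positives
        "npv": ratio(tn, tn + fn),          # true negatives / all test negatives
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }
```

For example, with 8 true positives, 1 false positive, 9 true negatives and 2 false negatives, sensitivity is 8/10 = 0.80, specificity is 9/10 = 0.90 and accuracy is 17/20 = 0.85.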

Results
A total of one hundred and twenty-five DED subjects, with a male:female ratio of 0.4 and a mean age of 48.8 ± 14.1 years, were included in this study. The control group consisted of twenty-four normal volunteers, with a male:female ratio of 0.38 and a mean age of 45.7 ± 12.7 years. Based on the presence of baseline conditions associated with DED, the following subgroups were formed: SS (n = 27), GVHD (n = 28), Graves orbitopathy (n = 28), facial palsy (n = 8), diabetes mellitus without retinopathy (n = 14), and patients under chronic glaucoma treatment with BAK-preserved eye drops (n = 20).
Clinical and laboratory results varied significantly between patients and controls and across the different subgroups (Table 1).
Correlation coefficients calculated from the data set of all patients included in this study are reported in Table 2. Among the diagnostic tests evaluated here, the coefficients were consistently modest, suggesting a lack of concordance. The highest correlations were observed between fluorescein and lissamine staining, and between TBUT and lissamine staining (R² = 0.43 and R² = 0.31, respectively). Similarly, the correlation coefficients evaluated within each study subgroup had a wide range of values for each test and no consistent relationship (Table 3).
Most patients were classified as grades 2 and 3 in severity, as shown in Figure 1. The SS subgroup had a higher prevalence of grade 4 DED severity (24%). This same subgroup also had the most homogeneous distribution of DED scores across the four grades. Results for all study parameters showed a consistent association between severity scores and clinical and laboratory parameters.
Based on the arbitrary cut-off levels established here, the sensitivity, specificity, accuracy and positive and negative predictive values of the different diagnostic tests were evaluated. The most sensitive test was OSDI, while the least accurate was lissamine green staining (Table 4). Test sensitivity varied widely among the different subgroups (Table 5).

Discussion
The present work revealed appreciable variation in diagnostic test results among different diseases, and the best test combination to detect DED was OSDI/TBUT/Schirmer test. This result reinforces the importance of the most commonly used tests to detect DED in clinical practice, but also emphasizes their variability. By comparing distinct diseases related to DED, we found that these tests are poor predictors of this disease. This inadequacy makes more apparent the need to rely, at this time, on clinical interpretation of a combination of test results.
Meaningful diagnostic testing in DED patients across a broad range of etiologies and presentations is still a challenge. [27,28] Owing to the great variability in DED severity, it is unlikely that a single test has adequate sensitivity to diagnose DED, given its multifactorial nature and numerous manifestations. It is important to consider the overlap between normal and DED values, the lack of a gold standard test or even of a universally accepted set of tests, and the lack of concordance between the signs and symptoms of this disease. Research on potential DED diagnostic tools and therapeutic agents has increased exponentially. [27][29][30][31] Even though a large number of symptoms and a wide range of methods and severity grades are commonly linked to DED diagnosis, they can also be characteristic of other conditions. [29,32]
All subgroups reported higher OSDI scores than controls, although in chronically treated glaucoma patients and in diabetics the difference did not reach significance. OSDI scores correlated poorly with other tests in the overall analysis, but in SS patients the best agreement was found between (1) OSDI and TBUT and (2) OSDI and fluorescein staining. These findings suggest that when dryness reflects ocular surface changes, neural pathways are better preserved than in patients with DM or under BAK-preserved treatment. In this situation, the association is consistent with patients' description of ocular discomfort.
TFBUT is a widely used test. It is minimally invasive, repeatable and more reliable than the Schirmer test. We found that TFBUT had the greatest correlation with other tests across the different diseases. A possible reason for this agreement is that its score depends on a large number of factors, such as exposed ocular surface area, tear film volume and clearance, among others. There is, however, no widely accepted standard cut-off.
The ocular surface staining pattern is not necessarily altered in the early stages of the disease. [33,34] SS, GVHD and facial palsy patients had higher vital staining scores. Similar findings were obtained in a recent study comparing DED results across systemic conditions in Asian patients, in which rheumatoid arthritis patients had higher corneal staining scores than DM patients and smokers. [35] Correlation coefficient analysis showed that the highest positive values were found between the two dye staining results. This was the case in the overall group evaluation and in the SS and GVHD subgroups. As aqueous tear deficiency is characteristic of those diseases, this correlation also agrees with the correlation between lower Schirmer test and TFBUT values. The Schirmer test has been considered inaccurate, unrepeatable and not inclusive of the evaporative aspect of DED. [27]
Table 4. Values of sensitivity, specificity, positive predictive value and accuracy for the tests of DED among the groups.
Some studies have shown that tear film osmolarity is a feasible parameter for diagnosing DED and evaluating therapeutic response. [36] In addition, tear osmolarity measurements could parallel DED severity. [25,37] However, tear film osmolarity does not strongly correlate with other tests. [38] The present work shows that SS, GVHD and DM without retinopathy patients also presented with tear film hyperosmolarity, which reaffirms the evidence of its association with DED severity.
Of interest, the considerably higher mean tear film osmolarity values found in DM without retinopathy, in individuals under clinical treatment whose DED severity was only grade 1 or 2, are in agreement with a report suggesting a possible influence of metabolic dysfunction on tear osmolarity. [39]
A critical appraisal of the different DED causes in the present study reveals an overlap between test results, due to the range of severity in the different categories as well as variations in the underlying pathophysiological mechanisms. SS and GVHD, whose clinical DED findings placed them in the moderate to severe grades, had higher mean test values and the most significant correlations. Glaucoma patients included in this study had been treated topically with one BAK-preserved drug for at least one year. Other studies have shown a high prevalence of ocular surface changes and symptoms in glaucoma patients, with severity correlating with the number of medications. [40,41] DED frequency was high in patients with DM without proliferative retinopathy; however, in no case was DED severe or accompanied by higher mean levels in other signs. This is in accordance with recent epidemiologic studies in which proliferative retinopathy was associated with higher frequency and severity of DED. [42,43]
DED is a serious and complex condition whose early recognition and treatment are crucial to avoiding loss of visual acuity and improving quality of life. [44] On the basis of this study, we conclude that, irrespective of the underlying causative condition, it remains quite difficult to interpret current DED diagnostic sets because of their frequent disagreement. Our results suggest that there is a lack of strong and consistent correlations among the test results. We also observed that these values and severity grades ranged widely across the different disease subgroups.
Nevertheless, each test could provide distinct information depending on the particular patient context, related diseases, risk factors and DED stage. It is reasonable to expect that a specific test combination could support better conclusions regarding effective clinical management. However, it remains very problematic to design meaningful clinical trials to evaluate the results of different DED treatments.
Determining the cut-off levels of tests for DED implies choosing between a broad and a restrictive definition. There is no widely accepted standard cut-off for its diagnosis. [6] Since our aims were to investigate the DED tests in pre-defined diseases with different mechanisms and to check whether any test is more accurate in a specific group, we opted to define as a DED patient one who presented with changes in any of the tests in the panel, but to be more selective in terms of cut-off levels to avoid diagnosing DED without sufficient cause.
In the future, a more comprehensive characterization of lacrimal gland dysfunction and/or ocular surface disease markers may provide valuable insight into the underlying mechanisms and thereby improve DED management.