The Berkeley Dry Eye Flow Chart: A fast, functional screening instrument for contact lens-induced dryness

Purpose: In this article, we introduce a novel flow chart-based screening tool for the categorization of contact lens-induced dryness (CLIDE) and its impact on daily visual activities: the Berkeley Dry Eye Flow Chart (DEFC).
Methods: One hundred thirty (130) experienced soft contact lens wearers discontinued lens wear for 24 hrs, passed a baseline screening and eye health examination, completed the Ocular Surface Disease Index (OSDI), and were then dispensed fresh pairs of their habitual lenses. After 6 hrs of wear, subjects were administered a battery of symptom questionnaires, and underwent non-invasive tear breakup time (NITBUT) measurement, grading of distortion in reflected topographer mires, grading of lens surface wettability, and a fluorescein examination of the ocular surface. Subjects returned after at least 48 hrs and repeated all assessments after 6 hrs of wear of a second fresh pair of habitual lenses.
Results: The repeatability of the DEFC between visits was within 1%, and Limits of Agreement and Coefficient of Repeatability were comparable to those of the other CLIDE assessments. Higher DEFC score was significantly related to shorter pre-lens NITBUT, higher OSDI score, and higher Visual Analog Scale (VAS) ratings of average and end-of-day severity and frequency of dryness (all p < 0.001). For CLIDE as diagnosed based on DEFC score, the highest sensitivities and specificities were achieved by the OSDI and VAS ratings; pre-lens NITBUT exhibited good sensitivity but poor specificity. The optimum pre-lens NITBUT diagnostic threshold was found to be ≤ 2.0 sec for debilitating CLIDE, and the OSDI threshold was ≥ 11.4.
Conclusions: The DEFC provides a means of quickly categorizing CLIDE patients based on severity and frequency of symptoms, and on the degree to which symptoms impact daily life. The DEFC has several potential advantages as a CLIDE screening and monitoring tool, has good repeatability, and is significantly related to commonly employed clinical assessments for CLIDE.

The impetus for the DEFC came from colleagues in clinical practice who felt that numerical grading of symptoms or Likert-style checkbox questionnaires were insufficient to capture how a symptomatic patient really felt. While an internal mental state is a latent trait, it may be made manifest (and measured) in a change of expression or behavior related to the symptoms. From a clinical perspective, the extent to which symptoms interfere with daily activities is a good indicator of the severity of the problem, and suggestive of how it should be further treated. From a research perspective, it was determined that the instrument should be short and easy to complete, thus reducing the potential for misclassification bias. It was finally determined that the most promising design would be a flow-chart questionnaire, with binary items related to the presence or absence of symptoms, their severity and frequency, and their impact on daily life.
There are several potential advantages inherent in the design of the DEFC. It is very simple to read and understand, reducing the potential for later misdiagnosis due to incorrectly completed questionnaires from patients who are fatigued or confused by a longer or more difficult instrument, and reducing the potential for bias in clinical studies due to over-selecting for the capabilities required to successfully complete longer, more difficult questionnaires [29]. Because the DEFC score incorporates aspects of symptom presence, sensation, severity, frequency and especially impact on everyday activities, different types of functional CLIDE can be identified by setting the DEFC score at appropriate diagnostic thresholds. For example, a researcher may wish to assign potential clinical trial subjects to either a control group with no CLIDE symptoms (DEFC score < 2) or a CLIDE group with symptoms spanning the entire range encountered in the target population (DEFC score ≥ 2); a clinician may wish to separate patients (n.b., with further clinical assessment before initiating actual treatment) whose symptoms require no or only mild intervention such as occasional application of artificial tears (DEFC score < 4) from those whose symptoms are severe enough to be debilitating and require more thorough investigation and possibly more aggressive treatment (DEFC score ≥ 4). Finally, the DEFC score provides a fast and easy way to gather a basic, functional, actionable reading of CLIDE status, which can be logistically important in facilitating large numbers of potential subjects being quickly pre-qualified in a clinical trial setting or classified for epidemiologic study, or large numbers of patients being efficiently pre-screened in an institutional setting (e.g., an on-site screening at a senior care facility) [30,31].
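As an illustration, the two example screening rules above reduce to a pair of simple comparisons on the DEFC score. The sketch below is hypothetical (the function name and group labels are ours, not part of the instrument):

```python
def classify_defc(score: int) -> tuple[str, str]:
    """Illustrative grouping of a DEFC score (1-5 ordinal scale) under the
    two example thresholds described in the text."""
    if not 1 <= score <= 5:
        raise ValueError("DEFC score must be between 1 and 5")
    # Research use: control group (no symptoms) vs. any-CLIDE group
    trial_group = "control" if score < 2 else "CLIDE"
    # Clinical screening use: no/mild symptoms vs. debilitating symptoms
    severity_flag = "no/mild" if score < 4 else "debilitating"
    return trial_group, severity_flag
```

For example, `classify_defc(1)` returns `("control", "no/mild")`, while `classify_defc(5)` returns `("CLIDE", "debilitating")`.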
Our goals in this paper are: (1) to establish the repeatability of the DEFC and to compare its performance with that of other known subjective and clinical measures of CLIDE; (2) to determine whether several clinical tests and subjective symptom measures of CLIDE are related to the DEFC score, and to determine their effect sizes and significance of association; (3) to determine the diagnostic performance (sensitivity and specificity, optimum threshold) of these subjective and clinical measures for a DEFC-based diagnosis of CLIDE. We will present this analysis for the two different examples of DEFC-based CLIDE diagnosis described above: No Symptoms vs. Any Symptoms, and No/Mild Symptoms vs. Debilitating Symptoms. These findings will serve to illuminate whether and to what degree clinical outcomes and subjective symptom measures are reflected in different types of CLIDE, based on the functional classification offered by the DEFC. With its characteristics and performance documented, the DEFC could prove to be a useful addition to the investigational and screening tools available for CLIDE, and to its diagnosis and management in the clinical setting.

Subjects
Subjects were recruited from the University of California, Berkeley campus and surrounding community through posted flyers and direct referrals. Potential subjects were soft contact lens wearers 18 years of age or older who had had eye examinations within the previous 2 years. Eligible subjects were free of any ocular surface pathology or health conditions with ocular manifestations, and were not taking any prescription or over-the-counter medications that could affect the ocular surface or tear film.

Study protocol and procedures
Subjects who passed an initial eligibility screening by telephone and elected to participate were instructed to report to the U.C. Berkeley Clinical Research Center (CRC) for a baseline examination and final eligibility determination, wearing their habitual lenses. Upon arrival at the baseline visit, subjects completed a background questionnaire collecting demographic data as well as medical and contact lens histories, and completed the Ocular Surface Disease Index (OSDI). Subjects then completed the DEFC followed by 100-point rating scale questionnaires on the severity and frequency of all-day and end-of-day dryness symptoms, presented as Visual Analog Scales (VAS) [32,33]. For dryness severity, a rating of 0 = No Dryness and a rating of 100 = Severe Dryness; for dryness frequency, a rating of 0 = Never and a rating of 100 = All the Time. VAS ratings were given by the subject making a vertical mark through a 10 cm horizontal line with only the end points labelled as described, to indicate where his or her eyes fell on the continuum of symptoms, the location of which was automatically determined by scanning software (Teleform, Hewlett Packard, Palo Alto, CA, USA).
Eligible subjects were then scheduled for 2 follow-up visits, at least 48 hrs apart. Subjects were dispensed 2 fresh pairs of their habitual lenses, and were instructed to insert a fresh pair of lenses at least 6 hrs prior to each follow-up visit. At these visits, subjects first completed all dryness symptom questionnaire instruments, following which, a corneal topographer (E300, Medmont International Pty. Ltd., Melbourne, Victoria, AUS) was used to measure non-invasive tear breakup time (NITBUT). Breakup time was recorded at the first sign of a break or distortion in the reflected mires. Three measurements per eye were taken and averaged, with at least 30 seconds between measurements to allow blinking to refresh the tear film. Following the NITBUT measurement, in a mild form of "stress test", subjects were instructed to blink and then hold the eyes open, and at 10 sec the severity of distortion or haze in the reflected topographer mires was graded on a 0-4 scale (0 = No haze or distortion; 4 = Extreme haze or distortion) in the central, nasal, temporal and inferior zones. A slit lamp (SL 120, Carl Zeiss Meditec AG, Oberkochen, Baden-Württemberg, GER) with white light was then used to grade in vivo contact lens surface wettability on a 0-4 scale (0 = Non-wetting; 4 = Complete wetting). Following removal of the lenses and instillation of 2 μl of 0.35% sodium fluorescein to the bulbar conjunctiva, cobalt blue light and a Wratten #12 yellow barrier filter were employed to grade corneal and conjunctival staining according to the Brien Holden Vision Institute grading scales [34]. Conjunctival staining was graded separately in the nasal, temporal, superior and inferior quadrants. Corneal staining was graded separately in the 4 quadrants and in the central 5mm zone, in terms of the type, extent and depth of staining.
This study adhered to the tenets of the Declaration of Helsinki, was approved by institutional review board (U.C. Berkeley Committee for the Protection of Human Subjects) and was conducted in compliance with Health Insurance Portability and Accountability Act (HIPAA) guidelines for data safety and subject anonymity. All subjects provided written informed consent after a thorough description of the procedures, potential benefits, and risks involved in the study.

Statistical methods
Repeatability of the DEFC instrument was assessed by comparing scores at the 2 visits, taken under as close to identical conditions as possible. The 2 visits were approximately 48 hours apart, both after at least 24 hours discontinuation of lens wear, insertion of a fresh pair of lenses, and 6 hours of wear. We examined the distribution of the differences (Visit 2 − Visit 1) in DEFC score, the mean difference, Limits of Agreement (LoA) [35], Difference vs. Means (DVM) plots [35], and the % of scores within 1 point on the 5-point DEFC scale from visit to visit, and conducted a variance component analysis to estimate the within-subject variability in DEFC scores, from which we calculated the Coefficient of Repeatability (CR) [36]. Strictly speaking, the true repeatability of an instrument requires measuring the identical phenomenon on multiple occasions under identical conditions with the same observer, which is not possible for subjective symptoms, which can vary from day to day for some subjects even under nearly identical conditions. We therefore conducted the same repeatability analysis on our other CLIDE measures (VAS ratings of symptoms, NITBUT, etc.) and converted those results to DEFC-equivalent scales, in order to show that the repeatability of the DEFC instrument, in the presence of inherently labile symptoms, is comparable to that of other commonly employed CLIDE assessments.
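For concreteness, the Bland-Altman quantities described above can be computed from paired visit scores as in the sketch below. This is a minimal two-visit illustration, not the analysis code used in the study: the study's CR came from a variance component analysis, whereas here the within-subject variance is estimated directly from the paired differences.

```python
import numpy as np

def repeatability_summary(visit1, visit2):
    """Mean difference, 95% Limits of Agreement, Coefficient of
    Repeatability, and % of paired scores within 1 point."""
    v1 = np.asarray(visit1, dtype=float)
    v2 = np.asarray(visit2, dtype=float)
    d = v2 - v1                              # Visit 2 - Visit 1 differences
    mean_diff = d.mean()                     # between-visit bias
    sd_diff = d.std(ddof=1)
    loa = (mean_diff - 1.96 * sd_diff,       # lower Limit of Agreement
           mean_diff + 1.96 * sd_diff)       # upper Limit of Agreement
    # With 2 visits the within-subject variance is var(d) / 2, so the
    # Coefficient of Repeatability 1.96 * sqrt(2) * s_w reduces to:
    cr = 1.96 * sd_diff
    pct_within_1 = np.mean(np.abs(d) <= 1) * 100
    return {"mean_diff": mean_diff, "loa": loa,
            "cr": cr, "pct_within_1": pct_within_1}
```

Note that for exactly two measurement occasions the CR coincides numerically with the half-width of the Limits of Agreement when the bias is zero.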
After repeatability assessment and a thorough exploratory analysis, a multivariable linear mixed effects modeling approach was taken to determine which clinical, laboratory and subjective measures were significantly related to DEFC score, accounting for potential within-subject correlations engendered by the repeated measures design. Because DEFC score is a subject-level variable, for eye-level variables such as tear breakup time or lens surface wettability, we examined grades for the worse eye (e.g., faster breakup time, poorer wettability), the better eye, the average of the two eyes and the sum of the two eyes as potential explanatory variables for DEFC score. For corneal and conjunctival staining, we also examined the grades zone-by-zone and summed over all zones.
After identifying which variables were significantly related to DEFC score, we employed Receiver Operating Characteristic (ROC) analysis [37] to estimate the sensitivities and specificities of these diagnostics to a DEFC-based determination of CLIDE status, for the two different thresholds (DEFC score ≥ 2 and ≥ 4) described above. In the ROC analysis we calculated the sensitivity and specificity using every possible value of each clinical and subjective measure as a diagnostic threshold, within the ranges we observed. We developed an algorithm to implement our adaptation of the method proposed by Emir, et al. [38] for estimating sensitivity and specificity in the presence of repeated measures. The optimum threshold was considered to be the value that resulted in sensitivity and specificity at the smallest Euclidean distance to the (0,1) point on the ROC plot (i.e., the point geometrically closest to 100% sensitivity and 100% specificity).
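The threshold search can be sketched as follows. This simplified version treats observations as independent; the names are ours, and the actual analysis used the Emir et al. adaptation to account for repeated measures.

```python
import numpy as np

def roc_optimum(scores, positive, higher_is_positive=True):
    """Scan every observed score as a diagnostic threshold and return
    (threshold, sensitivity, specificity) minimizing the Euclidean
    distance to the ideal (0, 1) corner of the ROC plot."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)   # true CLIDE status
    best = None
    for t in np.unique(scores):
        if higher_is_positive:
            called_pos = scores >= t              # e.g. OSDI, VAS ratings
        else:
            called_pos = scores <= t              # e.g. shorter NITBUT
        sens = np.mean(called_pos[positive])      # true-positive rate
        spec = np.mean(~called_pos[~positive])    # true-negative rate
        dist = np.hypot(1 - sens, 1 - spec)       # distance to (0, 1)
        if best is None or dist < best[0]:
            best = (dist, t, sens, spec)
    return best[1:]
```

The `higher_is_positive` flag accommodates measures where disease corresponds to lower values (as with tear breakup time) as well as those where it corresponds to higher values (as with symptom ratings).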

Subject characteristics
One hundred thirty (130) subjects completed the study. Subjects ranged in age from 18 to 61 years, with a mean (SD) of 26.6 (9.7) years. A majority of subjects (n = 89, 68%) were female. Sixty Asian subjects (46%) and 70 non-Asian subjects (54%) were analyzed. The Asian group consisted of subjects of Chinese, Japanese, Korean, Southeast Asian or Pacific Islander descent. Non-Asian subjects included Caucasians, African-Americans, and Latinos. Subjects were all experienced soft contact lens wearers (38% silicone hydrogel, 62% hydrogel), with mean (SD) wearing times of 5.7 (1.7) days/week and 11.8 (3.4) hours/day. Further detail on the demographic makeup and baseline characteristics of our subjects is provided in Table 1.

Repeatability of the DEFC
The mean difference in DEFC scores (Visit 2 − Visit 1) was 0.05 units (1.00%) on the 5-point DEFC scale, and the distribution of differences was symmetrical about this value, ranging from −3 to +3 units (Fig 2). The between-visit differences in other CLIDE measures ranged from 0.02% to 4.60% (Table 2) and were also symmetrical.

DEFC score: Relationships to other CLIDE assessments
Subjects reported DEFC scores spanning the full range of CLIDE symptoms (1–5), with a median score of 3, and a mean (SD) score of 2.8 (1.4), combined over the 2 visits. Table 3 shows the distribution of DEFC scores at each visit, along with the mean (SD) of each CLIDE assessment, stratified on DEFC score. We found there to be significantly (p < 0.001) shorter pre-lens NITBUT (i.e., greater tear film instability) with higher DEFC score (i.e., worse symptoms). We examined a number of variables derived from pre-lens NITBUT on both the raw and natural log scales, and found that the best model fits were achieved using the worse pre-lens NITBUT of the two eyes, with three individual measurements above 10 sec truncated at that value to better approximate normality and avoid undue leverage effects. The pre-lens NITBUT ranged from approximately 4.0 sec on average among subjects with no CLIDE symptoms (DEFC score = 1) to 1.9 sec on average among subjects whose CLIDE symptoms were severe enough to frequently interfere with daily activities (DEFC score = 5). Table 4 shows the sensitivity and specificity of pre-lens NITBUT for our first DEFC-based diagnosis of CLIDE (No Symptoms vs. Any Symptoms) to be 65% and 50%, respectively, at a threshold of ≤ 2.5 sec. For our second DEFC-based diagnosis of CLIDE (No/Mild Symptoms vs. Debilitating Symptoms), the sensitivity and specificity of pre-lens NITBUT were 73% and 48%, respectively, at a threshold of ≤ 2.0 sec. We did not find a significantly worse grade of distortion of the reflected topographer mires at 10 sec post-blink with higher DEFC score (Table 3). We examined a number of variables derived from grading of distortion, including the individual grades in each quadrant, and the sum distortion grade over all quadrants.
We found no significant association between any distortion grade variable and DEFC score, with the sum score across quadrants from the worse eye (p = 0.540) having the best model fit, although all models were similar. With no significant association, grade of distortion was clearly not useful for distinguishing subjects with no symptoms from those with any CLIDE symptoms, having a sensitivity and a specificity of 0.50 and 0.52, respectively (Table 4); nor was grade of distortion useful for distinguishing subjects with no/mild symptoms from those with debilitating symptoms requiring intervention, with a sensitivity and a specificity of 0.52 and 0.52, respectively.
Clinician grading of in vivo contact lens wettability did not show a trend related to DEFC score (Table 3), and indeed in the mixed effects models was not a significant factor (e.g., p = 0.607 for the worse eye, that is, the eye with the less wettable lens). As expected, grade of lens wettability was not useful for distinguishing subjects with no symptoms from those with any CLIDE symptoms, having a sensitivity and a specificity of 0.48 and 0.58, respectively (Table 4); nor was grade of contact lens wettability useful for distinguishing subjects with no/mild symptoms from those with debilitating symptoms requiring intervention, with a sensitivity and a specificity of 0.50 and 0.55, respectively. The OSDI score was significantly related to the DEFC score (p < 0.001), and showed a consistent increase across the levels of the DEFC (Table 3) from a mean (SD) of 8.0 (8.6) among subjects with no CLIDE symptoms (DEFC score = 1) to 25.8 (17.2) among subjects with the most severe symptoms (DEFC score = 5). As shown in Fig 3, for the first DEFC-based CLIDE diagnosis (No Symptoms vs. Any Symptoms), the OSDI exhibited a low sensitivity of 0.55 but a relatively high specificity of 0.80 (Table 4). For the second DEFC-based diagnosis (No/Mild Symptoms vs. Debilitating Symptoms), the OSDI showed a sensitivity and specificity of 0.71 and 0.70, respectively.
There were significant associations between DEFC score and subjective VAS ratings of symptoms, including on-average severity and frequency of dryness while wearing lenses, and end-of-day (EOD) severity and frequency of dryness (all p < 0.001). A clear trend of higher VAS ratings with higher DEFC score was found (Table 3), and EOD ratings tended to be slightly higher than on-average ratings (Fig 4). On-average dryness severity ranged from approximately 19 on the 100-point VAS scale for those with no symptoms (DEFC score = 1) to 48 for those with the most debilitating symptoms (DEFC score = 5), while EOD dryness severity ranged from approximately 25 to 68, on average, across those same DEFC categories. Similarly, on-average dryness frequency ranged in mean from approximately 14 to 60 across the DEFC score spectrum, while EOD dryness frequency means ranged from approximately 25 to 71. Sensitivities and specificities for our two DEFC-based diagnoses of CLIDE are shown in Table 4. In general, the subjective VAS ratings had higher sensitivities and specificities than did any of the clinical measurements, and were similar to those of the OSDI.

Discussion
In this article, we have introduced the Berkeley Dry Eye Flow Chart, which has the potential advantages of being simple to understand and fast to administer to large numbers of subjects or patients, and of providing a simple 1-5 score that directly reflects not only the presence of dryness symptoms and the level of discomfort they induce, but the functional impact of these symptoms on daily visual activities such as reading or using a computer. We have shown that the repeatability of the DEFC is comparable to or better than that of other common clinical and subjective measures of CLIDE. We have demonstrated a direct relationship between higher DEFC score and shorter pre-lens tear breakup time, higher VAS ratings of severity and frequency of symptoms, both on-average and at end-of-day, and higher OSDI score. Finally, we have demonstrated how the DEFC can be used to categorize subjects into CLIDE and non-CLIDE groups based on different diagnostic criteria, and estimated the sensitivity and specificity of each subjective and clinical CLIDE assessment to our two different DEFC-based diagnoses, along with the optimum diagnostic threshold for each assessment.
The DEFC was significantly related to, and showed clinically significant trends in the expected direction with, subjective ratings of symptoms using the VAS and the OSDI, and with pre-lens tear film instability measured by NITBUT. These are among the most commonly employed assessments for both CLIDE and DE in general. In contrast, grades of topographer mires distortion and in vivo contact lens wettability were not significantly related to DEFC score, and furthermore did not suggest a linear trend of worse distortion or poorer wettability with higher DEFC score. These assessments, however, are not standard tests for CLIDE or DE in clinical practice. We included them in this study because some previous work in our lab [39] and others' [10,40] suggested that these might prove to be useful assessments related to severity of symptoms. This, however, turned out not to be the case in our study population. Thus the validation of the DEFC is based on the more commonly accepted assessments of tear film instability measurement and subjective ratings of symptoms.
The presence of a contact lens partitions the tear film into pre- and post-lens segments [41], and the pre-lens tear film is typically less stable than the pre-corneal tear film without lenses [42]. It remains equivocal whether pre-lens tear film stability differs between symptomatic and asymptomatic contact lens wearers. Estimates of average pre-lens NITBUT from asymptomatic hydrogel lens wearers range from 2.8 sec to 10.8 sec depending on lens type as well as on lens care solution formulation [43–46]. Pre-lens NITBUT has been found across a similar range in symptomatic subjects, and while symptomatic and asymptomatic subjects have been found to be significantly different in some studies [47], they have not in others [48,49]. Ours is the first study to show a distinct trend in pre-lens NITBUT in vivo across the range of CLIDE severity, from approximately 4 sec on average among subjects with no CLIDE symptoms to less than 2 sec among subjects with the most debilitating CLIDE. In addition, this study is the first to establish diagnostic thresholds (≤ 2.5 sec for any CLIDE symptoms, ≤ 2.0 sec for debilitating symptoms) for pre-lens, in vivo NITBUT (albeit at sensitivities and specificities of 65% and 50%, respectively, for any CLIDE symptoms, and 73% and 48%, respectively, for debilitating symptoms). It should be noted that the sensitivities and specificities, and the diagnostic thresholds obtained by ROC analysis in this study must be interpreted in the context of our study population: its demographic makeup, contact lens history, ocular characteristics, and so forth. For example, a much older population with substantially different refractive demands could very well elicit different diagnostic performance and different optimum diagnostic thresholds.
Corneal and conjunctival staining were also assessed at each exam. Although visible ocular surface damage can be indicative of severe, prolonged DE or CLIDE [50], in our study there were no significant relationships between DEFC score and any of our ocular surface staining variables. This is because, in our sample of young, healthy, experienced contact lens-wearing subjects, who wore fresh pairs of their habitual contact lenses for only 6 hours prior to assessment, there were very few cases of even moderate staining, certainly not sufficient to test any hypothetical statistical association.
The ROC analysis showed that even the best assessments for CLIDE do not operate with anything close to perfect sensitivity and specificity. In other words, even for the best assessments there will be false positives and false negatives. The sensitivities and specificities we found in this study are broadly in line with the relatively few examples published to date for CLIDE diagnosis, and with the somewhat more commonly reported diagnosis of DE in general. It is impossible to compare the sensitivities and specificities we found directly to other published examples because differences in study design, population characteristics, and in the criteria for determining a positive diagnosis all result in widely varying estimates across studies. For example, the sensitivities of NITBUT were 65% and 73% for our two DEFC-based diagnoses, respectively, compared with published examples ranging from 55% to 90%; the specificities for NITBUT that we found were 50% and 48%, while published examples ranged from 21% to 85% [51–55]. Studies employing various types of questionnaires have estimated sensitivities of 60% to 86%, compared with approximately 66% to 82% for our VAS questionnaire items, and specificities of 46% to 94%, compared with our range of approximately 68% to 82% [51,55–58]. A great many other potential diagnostic instruments (primarily for DE) have been investigated, including the Schirmer test, fluorescein tear breakup time, tear osmolarity, imprint cytology, vital staining, tear ferning, and a variety of subjective symptom instruments, leading to published estimates of sensitivity ranging from as low as 39% to as high as 98%, and estimates of specificity ranging from 10% to 100% [15,20,51,59–66]. The primary difference between studies is the set of criteria by which DE or CLIDE was defined, which renders direct comparisons of diagnostic performance uninformative.
At the very least, the DEFC provides another means of defining CLIDE or DE, with certain practical advantages over other methods, and against which the most common diagnostic tools perform with sensitivities and specificities comparable to previous definitions of CLIDE or DE.
In the current study, sensitivities tended to be slightly higher for our second DEFC-based diagnosis (No/Mild Symptoms vs. Debilitating Symptoms), in which subjects with mild symptoms that do not cause discomfort sufficient to interfere with activities are not distinguished from subjects with no symptoms whatsoever. This shows that CLIDE assessments tend to be better at identifying subjects with debilitating symptoms than they are at distinguishing symptom-free subjects from those with any level of CLIDE symptoms from severe to mild. Specificities for both DEFC-based diagnoses of CLIDE tended to be about the same, or slightly lower for the second diagnosis (but not significantly so). With the exception of the OSDI, thresholds were more extreme (i.e., symptom ratings higher, NITBUT shorter) for subjects with debilitating CLIDE symptoms.
There were some differences particular to the OSDI in diagnosing CLIDE as based on our DEFC categorization. For the first DEFC-based diagnosis of CLIDE (No Symptoms vs. Any Symptoms), the OSDI had low sensitivity (56%), meaning that the OSDI score did not identify a number of subjects with DEFC ≥ 2 as being CLIDE-positive. On the other hand, it had high specificity (81%), meaning that it was better at correctly identifying subjects with DEFC = 1. OSDI scores above the ROC threshold of 11.4 are probably characteristic mostly of those with CLIDE symptoms severe enough to cause significant discomfort [15] and possibly affect visual functioning in daily activities. Thus a number of subjects with only mild symptoms not affecting daily activities had OSDI scores that, while somewhat elevated over the truly CLIDE-negative subjects, did not exceed the ROC threshold and therefore were not diagnosed as CLIDE-positive. This is supported by the higher sensitivity (71%) of the OSDI for the second DEFC-based diagnosis of CLIDE (No/Mild Symptoms vs. Debilitating Symptoms), while still maintaining a specificity (70%) that is not significantly lower than for the first diagnosis. It is interesting to note that the ROC analysis-generated threshold for diagnosing CLIDE using the OSDI in our study population (≥ 11.4) is very close to the commonly accepted clinical threshold OSDI score (≥ 13) for diagnosing DE [67,68].
Our results illuminate a common difficulty in identifying optimal diagnostic criteria for multifaceted conditions such as CLIDE, arising from a common misconception. There are a number of published reports covering various clinical, laboratory and subjective assessments shown to be related to a CLIDE outcome. Often these variables are shown to be significant explanatory variables in multivariable association models of CLIDE, whether the outcome be presence/absence, risk, or severity of symptoms. However, a significant association with CLIDE does not necessarily make such an assessment a good diagnostic indicator [38]. It would seem natural to assume, from perusing the literature, that if certain questionnaire items or clinical assessments are consistently shown to be significantly related to CLIDE outcomes, they should therefore be routinely administered to ascertain a patient's CLIDE status; however, without directly examining the diagnostic performance of such assessments (i.e., sensitivity and specificity; determination of threshold values), there is no way of knowing whether or to what extent such assessments are useful as diagnostic indicators for CLIDE. Fig 5 depicts the potential pitfalls in using assessments that are significantly associated with CLIDE outcomes as diagnostic tools: while there is clear separation between CLIDE-negative (filled circles in the figure) and CLIDE-positive (open triangles) subjects, there are nevertheless a number of subjects with symptom severity and/or frequency above the optimum thresholds (as determined by ROC analysis) who did not self-classify as having CLIDE, as well as those with symptom ratings below the thresholds who do self-classify as CLIDE sufferers. These "false positives" and "false negatives" contribute to the diagnostic performance (the sensitivities and specificities) of the VAS rating scale symptom assessments.
It is important to understand that the ordinal categorical nature of the DEFC gives the instrument inherently good repeatability but relatively poor discrimination ability, or resolution. In other words, it would require a relatively large change in a patient's or subject's symptoms before there was a change in DEFC score, compared with, say, the 100-point VAS rating scale, which has relatively poor repeatability but can record differences or changes in symptoms with much finer resolution. The uses for which the DEFC is designed make this appropriate: a clinician or researcher would not use the DEFC to determine whether there were fine between-subject differences in symptoms (e.g., that would show up on the VAS rating scale), but rather, would use the DEFC to determine whether differences in symptoms are sufficient to represent qualitatively different levels in terms of a patient's or subject's quality of life (e.g., dryness symptoms that do or do not cause significant ocular discomfort; symptoms that do or do not interfere with visual activities in daily life). While both types of instruments have a place in the clinical and research arenas, these distinctions highlight the need to employ the subjective response instrument that is most appropriate to the specific clinical or research application. In this study, we compared individual assessments of CLIDE used in current research and clinical practice to our new metric, the Berkeley DEFC score. In practice, most clinicians would consider a number of different assessments before identifying a subject as having CLIDE or some other DE-related condition [50]. It is possible that the best diagnosis of CLIDE consists of a suite of clinical and subjective measures, each with its own diagnostic threshold, that collectively would provide the highest sensitivity and specificity for a test of CLIDE.
Although there has been some limited work done in this area [51,52], the collective performance of commonly employed suites of diagnostic tools, the identification of the best set of assessments for diagnosis of CLIDE or DE in different populations, and determination of the optimum diagnostic thresholds for those assessments have not been well-studied to date. Work is ongoing in this area.

Conclusions
This article has introduced the Berkeley Dry Eye Flow Chart, discussed its potential advantages as a CLIDE screening tool, demonstrated its association with several known CLIDE assessments, and established its repeatability. In addition, we have demonstrated that the 5-level ordinal categorization of CLIDE based on the DEFC corresponds directly to trends in VAS ratings of symptoms, OSDI scores, and pre-lens NITBUT, and we have established the optimum diagnostic thresholds for these commonly employed assessments in our population by ROC analysis. We believe that the DEFC can provide a fast, accurate, functional categorization of CLIDE that takes into account the extent to which symptoms affect visual activities in daily life, and that it could prove a useful addition to the set of tools available to researchers and clinicians working with symptomatic contact lens wearers.