The Accuracy of Diagnostic Methods for Diabetic Retinopathy: A Systematic Review and Meta-Analysis

Objective: The objective of this study was to evaluate the accuracy of the recommended glycemic measures for diagnosing diabetic retinopathy.

Methods: We systematically searched MEDLINE, EMBASE, the Cochrane Library, and the Web of Science databases from inception to July 2015 for observational studies comparing the diagnostic accuracy of glycated hemoglobin (HbA1c), fasting plasma glucose (FPG), and 2-hour plasma glucose (2h-PG). Test accuracy was expressed as the diagnostic odds ratio (dOR), computed with Moses' constant for a linear model and pooled with random-effects models with 95% CIs. Hierarchical summary receiver operating characteristic (HSROC) curves were used to summarize the overall test performance.

Results: Eleven published studies were included in the meta-analysis. The pooled dOR values for the diagnosis of retinopathy were 16.32 (95% CI 13.86–19.22) for HbA1c and 4.87 (95% CI 4.39–5.40) for FPG. The area under the HSROC was 0.837 (95% CI 0.781–0.892) for HbA1c and 0.735 (95% CI 0.657–0.813) for FPG. The 95% confidence region for the summary point of overall test performance corresponded to cut-offs ranging from 6.1% (43.2 mmol/mol) to 7.8% (61.7 mmol/mol) for HbA1c and from 7.8 to 9.3 mmol/L for FPG. In the four studies that provided information on 2h-PG, the pooled accuracy estimates for HbA1c were similar to those for 2h-PG, and the overall performance of HbA1c was superior to that of FPG.

Conclusions: The three recommended tests for the diagnosis of type 2 diabetes in nonpregnant adults showed sufficient accuracy for use in clinical settings, although the overall accuracy for the diagnosis of retinopathy was similar for HbA1c and 2h-PG, and both were more accurate than FPG. Given the variability and inconvenience of the glucose-based methods, HbA1c appears to be the most appropriate method for the diagnosis of diabetic retinopathy.


Introduction
In 1997, the Expert Committee on the Diagnosis and Classification of Diabetes Mellitus stated that the diagnosis of diabetes should focus simultaneously on plasma glucose concentrations and on the long-term microvascular complications of diabetes, particularly diabetic retinopathy [1]. In 2009, a report from the International Expert Committee (IEC) proposed glycated hemoglobin (HbA1c) as an appropriate test for diagnosing diabetes [2]. The American Diabetes Association (ADA) [3] and the World Health Organization [4] reinforced this recommendation and maintained that fasting plasma glucose (FPG) and 2-hour plasma glucose (2h-PG) after a 75-g oral glucose tolerance test (OGTT) are also appropriate tests for the diagnosis of diabetes in non-pregnant adults.
The variety of biomarkers available for diagnosing diabetes poses a challenge for clinicians and health planners [5]. Clinicians should weigh the advantages and disadvantages of each biomarker and decide which test, or which combination of tests in a pre-specified order, should be used for each type of patient [6]. The advantages of HbA1c are that it is not modified by acute events, such as stress or vigorous physical exercise, and that it has greater pre-analytical stability and yields more reliable results than glucose-based tests. However, it has also been reported that HbA1c levels depend substantially on various non-glycemic factors, such as iron or vitamin B12 deficiency, renal failure, or variables related to the lifespan of red blood cells [7]. Moreover, unlike HbA1c, neither FPG nor 2h-PG is influenced by individual susceptibility to the glycation of hemoglobin, genetic factors or individual characteristics [8], such as age or ethnicity. Furthermore, HbA1c testing costs more than FPG testing.
Diabetic retinopathy is an early diabetes-related complication and a good criterion for comparing the diagnostic accuracy of diabetes biomarkers [1]. The DETECT-2 project, an international pool of nine studies from five countries, recently re-examined the relationship between glycemic measures and retinopathy and suggested that the current diagnostic level for FPG could be lowered from 7.0 to 6.5 mmol/L and that an HbA1c level of 6.5% (47.5 mmol/mol) is a suitable alternative diagnostic criterion [9]. The World Health Organization, based on the level above which the risk of developing micro- and macrovascular complications increases, has also recommended 6.1 mmol/L as the FPG cut-off point for the diagnosis of impaired fasting glucose, whereas the ADA recommended lowering this threshold from 6.1 mmol/L to 5.6 mmol/L [3,4]. However, to our knowledge, no previous study has comprehensively reviewed and compared the accuracy of the main glycemic measures for identifying diabetic retinopathy.
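HbA1c thresholds in this review are given both in NGSP/DCCT units (%) and in IFCC units (mmol/mol). The paired values are consistent with the standard IFCC-NGSP master equation, IFCC = (NGSP − 2.15) × 10.929. The minimal Python sketch below assumes that equation; the function names are illustrative and not taken from the source.

```python
def ngsp_to_ifcc(hba1c_percent: float) -> float:
    """Convert HbA1c from NGSP/DCCT units (%) to IFCC units (mmol/mol)."""
    # IFCC-NGSP master equation: IFCC = (NGSP - 2.15) * 10.929
    return (hba1c_percent - 2.15) * 10.929


def ifcc_to_ngsp(hba1c_mmol_mol: float) -> float:
    """Inverse conversion from IFCC (mmol/mol) back to NGSP (%)."""
    return hba1c_mmol_mol / 10.929 + 2.15


if __name__ == "__main__":
    # HbA1c thresholds quoted in this review
    for pct in (6.1, 6.5, 7.8):
        print(f"{pct}% -> {ngsp_to_ifcc(pct):.1f} mmol/mol")
```

Run as a script, it prints 43.2, 47.5 and 61.7 mmol/mol for 6.1%, 6.5% and 7.8%, matching the values reported in the text.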
Thus, we conducted a systematic review and meta-analysis of the literature to evaluate the accuracy of HbA1c, FPG and 2h-PG for diagnosing diabetic retinopathy.

Literature search
A literature search was conducted in MEDLINE (via PubMed), EMBASE, the Cochrane Central Register of Controlled Trials, the Cochrane Database of Systematic Reviews and the Web of Science databases from their inception to July 17, 2015. Three comprehensive search themes were combined using Boolean operators: ["HbA1c" OR "glycated hemoglobin" OR "glycated hemoglobin" OR "hemoglobin A1c" OR "glucose" OR "fasting glucose"] AND ["threshold" OR "cut-off" OR "cut point" OR "sensitivity" OR "specificity" OR "diagnostic" OR "differential diagnosis"] AND ["microvascular complications" OR "retinopathy" OR "retinal"]. The reference lists of the retrieved articles were reviewed for additional studies. The literature search was performed independently by two reviewers (IC and CA), and discrepancies were resolved through discussion.
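The search strategy above nests OR-joined synonyms inside AND-joined themes. A minimal Python sketch of that structure follows; the helper names are hypothetical, and the code only assembles the generic Boolean string. Database-specific syntax such as PubMed field tags or EMBASE Emtree terms is not covered and would need to be adapted for each database. Exact repeats within a theme are dropped, since they do not change the retrieved set.

```python
def or_group(terms):
    """Quote each term, drop exact repeats (keeping order) and join with OR."""
    unique = dict.fromkeys(terms)  # dicts preserve insertion order (Python 3.7+)
    return "[" + " OR ".join(f'"{t}"' for t in unique) + "]"


def build_query(*themes):
    """Combine one OR-group per search theme with AND."""
    return " AND ".join(or_group(theme) for theme in themes)


# Term lists for the three search themes
glycemic = ["HbA1c", "glycated hemoglobin", "glycated hemoglobin",
            "hemoglobin A1c", "glucose", "fasting glucose"]
accuracy = ["threshold", "cut-off", "cut point", "sensitivity",
            "specificity", "diagnostic", "differential diagnosis"]
outcome = ["microvascular complications", "retinopathy", "retinal"]

if __name__ == "__main__":
    print(build_query(glycemic, accuracy, outcome))
```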

Selection criteria
We aimed to identify original articles analyzing the HbA1c, FPG and 2h-PG thresholds associated with an increased frequency of retinopathy. The inclusion criteria were as follows: i) participants aged 18 years or older; ii) HbA1c, FPG and/or 2h-PG as index tests; iii) diabetic retinopathy at any stage as the outcome; and iv) a cross-sectional, case-control or cohort design, with either prospective or retrospective data collection. The exclusion criteria were as follows: i) insufficient data to calculate sensitivity or specificity; ii) studies conducted only in individuals with diagnosed diabetes; iii) studies of gestational diabetes; and iv) studies written in a language other than English or Spanish. When multiple articles reported data from the same study, the most recent article was selected.

Data extraction and quality assessment
The following data were collected from each study included in this review: 1) author identification, 2) year of publication, 3) country of the study, 4) year of data collection, 5) ophthalmic examination test, 6) age of the participants, 7) number of participants, 8) prevalence of retinopathy and 9) parameters summarizing the accuracy of the test (cut-off, sensitivity, specificity, area under the curve (AUC) and diagnostic odds ratio (dOR)).
We used the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool to evaluate four domains of each study: patient selection, index test, reference standard and flow of patients and timing of the tests. Each domain was evaluated in terms of the risk of bias, and the first 3 domains were also evaluated in terms of concerns regarding the applicability of the results [10].
Data extraction and quality assessment were independently performed by IC and CA, and inconsistencies were managed by consensus.

Statistical analysis and data synthesis
This study was reported according to the PRISMA statement [11] (Table A and Figure A in S1 File) and the recommendations of the Cochrane Collaboration Handbook [12]. The sensitivity, specificity, AUC and dOR, with their corresponding 95% confidence intervals (CIs), were calculated for HbA1c, FPG and 2h-PG in each included study. Although the protocol of this meta-analysis specified that at least five studies were required in a subgroup for pooled estimation, a meta-analysis of the four studies that measured all three tests is provided in Table B in S1 File.
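Per-study sensitivity and specificity come from each study's 2×2 table of index-test result against retinopathy status. The source does not state which confidence-interval method was used; the sketch below assumes the Wilson score interval, the function names are hypothetical, and the counts in the example are invented for illustration.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95% standard-normal quantile


def wilson_ci(successes, total, z=Z_95):
    """Wilson score interval for the binomial proportion successes/total."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity, each as (estimate, lower95, upper95).

    tp/fn: participants with retinopathy testing positive/negative;
    fp/tn: participants without retinopathy testing positive/negative.
    """
    return {
        "sensitivity": (tp / (tp + fn), *wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), *wilson_ci(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical counts for one study
    result = accuracy_from_2x2(tp=40, fp=100, fn=10, tn=850)
    for name, (est, lo, hi) in result.items():
        print(f"{name}: {est:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

With these counts, sensitivity is 40/50 = 0.80 and specificity is 850/950 ≈ 0.89. Unlike the simple Wald interval, the Wilson interval stays within [0, 1] even when a study reports 100% sensitivity or specificity.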
Hierarchical summary receiver operating characteristic (HSROC) curves were used to summarize the overall test performance. HSROC curves have been proposed to estimate the performance of diagnostic tests from meta-analytic data, and the AUC is useful for evaluating not only the curve itself but also the strength of the heterogeneity [13]. An AUC of 0.97 or higher indicates excellent accuracy, an AUC of 0.93 to 0.96 very good accuracy and an AUC of 0.75 to 0.92 good accuracy. An AUC below 0.75 may still be reasonable, but the test then has evident shortcomings in its diagnostic accuracy [14]. When a study did not report the AUC, we calculated it from the data provided.
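The AUC bands quoted above can be applied mechanically. The function below is a hypothetical helper. The text leaves small gaps between bands (for example between 0.92 and 0.93), and this sketch assumes that values falling in a gap belong to the lower band.

```python
def grade_auc(auc: float) -> str:
    """Map an (HS)ROC AUC to the qualitative accuracy bands described in the text.

    Assumption: values in the gaps between bands (e.g. 0.925) go to the lower band.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie in [0, 1]")
    if auc >= 0.97:
        return "excellent"
    if auc >= 0.93:
        return "very good"
    if auc >= 0.75:
        return "good"
    return "reasonable, with evident shortcomings"


if __name__ == "__main__":
    # Pooled HSROC areas reported in this review
    for test, auc in (("HbA1c", 0.837), ("FPG", 0.735)):
        print(test, auc, grade_auc(auc))
```

On this scale, the pooled HSROC area of 0.837 for HbA1c falls in the "good" band, while 0.735 for FPG falls just below the 0.75 boundary.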
The dOR was computed using Moses' constant of a linear model; this approach relies on a linear regression of the logarithm of each study's dOR (dependent variable) on an expression of that study's positivity threshold (independent variable). The dOR is a single measure of test accuracy that combines sensitivity and specificity. Its values range from 0 to infinity, with higher values indicating better discriminatory performance (higher accuracy); a dOR of 1.0 indicates that the test does not discriminate between patients with the disorder and those without it [15].
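Moses' linear model can be illustrated on a set of 2×2 tables. For each study, D = logit(TPR) − logit(FPR) equals the log dOR, and S = logit(TPR) + logit(FPR) serves as a proxy for the positivity threshold. Fitting D = a + bS across studies by ordinary least squares gives exp(a), the dOR at S = 0, which is how this sketch reads "Moses' constant". The sketch further assumes an unweighted fit, a 0.5 continuity correction added to every cell, and a Woolf (log-scale) interval for the per-study dOR. These are common choices, not details confirmed by the source, and the function names are hypothetical.

```python
from math import exp, log, sqrt


def logit(p: float) -> float:
    return log(p / (1 - p))


def study_dor(tp, fp, fn, tn, cc=0.5):
    """Per-study dOR with a Woolf (log-scale) 95% CI.

    cc is a continuity correction added to every cell (assumption).
    """
    tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    ln_dor = log((tp * tn) / (fp * fn))
    se = sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return exp(ln_dor), exp(ln_dor - 1.96 * se), exp(ln_dor + 1.96 * se)


def moses_littenberg(tables, cc=0.5):
    """Unweighted least-squares fit of D = a + b*S over (tp, fp, fn, tn) tables.

    D = logit(TPR) - logit(FPR) is each study's log dOR;
    S = logit(TPR) + logit(FPR) is a proxy for its positivity threshold.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        tpr = (tp + cc) / (tp + fn + 2 * cc)
        fpr = (fp + cc) / (fp + tn + 2 * cc)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(d_vals)
    s_bar, d_bar = sum(s_vals) / n, sum(d_vals) / n
    sxx = sum((s - s_bar) ** 2 for s in s_vals)
    sxy = sum((s - s_bar) * (d - d_bar) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_bar - b * s_bar
    return a, b


if __name__ == "__main__":
    # Hypothetical 2x2 tables (tp, fp, fn, tn), one per study
    tables = [(40, 100, 10, 850), (30, 60, 20, 400), (12, 40, 3, 300)]
    for t in tables:
        print("study dOR (95%% CI): %.2f (%.2f-%.2f)" % study_dor(*t))
    a, b = moses_littenberg(tables)
    print(f"D = {a:.3f} + {b:.3f} * S; dOR at S = 0: {exp(a):.2f}")
```

A slope b close to zero means the dOR is roughly constant across thresholds, so a single pooled dOR summarizes the studies well; a clearly non-zero b signals that accuracy varies with the threshold.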
Forest plots were used to display the sensitivity, specificity, AUC and dOR of each glycemic parameter in the reviewed studies. Heterogeneity across studies was evaluated with the I² statistic; I² values of <25%, 25-50% and >50% usually correspond to small, medium and large heterogeneity, respectively [16]. Because heterogeneity was large in most cases, the results of the different studies were pooled using a random-effects model with the DerSimonian and Laird method.
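The random-effects step can be sketched as a minimal DerSimonian-Laird implementation on log-scale estimates (for example, log dOR with variance 1/TP + 1/FP + 1/FN + 1/TN). It returns the pooled value, its 95% CI, the between-study variance τ² and I² = max(0, (Q − df)/Q) × 100. This is an illustration under those assumptions, not the software used by the authors; the inputs in the example are hypothetical.

```python
from math import exp, sqrt


def dersimonian_laird(estimates, variances):
    """DerSimonian-Laird random-effects pooling of log-scale estimates.

    Returns (pooled, lower95, upper95, tau2, i2_percent).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sw - sum(wi**2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # method-of-moments between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, i2


if __name__ == "__main__":
    # Hypothetical log dORs and their variances from three studies
    log_dors = [2.1, 3.0, 2.6]
    variances = [0.05, 0.10, 0.08]
    pooled, lo, hi, tau2, i2 = dersimonian_laird(log_dors, variances)
    print(f"pooled dOR {exp(pooled):.2f} (95% CI {exp(lo):.2f}-{exp(hi):.2f}), "
          f"tau^2 = {tau2:.3f}, I^2 = {i2:.0f}%")
```

Pooling on the log scale and back-transforming with exp() keeps the dOR and its CI positive and yields the asymmetric intervals typically reported for ratios.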
The influence of each individual study on the pooled dOR was estimated by recalculating the pooled estimate after excluding that study. Finally, publication bias was evaluated visually using a funnel plot and with the method proposed by Deeks [17].
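Both checks in this paragraph can be sketched on 2×2 tables. Here, leave_one_out recomputes a DerSimonian-Laird pooled dOR with each study omitted in turn. deeks_test regresses ln(dOR) on 1/√ESS, weighted by ESS, where ESS = 4·n1·n2/(n1 + n2) is the effective sample size computed from the numbers of participants with (n1) and without (n2) retinopathy. In Deeks et al.'s formulation, asymmetry is judged from the slope, with P < 0.10 taken as suggestive. The helper names and the 0.5 continuity correction are assumptions. To stay within the Python standard library, the sketch reports a t statistic rather than a P value, and it needs at least three studies.

```python
from math import exp, log, sqrt


def log_dor(tp, fp, fn, tn, cc=0.5):
    """Log diagnostic odds ratio and its variance, with continuity correction cc."""
    tp, fp, fn, tn = (x + cc for x in (tp, fp, fn, tn))
    return log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pool_dl(ys, vs):
    """DerSimonian-Laird random-effects pooled estimate on the log scale."""
    w = [1 / v for v in vs]
    fixed = sum(wi * y for wi, y in zip(w, ys)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, ys))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)
    w_re = [1 / (v + tau2) for v in vs]
    return sum(wi * y for wi, y in zip(w_re, ys)) / sum(w_re)


def leave_one_out(tables, cc=0.5):
    """Pooled dOR recomputed with each study (2x2 table) omitted in turn."""
    effects = [log_dor(*t, cc=cc) for t in tables]
    pooled = []
    for i in range(len(effects)):
        rest = effects[:i] + effects[i + 1:]
        pooled.append(exp(pool_dl([y for y, _ in rest], [v for _, v in rest])))
    return pooled


def deeks_test(tables, cc=0.5):
    """Deeks' funnel-plot asymmetry test.

    Weighted least squares of ln(dOR) on 1/sqrt(ESS) with weights ESS.
    Returns (intercept, slope, se_slope, t); compare t with a t distribution
    on k - 2 degrees of freedom.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n1, n2 = tp + fn, fp + tn  # with / without the target condition
        ess = 4 * n1 * n2 / (n1 + n2)
        xs.append(1 / sqrt(ess))
        ys.append(log_dor(tp, fp, fn, tn, cc)[0])
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = sqrt(rss / (len(xs) - 2) / sxx)
    t = slope / se if se > 0 else float("nan")
    return intercept, slope, se, t


if __name__ == "__main__":
    # Hypothetical 2x2 tables (tp, fp, fn, tn), one per study
    tables = [(40, 100, 10, 850), (30, 60, 20, 400), (12, 40, 3, 300), (55, 150, 25, 900)]
    print([round(d, 2) for d in leave_one_out(tables)])
    print(deeks_test(tables))
```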

Baseline Characteristics
A total of 2,632 articles were retrieved from the literature search. After removing 552 duplicates, the titles and abstracts of 2,080 studies were screened. We excluded 2,028 studies that clearly did not fulfil all of the inclusion criteria or met at least one of the exclusion criteria, leaving 52 studies that were reviewed in full. Of these, 41 were excluded after full-text reading (see study exclusion in References A in S1 File), and the remaining 11 articles were used for the final analysis (Fig 1) [18–28]. The 11 studies comprising this review included 45,686 participants. The studies were conducted in China, North America, Japan, Korea, India, Malaysia, France and Australia; one study was conducted among Pima Indians. The age of the participants ranged from 18 to 79 years. The prevalence of retinopathy varied from 1.6% to 15.8% across the studies. All of the studies provided information on the overall prevalence of diabetic retinopathy, except one that reported only moderate non-proliferative retinopathy [19]. All of the studies had a cross-sectional design except one, which also reported prospective data [28]. Only four studies provided information on 2h-PG [19,22,27,28]. Finally, one study provided several cut-offs for FPG; we selected the internationally recommended cut-off of 7.0 mmol/L for this analysis (Table 1).

Study Quality
As assessed with QUADAS-2, all of the studies provided information on the seven quality items. However, the studies had shortcomings in two areas: the index test and the reasons for excluding participants. Most studies interpreted their results without reference to a standard (HbA1c: 78%; FPG: 60%; 2h-PG: 75%) and considered only the prespecified index test threshold (Table C and Figure A in S1 File).

(Figures B, C, D and E in S1 File depict the sensitivity, specificity, PLR and NLR forest plots, respectively.) The area under the HSROC (Fig 3), which estimates the discriminating accuracy for identifying retinopathy, was 0.837 (95% CI: 0.781-0.892; p < 0.001) for HbA1c and 0.735 (95% CI: 0.657-0.813; p < 0.001) for FPG. The 95% confidence region for the point summarizing overall test performance included studies in which the test cut-offs ranged from 6.1% (43.2 mmol/mol) to 7.8% (61.7 mmol/mol) for HbA1c and from 7.8 to 9.3 mmol/L for FPG.

Meta-analysis
When we estimated the pooled accuracy parameters from the four studies that evaluated the diagnostic performance of HbA1c, FPG and 2h-PG in the same sample, the pooled dOR was 34.68 (95% CI, 23.56-51.03; p < 0.001) for HbA1c, 24

Sensitivity analysis for the effect of individual studies
When the impact of individual studies was examined by removing them from the analysis one at a time, we observed that the pooled dOR estimate for HbA1c increased after removing data from the Cheng et al. [25]

Publication bias
The asymmetry test using Deeks' method [17] did not suggest the existence of publication bias for either HbA1c (intercept

Discussion
The most recent recommendations propose HbA1c as a good test for diagnosing diabetes in non-pregnant adults and also include FPG and 2h-PG as appropriate methods [3,4]. Which of the recommended tests should be used therefore remains controversial. In our meta-analysis of 11 studies, HbA1c performed better than FPG in identifying individuals with diabetic retinopathy. Moreover, our data indicate that all three glycemic tests have sufficient diagnostic accuracy for diabetic retinopathy for use in clinical practice, supporting the current international recommendations.
Our meta-analysis of the four studies [19,22,27,28] that compared these three tests in the same set of patients showed that, overall, 2h-PG and HbA1c have similar accuracy estimates for diabetic retinopathy in terms of the dOR and AUC and are better than FPG. In recent decades, 2h-PG after a 75-g OGTT has been the preferred test for confirming a diagnosis of diabetes in clinical practice, but because it is time-consuming and labor-intensive [29], FPG and HbA1c are both considered good alternatives [2,4].
Although the pooled specificity in the meta-analysis of the 11 studies comparing HbA1c and FPG was similar for both tests, the pooled sensitivity for HbA1c was twice that for FPG, and the pooled dOR was more than 3-fold higher (16.32 vs 4.87). Regarding the low sensitivity of FPG, the Diabetes Prevention Program [30] and NHANES [25] reported that 8% of individuals with an FPG below the diabetic threshold had retinopathy. Thus, with the recommended FPG cut-off of 7.0 mmol/L for the diagnosis of diabetes [2,3,4], a non-negligible percentage of cases of diabetic retinopathy would go undiagnosed. Other advantages of HbA1c are that it can be measured in a non-fasting state and that it has good pre-analytical stability and low day-to-day variability. However, HbA1c has some limitations: diabetes is defined by high blood glucose rather than by protein glycation, and HbA1c does not reflect postprandial glycemia [5].
Some authors have questioned the use of diabetic retinopathy as the gold standard for the diagnosis of diabetes because no uniform glycemic threshold for the presence of retinopathy has been found across populations [26]. Moreover, most studies relating HbA1c to retinopathy have been cross-sectional and have not excluded individuals with diagnosed diabetes (even those treated with hypoglycemic drugs), and the reported thresholds depended on the statistical methods used, the definition of retinopathy and factors influencing HbA1c levels, such as individual susceptibility to glycation and aging. However, no other clinical diagnostic standard for diabetes currently exists. Meta-analyses of diagnostic tests synthesize the performance of a test by providing pooled estimates of diagnostic accuracy parameters; they also estimate a summary point (summary sensitivity and specificity estimates) and an HSROC curve, but they do not allow the identification of the optimal cut-off point [31]. Nevertheless, the cut-offs within the 95% confidence region ranged from 6.1% (43.2 mmol/mol) to 7.8% (61.7 mmol/mol) for HbA1c and from 7.8 to 9.3 mmol/L for FPG. These findings support the cut-offs proposed by the International Expert Committee for the diagnosis of diabetes using HbA1c, but not those for FPG [2].
As is common in diagnostic meta-analyses, all of the estimates of diagnostic accuracy were obtained in the presence of large variability across individual studies. A substantial part of this variability derives from a threshold effect, that is, from differences in the thresholds used to define a positive test. Factors contributing to the threshold effect across the studies include the criteria for the diagnosis of retinopathy, the statistical methods used to define cut-offs and the assay methods used for the index tests, particularly HbA1c. The wide clinical spectrum of patients included in the studies also accounts for a substantial proportion of the variability across studies: while participants in some studies were a representative sample of the general population, other studies included selected samples with a known high prevalence of diabetes. Moreover, some studies removed individuals receiving antidiabetic drug treatment from the analyses, and others accounted for potential modifiers, such as age or hypertension. The threshold effect and the wide spectrum of patients could explain the "shoulder arm" shape of the HSROC curves, which partly results from the inverse correlation between sensitivity and specificity. This correlation and the large variability in diagnostic accuracy across studies support the use of the HSROC approach, because it explicitly models the relationship between sensitivity and specificity through the threshold [32] and accounts for inter-study heterogeneity.
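One common way to check for a threshold effect, which the source does not state was used here, is the Spearman correlation between sensitivity and 1 − specificity across studies. A strongly positive correlation (equivalently, sensitivity falling as specificity rises) suggests that differing cut-offs drive part of the heterogeneity. Below is a dependency-free sketch with hypothetical function names; tied values receive the mean of their ranks.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def threshold_effect_rho(sensitivities, specificities):
    """Spearman rho between sensitivity and 1 - specificity across studies."""
    return spearman(sensitivities, [1 - s for s in specificities])


if __name__ == "__main__":
    # Hypothetical per-study estimates
    sens = [0.62, 0.45, 0.80, 0.55, 0.71]
    spec = [0.90, 0.95, 0.78, 0.93, 0.85]
    print(round(threshold_effect_rho(sens, spec), 2))
```

Because logit is monotonic, the same rho is obtained on the logit scale, which is how this check is often reported.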
In the sensitivity analysis, the pooled dOR estimate decreased after removing the Park et al. study [20], which involved a large and homogeneous sample and consequently yielded higher estimates of sensitivity and specificity. Conversely, the pooled dOR estimate increased after removing two other studies. The Cheng et al. study [25] included mostly individuals at high risk of developing diabetes and used an HbA1c cut-off of 5.5% for detecting retinopathy, which yields high sensitivity and low specificity estimates. The Wong et al. study [26], which pooled three population-based samples and excluded participants with ungradable retinal photographs, reported low sensitivity estimates. A review of potential sources of bias and variation in diagnostic accuracy studies suggested that high variability in participant characteristics across studies testing the accuracy of tests for diabetic retinopathy is significantly associated with lower accuracy estimates [33].
This review has several potential limitations, including publication bias and insufficient information in study reports. Although we found no clear evidence of significant publication bias, studies showing poor test performance might be less (or more) likely to be published. Furthermore, given the high variability in the study results and the fact that most studies used diagnostic cut-offs that differed from the international recommendations, our results must be interpreted with caution. Finally, to improve the generalizability of the results, we included studies with both diabetic and non-diabetic participants. We expect antidiabetic medications to have the same effect on HbA1c, FPG and 2h-PG levels; however, we cannot rule out differences associated with specific drugs or clinical settings.

Conclusion
The three recommended tests for the diagnosis of type 2 diabetes show sufficient accuracy for use in clinical settings, although the overall accuracy for the diagnosis of retinopathy was slightly higher for HbA1c and 2h-PG than for FPG. Given the variability and inconvenience of the glucose-based methods, the HbA1c test might be the most appropriate method for the diagnosis of type 2 diabetes in nonpregnant adults. However, the appropriate use of this information requires an evaluation of the clinical context, specifically whether the test will be used for screening or diagnosis, the availability of the test in underdeveloped countries, and its cost.
Supporting Information
S1 File.
Table A in S1 File. PRISMA Guidelines Checklist.
Table B in S1 File. QUADAS-2 risk of bias assessment. U: unclear; Y: yes; N: no; L: low; H: high; HbA1c: glycated haemoglobin; FPG: fasting plasma glucose; 2h-PG: 2-hour plasma glucose.
Table C in S1 File. Subgroup analysis of the four studies that included measurements of HbA1c, FPG and 2h-PG. Values in parentheses are 95% confidence intervals. FPG: fasting plasma glucose, PLR: positive likelihood ratio, NLR: negative likelihood ratio, dOR: diagnostic odds ratio, AUC: area under receiver operating characteristic curve.
Figure A in S1 File. Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) criteria for the reviewed studies.
Figure B in S1 File. Forest plot of the sensitivity of each index test for diagnosing diabetes in the reviewed studies. CI: confidence interval; (a), (b) and (c) indicate different subgroups of participants in that study, as defined by setting (Table 1).
Figure C in S1 File. Forest plot of the specificity of each index test for diagnosing diabetes in the reviewed studies. CI: confidence interval; (a), (b) and (c) indicate different subgroups of participants in that study, as defined by setting (Table 1).
Figure D in S1 File. Forest plot of the positive likelihood ratio (PLR) of each index test for the diagnosis of diabetes in the reviewed studies. CI: confidence interval; (a), (b) and (c) indicate different subgroups of participants in that study, as defined by setting (Table 1).
Figure E in S1 File. Forest plot of the negative likelihood ratio (NLR) of each index test for the diagnosis of diabetes in the reviewed studies. CI: confidence interval; (a), (b) and (c) indicate different subgroups of participants in that study, as defined by setting (Table 1).
Figure F in S1 File. Assessment of potential bias due to including each study in the review, by index test. dOR: diagnostic odds ratio; CI: confidence interval; (a), (b) and (c) indicate different subgroups of participants in that study, as defined by setting (Table 1).
Figure G in S1 File. Funnel plot for the assessment of potential publication bias. ESS: effective sample size.
References A in S1 File. Studies excluded from the systematic review and meta-analyses and main reasons for their exclusion.
(DOCX)