Suitability of UK Biobank Retinal Images for Automatic Analysis of Morphometric Properties of the Vasculature

Purpose: To assess the suitability of retinal images held in the UK Biobank - the largest retinal data repository in a prospective population-based cohort - for computer assisted vascular morphometry, generating measures that are commonly investigated as candidate biomarkers of systemic disease.

Methods: Non-mydriatic fundus images from both eyes of 2,690 participants - people with a self-reported history of myocardial infarction (n=1,345) and a matched control group (n=1,345) - were analysed using VAMPIRE software. These images were drawn from those of 68,554 UK Biobank participants who underwent retinal imaging at recruitment. Four operators were trained in the use of the software to measure retinal vascular tortuosity and bifurcation geometry.

Results: Total operator time was approximately 360 hours (4 minutes per image). 2,252 (84%) of participants had at least one image of sufficient quality for the software to process, i.e. there was sufficient detection of retinal vessels in the image by the software to attempt the measurement of the target parameters. 1,604 (60%) of participants had an image of at least one eye that was adequately analysed by the software, i.e. the measurement protocol was successfully completed. Increasing age was associated with a reduced proportion of images that could be processed (p=0.0004) and analysed (p<0.0001). Cases exhibited more acute arteriolar branching angles (p=0.02) as well as lower arteriolar and venular tortuosity (p<0.0001).

Conclusions: A proportion of the retinal images in UK Biobank are of insufficient quality for automated analysis. However, the large size of the UK Biobank means that tens of thousands of images are available and suitable for computational analysis. Parametric information measured from the retinas of participants with suspected cardiovascular disease was significantly different to that measured from a matched control group.


Introduction
Around 20 million people in the UK alone suffer from at least one long-term health condition, and over the next decade this is set to increase three-fold, with the ageing population placing an enormous economic and human-resource strain on the health care system [1][2][3]. In the US, 7 out of 10 deaths each year are from chronic diseases [4]. The retina, through the analysis of blood vessels in fundus images, is a well-established non-invasive proxy of systemic non-retinal vascular disease [5][6][7], and detailed clinical observations of characteristic fundus features have led to the identification of early indicators of a diverse range of conditions, including diabetes mellitus, cardiovascular disease and neurological disorders [5][8][9][10].
The UK Biobank includes the largest retinal data repository in a prospective population-based cohort and represents an important and valuable resource, accessible to international researchers, for improving the prevention, diagnosis and treatment of a wide range of long-term conditions and chronic illness [11]. Considerable potential exists in using this dataset to discover biomarkers that will in the future enable systematic patient surveillance and the identification of high-risk patients, as well as surrogate end points that will accelerate the discovery of new interventions.
We accessed fundus images held in the UK Biobank and subjected them to computational analysis using VAMPIRE (Vascular Assessment and Measurement Platform for Images of the REtina) software [12,13], which yields parametric information relating to the retinal vasculature that may be indicative of systemic disease [5,14]. Our objective was to assess the suitability of UK Biobank images for computational analysis and this paper presents our findings. Validation of relevant VAMPIRE modules has been reported elsewhere [12,13,15,16].

Methods
The UK Biobank recruited around half a million people aged 40-69 years in the UK between 2006 and 2010. With informed consent, participants completed a detailed touchscreen questionnaire about their lifestyle, environment, medical and family history, underwent a range of physical measures and provided blood, urine and saliva samples for future analysis. They also agreed to long-term follow-up of their health. A subset of 68,544 participants underwent retinal fundus camera imaging: a single non-mydriatic digital image was obtained from each of the right and left eyes using a Topcon 3D OCT 1000 Mark 2 (45° field-of-view, centred to include both optic disc and macula). Colour images were stored in PNG format with dimensions 2048 × 1536 pixels. The UK Biobank obtained approval for all its data collection procedures from its governing Research Ethics Committee.
We accessed fundus images from 2,690 participants: people with a self-reported history of myocardial infarction (n = 1,345), and a control group without such a history (n = 1,345), matched on assessment centre location, age and sex. This yielded 5,380 images for analysis with VAMPIRE version 2.0 (Universities of Edinburgh and Dundee, UK). The selection criterion for cases was chosen in order to harvest a participant group with a long-term condition (i.e. cardiovascular disease) that is widely considered to have retinal manifestations [5]. Our access to the resource was checked and approved by the UK Biobank as being consistent with their Access Procedures and Ethics and Governance Framework [17,18]. Our research adhered to the tenets of the Declaration of Helsinki.
A team of four operators, consisting of a consultant level ophthalmologist (author Cameron) and three medical students (authors El-Medany, Mulholland and Sheng), was trained to use the VAMPIRE software with clear protocols to quantify retinal blood vessel parameters, namely tortuosity, bifurcation branching angle (BA) and bifurcation coefficient (BC), that might reveal abnormalities in the vasculature that are considered indicative of disease processes and microvascular damage [5,14]. Training for each operator consisted of two sessions lasting approximately four hours in total: an introductory phase where the protocols and software were presented and familiarity with them gained through practice on a demonstration image set (n = 20); and an assessment phase where competency in operation was assessed on a testing image set (n = 20). Fundus images were divided amongst the four operators based on their availability for conducting analysis: Cameron (40%), El-Medany (4%), Mulholland (40%) and Sheng (16%). Software ran on UK Biobank hardware, accessed via a remote Windows desktop session.
The correct procedure for operating the software was as follows. First, the boundaries of the optic disc (OD) in a retinal image were detected [15] and a conventional OD-centred circular measurement zone established, stretching from the OD boundary to twice the OD diameter. Next, the software attempted to detect the retinal blood vessels present in the image and locate bifurcations. If VAMPIRE was able to detect the vasculature within the measurement zone the image was classed as processable; otherwise it was deemed non-processable.
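The zone geometry and the processability rule described above can be illustrated with a short Python sketch. This is not VAMPIRE code: the function names are hypothetical, the outer limit is assumed to be a radius of twice the OD diameter measured from the OD centre, and "processable" is simplified to "at least one detected vessel point falls inside the zone".

```python
import math


def in_measurement_zone(point, od_centre, od_diameter):
    """Return True if `point` lies in the annular zone around the optic disc.

    Assumption (not stated in the source): the zone runs from the OD boundary
    (radius = diameter / 2) out to a radius of 2 x OD diameter, with both
    radii measured from the OD centre.
    """
    distance = math.dist(point, od_centre)
    inner_radius = od_diameter / 2.0
    outer_radius = 2.0 * od_diameter
    return inner_radius <= distance <= outer_radius


def classify_processability(vessel_points, od_centre, od_diameter):
    """Hypothetical rule: an image is processable if any detected vessel
    point falls inside the measurement zone."""
    for point in vessel_points:
        if in_measurement_zone(point, od_centre, od_diameter):
            return "processable"
    return "non-processable"
```

With an OD of diameter 200 pixels centred at the origin, a vessel point 300 pixels from the centre lies inside the assumed zone, whereas points at 50 or 450 pixels do not.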
For processable images, the operator was asked to identify, visually, the 2 thickest arterioles and 2 thickest venules in the measurement zone. The operator then used the software to select the path for each vessel as it extends from the edge of the OD to the outer edge of the zone for the calculation of tortuosity [16]. Taking the mean of the 2 measurements for each vessel type gave the summarised arteriolar and venular tortuosity values, respectively, for an image. The operator was further asked to identify 3-5 arteriolar and 3-5 venular bifurcations per image from the automatically detected candidates located within the measurement zone, with the software returning BA and BC for each one. Due to the macula-centred nature of the images, measuring 3-5 bifurcations was a realistic sampling target. The software calculated the median of each set of measurements to summarise them into arteriolar and venular parameters for an image. Successful completion of the measurement protocol categorised the image as analysable, i.e. there was sufficient detection of retinal vessels in the image to enable the measurement of the target parameters; otherwise it was recorded as non-analysable.
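The per-image summarisation in this protocol (mean of 2 tortuosity values, median of 3-5 BA and BC values per vessel type) can be sketched in Python as follows. The function names and dictionary layout are hypothetical, and treating an image as analysable only when both the arteriolar and the venular measurements are complete is an assumption made for illustration; it is not VAMPIRE's actual implementation.

```python
from statistics import mean, median


def summarise_vessel_type(tortuosities, branching_angles, bifurcation_coeffs):
    """Summarise one vessel type (arteriolar or venular) for one image.

    Returns None when the protocol could not be completed for this type.
    """
    # The protocol asks for exactly 2 vessels for tortuosity.
    if len(tortuosities) != 2:
        return None
    # The protocol asks for 3-5 bifurcations, each yielding one BA and one BC.
    if not 3 <= len(branching_angles) <= 5:
        return None
    if len(bifurcation_coeffs) != len(branching_angles):
        return None
    return {
        "tortuosity": mean(tortuosities),
        "BA": median(branching_angles),
        "BC": median(bifurcation_coeffs),
    }


def summarise_image(arteriolar, venular):
    """Assumption: both vessel types must be complete for 'analysable'."""
    a = summarise_vessel_type(**arteriolar)
    v = summarise_vessel_type(**venular)
    if a is None or v is None:
        return {"status": "non-analysable"}
    return {"status": "analysable", "arteriolar": a, "venular": v}
```

Each argument to `summarise_image` is a dictionary with the keys `tortuosities`, `branching_angles` and `bifurcation_coeffs`. An image with too few usable bifurcations for either vessel type is returned as non-analysable.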
A chi-square test was used to compare the proportions of processable and analysable images by sex, age (in groups 40-50, 51-60 and 61-70 years), and case-control status. A logistic regression model was used to calculate odds ratio (OR) between female and male, age groups, and cases and controls. Inter-operator variability was assessed for the measured retinal parameters using images from 40 participants selected at random and by calculating the intraclass correlation coefficient (ICC) using a two-way mixed, consistency agreement, average measure model.
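For readers unfamiliar with the ICC model named above, the following Python sketch computes a two-way, consistency, average-measure ICC (often written ICC(3,k) or ICC(C,k)) from the mean squares of a two-way ANOVA. It is a generic textbook formulation, not the SPSS or SAS routine used in the study, and the function name and input layout (one row per image, one column per operator) are assumptions.

```python
def icc_consistency_average(ratings):
    """ICC(3,k): two-way model, consistency, average of k raters.

    `ratings` is a list of rows, one per subject (image), each row holding
    one measurement per rater (operator). All rows must have equal length.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), raters (columns) and total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if ms_rows == 0:
        raise ValueError("no between-subject variance; ICC is undefined")
    return (ms_rows - ms_error) / ms_rows
```

Because this is a consistency (rather than absolute agreement) coefficient, a constant offset between operators does not lower the value: two operators who always differ by the same amount still yield an ICC of 1.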
We further evaluated our measurement protocols by calculating the correlation of retinal parameters between right and left eyes for participants with analysable images for both eyes using Spearman's correlation coefficient. This assumes that in a (non-ocular disease) population an effective measurement strategy for characterising a particular retinal feature will produce a high inter-ocular correlation [19]. Finally, a logistic regression model was used to estimate associations of retinal measures with cases versus controls. ORs were calculated for an increase of 1 SD of VAMPIRE measures and adjusted for age and sex. Values of p < 0.05 (2-tailed) were considered statistically significant. Statistical analyses were conducted using SAS software version 9.3 (SAS Institute, Cary NC) and SPSS version 19 (IBM Corp, Chicago IL, USA).

Results
Table 1 shows the mean age and sex distribution of our study population, of all the UK Biobank participants who underwent fundus imaging, and of the entire UK Biobank population. The UK Biobank participants who had fundus imaging had almost identical mean age (57 years) and sex distribution (54% female) to the entire UK Biobank population, while participants in the case-control dataset used in this study were slightly older (mean age 61 years) and a lower proportion were female (20%).
Total operator time for using VAMPIRE on the study population images came to approximately 360 hours or around 4 minutes per image, with an additional 16 hours of training time (i.e. 4 hours per operator). Computational experiences are summarised in Table 2, which indicates that from the 2,690 UK Biobank participants that we assessed for our study, 2,252 people (84%) had a processable fundus image for at least one eye while 1,817 people (68%) had processable images for both eyes. 1,604 (60%) participants had an analysable fundus image of at least one eye, while 957 (36%) had analysable images for both eyes.
Further examination by participant sex, age group and case-control status is presented in Table 3. A higher proportion of female than male participants had at least one processable image (88% versus 83%), but the proportion of participants with analysable images did not differ by sex. Increasing age was associated with a reduced proportion of participants with processable and with analysable images (e.g., 93% processable and 78% analysable in those aged 40-50 years; 82% processable and 56% analysable in those aged 61-70 years). However, case-control status was not associated with either processability or analysability, suggesting no major influence on image quality of cardiovascular disease or vascular risk factors such as smoking, hypertension and diabetes (which would be more common among cases than controls).
Analysis of inter-operator variability is presented in Table 4. From the images of 40 participants selected at random, 12 could not be analysed due to poor image quality. For the remaining 28 images there was very good agreement between operators for measurements of BC and BA (ICC > 0.8). For measurements of tortuosity the level of agreement was lower (0.3 < ICC < 0.5).
Table 5 shows the correlations of retinal parameters between right and left eyes for participants with analysable images for both eyes, regardless of case-control status. Positive but low correlations (< 0.25) were present and significant (p < 0.05) for all parameters.
Associations of retinal vascular measures for cases and controls are reported in Table 6 (and S1 Table). We found narrower arteriolar BA (p = 0.02) and also lower arteriolar and venular tortuosity (p < 0.0001) in cases than in controls.

Discussion
The proportion of UK Biobank participants in this study who had at least one image of sufficient quality for automated processing (84%) is similar to the proportion of participants reported to have retinal images that were gradable for retinal vascular parameters in other prospective, population-based studies conducting retinal photography (82-91%) [20][21][22][23]. However, UK Biobank has the advantages that the number of participants with fundus imaging is substantially larger than in any previous prospective study, and that the retinal measures derived can be studied in association with a very wide range of other data of unparalleled depth and breadth from the baseline assessment, subsequent enhancements and follow-up. Operator experience during this study suggested that non-processable images were distributed amongst three categories: around 80% were completely or largely blurred or obscured, hence unusable for computational analysis with any software; around 15% were hazy or blurred, which again precludes them from any computational analysis, though the images may still be suitable for visual grading of certain retinal characteristics such as lesions of diabetic retinopathy; only around 5% showed the retinal vasculature and hence might conceivably be processable and analysable with either VAMPIRE (after future refinements to the software) or other software. Our study sample had a higher proportion of males and a higher mean age than the whole UK Biobank population who underwent retinal examination. Since male sex was associated with reduced image processability, and increasing age was associated with both reduced processability and reduced analysability, a higher proportion of all 68,544 participants with retinal imaging in UK Biobank are likely to have images which are processable and analysable with VAMPIRE software than we report here for our study sample.
We found that a substantially smaller proportion of images could be analysed than were processable. From our experiences with automated analysis, signs of retinopathy (e.g. microaneurysms, exudates, and haemorrhages) do not interfere with measuring retinal vessel parameters using VAMPIRE. However, the macula-centred nature of the UK Biobank fundus images affects software analysis of vascular parameters: as the macula is an avascular region of the retina, most of the visible vasculature appears concentrated towards the side of the image containing the OD. So opportunities to measure retinal vascular parameters such as bifurcation geometry and tortuosity can be limited, contributing to the lower proportion of analysable compared with processable images. This issue will affect the performance of other vascular analysis software tools as well as the quantification of other vascular parameters, particularly vessel diameters, which are commonly measured in the vicinity around the OD [24][25][26][27].
A key aim for future work will be to reduce the proportion of processable but non-analysable images without compromising measurement validity or reliability. This will require further careful study of the main determinants of analysability. For example, our experience suggests that some images are processable but not analysable with VAMPIRE as a result of insufficient or incorrect bifurcation detection, i.e. either the software does not detect a bifurcation or it incorrectly labels the parent vessel and the measurement has to be discarded. Ongoing software refinements will address these issues through improved identification and evaluation of bifurcations and with a manual override option for the operator to correct software detection errors, thus enabling a higher proportion of analysable images in the future.
Similar protocols have been reported by other authors in computer-assisted studies on retinal biomarkers measuring in similar zones. For instance in [28], 7 arteriolar and 7 venular vessels were selected for tortuosity analysis along with 5 arteriolar and 5 venular bifurcations. In line with our results, arteriolar tortuosity was significantly lower among ischaemic heart disease cases (n = 124) compared to controls, although no difference was observed in arteriolar BA. To assess retinopathy of prematurity [29], tortuosity was quantified using 4 arteriolar and 4 venular vessels, i.e. 2 major vessels in each of 4 quadrants wherever possible. In [30], a protocol choosing only the most significant arterioles originating from the OD and that crossed a significant part of the fundus was proposed, giving 2-3 vessels per image. In an investigation into the relationship of retinal vascular tortuosity with age, blood pressure, and other cardiovascular risk factors, 6 large arteriolar segments and 6 large venular segments were considered [31]. Less tortuous arterioles were independently associated with older age, higher blood pressure and higher body mass index. To study cardiovascular risk factors and retinal arteriolar tortuosity in a multi-ethnic child population, the protocol for tortuosity involved 5-8 arteriolar and 5-8 venular segments [32]. Finally, approximately 15 bifurcations (an assortment of arteriolar and venular) were measured per image in patients with diabetes [33].
Inter-operator agreement ranged from excellent for bifurcations to fair and poor for tortuosity (arteriolar and venular, respectively) in our tests (n = 28). One possible explanation for this inconsistency is the difference in the number of samples required to calculate each parameter, i.e. 2 for tortuosity and 3-5 for bifurcations. With tortuosity there exists more opportunity for the operators to select different vessels while for bifurcations the requirement for a larger sample means they are more likely to choose the same ones as each other. Also, we have not tested intra-operator agreement in this study and so our results should be interpreted with a degree of caution.
While it is common to consider measurement reliability through repeat analysis on subsets of images, reporting the correlation between right and left eyes to show efficacy of a measurement protocol in characterising the retinal vasculature is rare. Although our correlations were significant, they were also low which suggests that further refinement of our protocols is required in order to improve their effectiveness. This may yet increase the utility of the retinal vascular parameters and warrant their further investigation as candidate biomarkers of systemic disease.
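As a point of reference for the inter-eye correlation discussed above, Spearman's coefficient is the Pearson correlation of the ranked values, with tied values given their average rank. The Python sketch below is a generic formulation with hypothetical function names; it is not the statistical package routine used in the study, and it does not compute a p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2.0 + 1.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(right_eye, left_eye):
    """Spearman correlation between paired right- and left-eye measurements."""
    if len(right_eye) != len(left_eye) or len(right_eye) < 2:
        raise ValueError("need two paired sequences of length >= 2")
    return pearson(average_ranks(right_eye), average_ranks(left_eye))
```

Each participant contributes one pair, for example the summarised arteriolar tortuosity of the right eye and of the left eye, so only participants with analysable images for both eyes can be included.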
The average operator time per image suggests that fundus images for all 68,544 participants could be processed and analysed in around 18 months by a 4-strong team or 9 months by 8 operators, under reasonable assumptions (42 working weeks per year per operator, with 5 working days per week and 7 working hours per day). However, operator time required to analyse large image sets is anticipated to decrease as the software becomes increasingly automated in future versions (for example, computerised classification of vessels as arterioles or venules, which will lessen manual decision making and accelerate the analysis of retinal parameters).
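The arithmetic behind this estimate can be reproduced with a few lines of Python. The function name is hypothetical; the inputs are the figures stated above, plus the assumption that every participant contributes two images (one per eye) at 4 minutes each.

```python
def months_to_analyse(participants, operators, minutes_per_image=4.0,
                      images_per_participant=2, weeks_per_year=42,
                      days_per_week=5, hours_per_day=7):
    """Estimate calendar months needed for a team to analyse all images."""
    # Total operator hours needed across all images.
    total_hours = participants * images_per_participant * minutes_per_image / 60.0
    # Working hours one operator contributes per year (42 x 5 x 7 = 1,470).
    hours_per_operator_year = weeks_per_year * days_per_week * hours_per_day
    years = total_hours / (operators * hours_per_operator_year)
    return years * 12.0


four_operators = months_to_analyse(68544, operators=4)
eight_operators = months_to_analyse(68544, operators=8)
```

Under these assumptions the workload is about 9,139 operator hours, which gives roughly 18.7 months for 4 operators and 9.3 months for 8, in line with the approximate figures quoted above.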
As an alternative to automated image analysis for handling large numbers of retinal images, the process of outsourcing (or 'crowdsourcing') specific retinal image analysis tasks to an online community of non-experts has been explored recently in the context of lesion grading [34], but further studies are necessary to assess the reliability of crowdsourcing and to identify for which tasks it is most suited. The approach seems unlikely to be suitable for annotating morphometric properties of the vasculature, which is a very different task from lesion grading.
As a final point, despite the fact that a sizeable proportion of images from among those initially accessed were unsuitable for computational analysis, the number of images potentially available for this type of analysis from the total collection of UK Biobank retinal images is still of substantial size, and sufficient to support statistically well-powered studies featuring retinal parameters.

Supporting Information
S1 Table. Association of VAMPIRE measurements between cases and controls in left eye, right eye and both eyes. (DOCX)