Multi-ancestry GWAS analysis identifies two novel loci associated with diabetic eye disease and highlights APOL1 as a high risk locus in patients with diabetic macular edema

Diabetic retinopathy (DR) is a common complication of diabetes. Approximately 20% of DR patients have diabetic macular edema (DME) characterized by fluid leakage into the retina. There is a genetic component to DR and DME risk, but few replicable loci. Because not all DR cases have DME, we focused on DME to increase power, and conducted a multi-ancestry GWAS to assess DME risk in a total of 1,502 DME patients and 5,603 non-DME controls in discovery and replication datasets. Two loci reached GWAS significance (p<5x10-8). The strongest association was rs2239785, (K150E) in APOL1. The second finding was rs10402468, which co-localized to PLVAP and ANKLE1 in vascular / endothelium tissues. We conducted multiple sensitivity analyses to establish that the associations were specific to DME status and did not reflect diabetes status or other diabetic complications. Here we report two novel loci for risk of DME which replicated in multiple clinical trial and biobank derived datasets. One of these loci, containing the gene APOL1, is a risk factor in African American DME and DKD patients, indicating that this locus plays a broader role in diabetic complications for multiple ancestries. Trial Registration: NCT00473330, NCT00473382, NCT03622580, NCT03622593, NCT04108156.


Introduction
Diabetic retinopathy (DR) is a common complication of diabetes and one of the leading causes of blindness worldwide [1].Approximately 20% of DR patients have diabetic macular edema (DME) characterized by fluid leakage into the retina [2].DME can occur at any point in the timeline of DR, but is more likely to occur later in the disease.Diabetic patients with African American ancestry have the greatest risk of developing DME of all populations in the United States, estimated at three times greater than those of European-descent [3].Risk factors include improper glycemic control and duration of diabetes, but these do not explain all of the heterogeneity in DME risk [4][5][6].While it is accepted there is a genetic component to DR and DME risk, to date, large-scale genome-wide association studies (GWAS) have not yielded robust insights [7].Most well powered studies have been in DR, and have been well summarized [8].DR is an inherently heterogeneous disease marker, and one suggestion cited for identification of replicable loci identified via GWAS was harmonized patient phenotyping [9].As an example, one recent study containing DR, DME and proliferative diabetic retinopathy cases in a multi-ethnic study population of 43,565 was unable to find loci at genome-wide significance [10].Because not all DR cases have DME, focusing on that subtype alone could increase power to find novel loci associated with the risk of disease.Furthermore, the increased prevalence of DR and DME in African American individuals indicates there may be genetic risk factors specific to that population.To that end, we performed the largest genetic study of DME to date (a total of 1,502 cases and 5,603 controls) identifying two loci reaching genome-wide significance for risk of DME, one of which contains APOL1 and is estimated to account for approximately 5% of the risk of developing DME in African American patients with diabetes.

Study design and populations
We performed a genome wide association study study using DNA derived from blood samples obtained from DME patients from clinical trials for ranibizumab (NCT00473330 [RISE] and NCT00473382 [RIDE]), faricimab (NCT03622580 [YOSEMITE] and NCT03622593 [RHINE]), and the port delivery system with ranibizumab (NCT04108156 [PAGODA]) (S1 Table ).These study populations were selected for inclusion on the basis of available phenotypic information and DNA availability for whole-genome sequencing (all but PAGODA) and array genotyping (PAGODA).Patients were sequenced and genotyped between 2017-2021.
Samples and data for controls in the discovery study population (cases from the ranibizumab (RISE and RIDE) and faricimab (YOSEMITE and RHINE) studies) were obtained from clinical trial studies of asthma, autoimmune disease, Crohn's disease, chronic obstructive pulmonary disease, inflammatory bowel disease, interstitial lung disease, idiopathic pulmonary fibrosis, rheumatic arthritis, systemic lupus erythematosus and ulcerative colitis.An independent set of controls for the replication study population (cases from the port delivery study with ranibizumab-PAGODA) were selected from clinical trial studies of cancer and asthma, in addition to healthy controls from a cohort study examining Age-Related Macular Degeneration in subjects with African ancestry (DIVERSITY).
All patients included in this study provided written informed consent for whole-genome sequencing or array genotyping of their DNA.Ethical approval was provided as per the original clinical trials.

DNA analysis
The WGS data for the discovery study population was generated to a read depth of 30X using the HiSeq platform (Illumina X10, San Diego, CA, USA) processed using the Burrows-Wheeler Aligner (BWA) / Genome Analysis Toolkit (GATK) best practices pipeline.WGS short reads were mapped to hg38 / GRCh38 (GCA_000001405.15),including alternate assemblies, using BWA version 0.7.9a-r786 to generate BAM files.All WGS data was subject to quality control and checked for concordance with SNP fingerprint data collected before sequencing.After filtering for genotypes with a GATK genotype quality greater than 90, samples with heterozygote concordance with SNP chip data of less than 75% were removed.Sample contamination was determined with VerifyBamID software, and samples with a freemix parameter of more than 0.03 were excluded.Joint variant calling was done using the GATK best practices joint genotyping pipeline to generate a single variant call format (VCF) file.The called variants were then processed using ASDPEx to filter out spurious variant calls in the alternate regions.
Data in the replication study population was generated using the Infinium Global Screening Array-24 Kit v2 and v3 from Illumina.Genotypes with GenCall (GC) score less than 0.15 were set to missing.

Quality control-Patient DNA samples
Ancestry was assigned using an ancestry threshold of 0.7 obtained from supervised analysis with ADMIXUTRE 1.For the discovery study population, MatchIt [11] was used to do a 4:1 (control:case) ratio using the nearest neighbor matching on the propensity score to select controls based on age, sex and global ancestry.
For both the discovery and replication study populations, samples were excluded if the call rate was less than 90%.Identity by descent analysis was used to detect and filter out relatedness in the dataset; samples were excluded if PI_HAT was 0.4 or higher.Samples were removed if they showed excess heterozygosity with more than three SDs of the mean.This resulted in 448 in the AFR study population (99 cases and 349 controls), 343 in the AMR study population (87 cases and 347 controls), 439 in the EAS study population (88 cases and 351 controls) and 4,577 in the EUR study population (921 cases and 3,656 controls) for the discovery dataset with a combined total of 1,195 DME cases and 4,703 non-DME controls.Similarly, after quality control there are 976 patient samples in the EUR replication study population (262 cases and 714 controls) and 231 subjects in the AFR replication study population (45 cases and 186 controls) for a combined total of 307 DME cases and 900 non-DME controls.

Quality control-genotype
For the replication study population, multi-allelic sites were split into biallelic records, variants were left-aligned and normalized, and duplicate variants based on chromosome, position, ref allele, and alt allele were removed prioritizing by missingness rate.Additional quality control was performed by removing variants that had a target majority ancestry alternate allele frequency (AF) difference exceeding 0.2 compared to the reference, as well as removing ambiguous (A/T, G/C) variants where target majority ancestry minor allele frequency (MAF) exceeded 0.4 in either the target or reference.Phasing and imputation were carried out using Beagle 5.1 with hg38 HapMap genetic maps and a reference panel consisting of merged Genentech internal whole-genome sequencing and Haplotype Reference Consortium (HRC) genotypes.This merged panel consists of 134,246,541 bi-allelic sites in 51,455 (chr1), 55,929 (chr2-22), and 54,963 (chrX) samples.The default sliding window length of 40 cM with 4 cM overlap between adjacent sliding windows was used.Imputed VCFs were annotated with rsIDs from dbSNP154.The estimated squared correlation between the estimated allele dose and the true allele dose (DR2) was used to filter out poorly imputed variants by setting a DR2 threshold of 0.3.
Sample genotypes (discovery WGS only) were set to missing if the Genotype Quality score was less than 20 and SNPs were removed if the missingness was higher than 5%.SNPs were filtered if the significance level for the Hardy-Weinberg equilibrium test was less than 5x10 -8 .The allele depth balance test was performed to test for equal allele depth at heterozygote carriers using a binomial test (discovery WGS only); SNPs were excluded if the p-value was less than 1x10 -5 .

Statistical analysis
A common variant (MAF> = 1%) genome-wide association study (GWAS) was conducted to assess DME risk in the AFR, AMR, EAS and EUR study populations separately.PLINK [12] was used to perform logistic regression two-sided test using an additive model, adjusted for age, sex and genetic ancestry.The first three PCAs were used in as covariates for genetic ancestry and their cumulative proportion of variance was >90%.METAL [13] was subsequently used to do a meta-analysis, under a mixed effects model.There were 7,601,242 variants that passed QC in one or more of the cohorts.The ancestry-specific cohorts were QC'd and analyzed separately, rather than filtering on MAF in the combined cohort.For the meta-analysis, variants were included if they were present in two or more of the ancestry groups.
LDSC software was used separately on EUR and AFR GWAS summary stats to calculate heritability [14].The parp function in the CRAN R library twoxtwo was used to calculate the population attributable risk proportion in EUR and AFR study populations combined and separately [15].
Samples from other disease areas were used as controls.We used a genotype-on-phenotype reverse regression to test for association with the non-DME controls.We calculated the posterior probability (PP) that a SNP is also associated with each individual disease area included in the control cohort; SNPs passing the 0.6 PP threshold in any of the control cohort disease areas were flagged and excluded from the GWAS results.

FinnGen ethics statement and methods
A second replication population consisted of patients enrolled in the FinnGen Biobank cohort.Patients and control subjects in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act.Alternatively, separate research cohorts, collected prior the Finnish Biobank Act came into effect (in September 2013) and start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health.Recruitment protocols followed the biobank protocols approved by Fimea.The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is Nr HUS/990/2017.

Colocalization
We utilized colocalization analysis between eQTL data from GTEx, eyeQTL, and Sun et al pqtls to identify putative causal genes in the region [16][17][18].The coloc library (CRAN R) uses approximate Bayes factors to estimate posterior probabilities (PP) for common variants in the GWAS study as well as the QTL study.A PP4 estimates the probability that a variant is associated with both traits and a PP4 threshold of 0.8 was used as evidence for colocalization.Variants within +/-250 bp of the most significant SNP in the locus were included.GTEx tissues relevant to complications of diabetes were included: kidney and vascular / endothelium (artery aorta, artery coronary and artery tibial).EyeQTL included the following tissues: RPE, retina, RPE macula, RPE non-macula, retina macula, and retina non-macula.

Results
We conducted a multi-ancestry GWAS to identify risk alleles associated with DME.Discovery population cases (N = 1,195) comprised patients from clinical trials for DME involving ranibizumab (NCT00473330 [RISE N = 126], NCT00473382 [RIDE N = 143]) and faricimab (NCT03622580 [YOSEMITE N = 511] and NCT03622593 [RHINE N = 415]) who consented for genetic analysis.Inclusion criteria for all studies allowed for both T1D and T2D patients.Based on self-reported data, from YOSEMITE and RHINE, we estimate 6% of participants have T1D and 94% have T2D.We obtained controls (N = 4,703) from participants in nonophthalmic clinical trials that were sequenced and processed with the same informatics pipeline; this ensures we minimize batch effects that may be introduced via differing sequencing technologies and bioinformatics processing pipelines (see methods for details).We grouped DME cases and non-DME controls by genetic ancestry using defined populations in the 1000 Genomes Project.Therefore, population subsets in this study will be referred to by the 1000 Genomes super-population codes (see methods for details).We performed case and control analyses for each of the four super-populations (AFR-African, EUR-European, EAS-East Asian and AMR-AdMixed American), controlling for age, sex and genetically defined ancestry, combining results via meta-analysis.Due to the use of individuals with other, non-ophthalmic diseases as controls, we used a genotype-on-phenotype reverse regression to remove non-DME specific findings, which have been removed from the presented analysis [19].We sought to replicate any genome-wide significant signals in a sample of 307 multi-ancestry DME cases from a clinical trial evaluating ranibizumab in the Port Delivery System (NCT04108156 [PAGODA]) and 900 non-DME controls obtained in a similar fashion to the discovery dataset.Case and control demographics for the discovery and replication populations are available in Table 1.

GWAS identifies loci on chr 22q12 and chr19p13 as associated with risk of DME
There was no evidence of genomic inflation in this meta-analysis (λ gc = 0.98 S2 Fig) .We estimated the SNP heritability (h 2 ) in the largest two super-populations, EUR and AFR, to be 0.41 and 0.48 respectively.
After genome-wide removal of non-DME specific loci via reverse regression, we found two novel loci at chr22q12.

Chr22q12 -APOL1
The strongest signal of association in the discovery study population was at chr22q12.3(S2 Table ).The top SNP at this locus, rs2239785, results in a missense variant (K150E) in APOL1, encoding apolipoprotein L1 (G allele-OR meta = 1.71,P meta = 5.05x10 -23 ).The risk (G) allele encodes the E150 allele of the peptide.Previous investigation has shown the E150 allele is important, since in the context of I228 and K255 (the AFR "EIK" haplotype) it increases APO-L1-induced cytotoxicity in HEK293 cells compared to K150 (KIK) expressed at similar levels, whereas in the context of the refseq haplotype (EMR; E150/M228/R255), there is almost no cytotoxicity [20].Furthermore, in Africans the minor K150 allele was shown to be protective for diabetic, non-diabetic and all-cause end-stage kidney disease in a dominant model (ORs 0.76-0.8),implying the major (E150) allele exacerbates kidney disease (presumably in the context of I228 and K255 since this is the major African haplotype) [21].However, to our knowledge, this is from a single study and more investigation is warranted.The direction of effect was consistent across all four super-populations (OR eur = 1.73,P eur = 4.07x10 -20 ; OR afr = 1.84,P afr = 0.002; OR amr = 1.43,P amr = 0.14; OR eas = 1.43,P eas = 0.12) (Fig 4).The frequency of this allele was highest in AFR DME patients (EAF afr = 0.76), greater than twice that of EUR DME patients (EAF eur = 0.31).We replicated the association at rs2239785 in an independent multi-ancestry DME case (PAGODA) and control study sample (OR meta = 1.41,P meta = 0.003; OR eur = 1.31,P eur = 0.03; OR afr = 2.06, P afr = 0.02), with a combined sample OR = 1.65 and P = 2.0x10 -24 (Table 3).
To further investigate this association, we performed a PheWAS for rs2239785 in the Finn-Gen biobank study population (Freeze 7, https://r7.finngen.fi).FinnGen is a large biobank derived study where case assignment is based on physician coding, and cases were compared to the reminder of FinnGen study participants.In FinnGen, diabetic maculopathy was the only coded indication specific to diabetic complications of the macula, the area of the retina impacted in DME.For the index SNP at the APOL1 locus found in this GWAS (rs2239785), we saw an association with diabetic maculopathy with the same direction of effect as the primary DME GWAS (OR = 1.17,P = 5.5x10 -5 , MAF cases = 0.18, MAF controls = 0.16; N = 2,790 cases, 284,826 controls).This was the second top association in all coded phenotypes for this variant, with the first being intrahepatic cholestasis of pregnancy (S3 Table ).Finally we queried the summary statistics for a recent DME GWAS in Australian patients of EUR ancestry https://doi.org/10.1371/journal.pgen.1010609.g001[22].While rs2239785 was not in the available data, we find a perfect proxy in rs136175 (r 2 = 1.0,D' = 1.0 with rs2239785 in EUR) has a similar direction of effect (OR = 1.32,P = 0.09; N = 270 cases, 435 controls).When combining the aforementioned studies in a meta-analysis for this locus, we see OR meta = 1.30,P meta = 1.30x10 -20 .Therefore, we are able to replicate the observed association between DME (or closely related phenotypes) and rs2239785, an amino acid changing SNP in APOL1, in three further studies.
Conditional analysis did not identify secondary signals at the locus in EUR; however SNP rs6518994 retained significance in the AFR sample after conditioning on rs2239785 in the primary GWAS dataset (original OR = 1.92,P = 2x10 -4 ; conditioned OR = 1.74,P = 0.002; r 2 with rs2239785 = 0.04).Further conditional analysis incorporating rs6518994 did not reveal any novel signals with P<0.01.
Due to the use of non-DME controls in this study for whom diabetes was not ascertained, it was important to establish whether the association we observed was driven diabetes status, rather than DME itself.We employed multiple approaches to exclude this.First, in the Finn-Gen PheWAS, we did not observe association with rs2239785 and T1D or T2D status (both P>0.53).Furthermore, we queried recent highly powered GWAS for type-1 (T1D) [23] and a multi-ancestry type-2 (T2D) diabetes finding no association for rs2239785 in either (both P>0.4).Further investigation in the EUR and AFR patient subsets of the T2D multi-ancestry study did not alter this (both P>0.5).Thus, we find this association is not driven by diabetes predisposition itself.Multiple variants in APOL1, including rs2239785, have been associated in a recessive model with diabetic and non-diabetic nephropathy in African American individuals [21,[24][25][26][27][28].These findings identified an AFR specific haplotype tagged by two SNPs (rs73885319, encoding S342G, and rs60910145, encoding I284M (G1)) and a 6-bp deletion (rs75781853 (G2)) associated with disease risk [24].G1 and G2, found in AFR only, are not strongly correlated (r 2 <0.18) but reside on the same haplotype block (D' = 1.0) with rs2239785.Due to this haplotypic structure, it is difficult to disentangle the effect of rs2239785, G1 and G2 in AFR, and raises the possibility that the DME association we observed could be confounded to some degree by nephropathy.The G1 and G2 APOL1 variants are not found in individuals with European ancestry in our cohort (S4 Table ).As such, the European population enables investigation of rs2239785 (K150E) on its own, and whether this SNP affects kidney disease.
First, we see repeated evidence of association between rs2239785 and risk of DME in individuals of European descent: the RISE, RIDE, YOSEMITE and RHINE trials in the discovery dataset, the PAGODA trial in the replication dataset and the FinnGen BioBank dataset.Second, in FinnGen, despite the association between rs2239785 and diabetic maculopathy, we found no evidence of association between rs2239785 and renal dysfunction in T1D or T2D patients (both P>0.37).Furthermore, we were unable to find an association between this SNP and DKD in recent GWAS conducted in European ancestry patients with either T1D or T2D (both P>0.71) [29,30].Additionally, we saw no association in a large meta-analysis for CKD and kidney function (BUN, eGFR) in European patients (all P>0.22) [31].While renal failure or anticipated need for dialysis were exclusion criteria in the DME clinical trials from which these cases were obtained, a DKD diagnosis was not.There were kidney measurements for 912 of the DME patients in the YOSEMITE and RHINE trials.As a final test, we found that removing 290 patients from the 912 of whom had albumin to creatinine ratio (S5  nitrogen (BUN) or protein levels outside of the ideal range had no effect on the association with DME in the remaining 622 patients with normal kidney function (OR meta = 1.79,P meta = 7.91x10 -17 ; OR eur = 1.89,P eur = 1.25x10 -16 ; OR afr = 1.76,P afr = 0.03).
These data are consistent with a model whereby the G1 and G2 APOL1 variants drive DKD susceptibility in individuals of African descent, but that rs2239785 (or another allele with which it is in linkage disequilibrium) represents an independent, and distinct, association that contributes directly to DME predisposition across ancestries.
To investigate the possibility that this locus could be associated with diabetes status rather than DME we performed similar analyses as described for the locus at chr22.First, in the Finn-Gen PheWAS, we did not observe an association with T1D or T2D (both P>0.42).Second, there was no association for rs10402468 in large T1D [23] or T2D [33] GWAS studies (both P>0.3).Further investigation in T2D in the EUR patient subset of the multi-ancestry study did not alter this (P>0.4).Thus, we find this association is specific to DME, and not due to diabetes status.

Population attributable fraction at the APOL1 locus in AFR
The increased effect allele frequency of rs2239785 in AFR indicates a greater overall burden in this patient population.As such, we calculated the fraction of DME risk in patients with diabetes that is attributable to the APOL1 locus (Table 4).DME prevalence in United States patients with diabetes for non-Hispanic white and non-Hispanic black individuals is estimated at 2.6% and 8.4% respectively [3].We estimate that rs2239785 accounts for 5.2% of the risk of developing DME in non-Hispanic black patients with diabetes compared to 1.9% in non-Hispanic white patients.As previously mentioned, the G1 and G2 variants are only seen in AFR individuals and are on the same haplotype as rs2239785.We hypothesized that the population attributable fraction for rs2239785 in AFR who do not possess either the G1 or G2 haplotypes would be similar to that of EUR.Looking at this subsample in AFR, we estimate that rs2239785 accounts for 1.60% of the risk of developing DME in non-Hispanic black patients with diabetes.Thus, it appears the haplotype containing the rs2239785 risk allele (E150) itself, without the presence of G1 and G2 in AFR (most likely a result of recombination with the European rs2239785-only haplotype in the admixed African American population used in this study), accounts for a similar proportion of overall population risk in individuals of EUR and AFR ancestry.This provides further evidence that rs2239785 is a separate, DME specific locus which increases risk of DME regardless of presence of G1 and G2 in patients of African ancestry.

Association with baseline DME clinical and imaging measures
We sought to determine if these two loci impact DME clinical and imaging measures at baseline.We did not see an association for the APOL1 and ANKLE1/PLVAP SNPs with baseline Diabetic Retinopathy Severity Score (DRSS) (N = 1,168; P>0.1).Nor was there any association of the APOL1 SNP with three retinal measures derived from Optical Coherence Tomography (OCT) data: (1) center subfield thickness (CST) (N = 1,339), ( 2) intraretinal fluid in central subfield (N = 1,020), and (3) subretinal fluid (N = 387).However, we did observe an association between CST and the ANKLE1/PLVAP SNP (β = 19.3;P = 0.006).Thus, it appears the association of the risk allele with greater CST is consistent with previous reports of CST being higher in severe DME patients as well as when comparing DME patients to controls [34][35][36].

Discussion
Despite multiple GWAS for DR and subtypes performed to date, few replicable associations have been reported.In this analysis we focused on the DME subtype to increase power by reducing disease heterogeneity.In this multi-ethnic common variant GWAS, we report two novel loci for risk of DME.Both loci achieve genome-wide significance in the discovery dataset and replication in both an additional multi-ancestry DME clinical trial patient sample, and the FinnGen biobank study.We observed consistent effect sizes and directions across ancestries in both the discovery and replication datasets at both loci.We also observed super-population differences in disease heritability, where the SNP-heritability was 0.41 in EUR and 0.48 in AFR.This was partially due to variation at the APOL1 locus in AFR, where the effect allele frequency for the top SNP was over two times that of EUR.This increased allele frequency at the APOL1 locus resulted in an estimated 5% population attributable fraction of DME in non-Hispanic black patients with diabetes, over two times that of non-Hispanic white patients, despite the similar effect size for the SNP in each superpopulation.This indicates that non-Hispanic black patients with DME could preferentially benefit from therapies designed around APOL1, however, follow-up studies with increased sample size are warranted.
To date there are no published reports of APOL1 staining in the eye, but scRNA seq data indicates expression in immune and vascular cells of the retina and choroid endothelia [37].The development of macular edema requires a greater rate of fluid entry from leaky vessels into the retina than that of fluid reabsorption [38].Given that APOL1 renal risk variants expressed under an endothelial promoter in mice caused vascular leak in the skin [39], one possible mechanism is that APOL1 could similarly exacerbate vascular leak in the retinal or choroidal endothelia, directly resulting in edema.Alternatively, the leaky vasculature could allow infiltration of cytokines into the eye, which would upregulate APOL1 beyond a critical threshold in the cells where it is expressed, causing cytotoxicity [40].Another possibility is that the cation channel activity of APOL1 (e.g.K + efflux) could interfere with K + co-transporter mediated water reabsorption by Mu ¨ller or retinal pigment epithelial cells, contributing to edema [38,41].Determining the localization of APOL1 protein in DME eyes would help narrow down the possibilities.
The second locus identified includes the genes PLVAP and ANKLE1.PLVAP, plasmalemma vesicle-associated protein, is also called PV-1 and recognized by the PAL-E antibody.PLVAP is an endothelial protein that regulates basal permeability.In the eye, PAL-E staining is specific to the fenestrated endothelium of the choriocapillaris and ciliary processes [42].Plvap deficient mice develop edema in some vascular beds [43] and loss of Plvap has been shown to disrupt the choriocapillaris structure and lead to retinal degeneration [44].As such, these murine studies are consistent with our results suggesting lower PLVAP mRNA levels are a risk factor for DME.However, PLVAP expression is also seen at the blood retinal barrier (BRB), but only after insult by factors that induce leakage, such as diabetic retinopathy [45].In a properly functioning BRB, PLVAP expression is low or non-existent [46].This is contrary to our results and further studies will be needed to associate the risk allele with differential PLVAP expression and function.
ANKLE1 (which encodes the protein LEM3) was also identified at this locus by our colocalization analysis.LEM domain proteins, such as LEM3 bind BAF, a chromatin protein.ANKLE1 has been shown in C. elegans to be important in the formation and stabilization of chromatin bridges in the end stages of cellular division [47].Expression appears somewhat ubiquitous with enrichment in brain and eye tissue [48], however mechanistic information on ANKLE1 and retina or eye is absent from the literature.While further perturbation studies will be required to assess causality, based on what is currently known, PLVAP is the most likely causal gene at this locus.
Notably, we observed no prior DME, DR or PDR loci at the GWAS significant level in this DME analysis (S9 Table) [8,10,22].This could be for multiple reasons.First, this study contains at least three times more cases than previous DME studies.Additionally, other studies have focused on DR patients as a whole.As DR is a heterogeneous condition, homogenizing based on phenotypes within DR can increase power to find an association.
A weakness of our study is the use of controls from non-ophthalmologic clinical trials.This has the potential to lead to associations that are confounded by diabetes status or other diabetic complications such as kidney disease.This is especially important at the APOL1 locus, for which multiple alleles in APOL1 have strong associations to risk of kidney disease in individuals of African descent.We have addressed this in several ways, including leveraging genetic architecture differences at the APOL1 locus in individuals of European and African descent.We queried highly powered GWAS for T1D and T2D, finding no hint of an association for either locus in patients of European ancestry.For the APOL1 locus we queried DKD, CKD and kidney function GWAS, again finding no association in European ancestry patients.Through PheWAS in the FinnGen population, we found strong associations for ocular complications due to diabetes, but not T1D, T2D or kidney disease.Finally, we conducted a sensitivity analysis, where we removed patients with signs of kidney malfunction at baseline finding no effect on the association at the APOL1 locus.Based on these efforts, these findings appear to be specific for DME risk in multiple populations and not related to the concomitant diagnosis of diabetes or nephropathy, however, as for any genetic association, there remains the possibility that the findings in this study are the consequence of confounding by other unmeasured factors.Historically, the standard design when assessing determinants of complications from diabetes has been to compare patients with complications from diabetes to patients with diabetes and no complications.Our findings question the need for such a constraint on study design, especially when considering there are multiple highly powered GWAS one can query to detect possible confounding due to diabetes risk itself or other co-morbid complications.
In summary, here we present results from the largest GWAS on DME to date which highlights APOL1, a gene canonically associated with kidney disease, and broadens its impact to a second diabetic complication.We find this gene to be associated with risk of DME in multiple populations, and note a substantial increased genetic risk of DME in non-Hispanic black patients with diabetes due to the increased allele frequency of rs2239785 in patients of African ancestry.Additionally, we uncovered an additional locus containing the genes ANKLE1 and PLVAP.Historically, replicable loci for diabetic retinopathy and sub-phenotypes have been difficult to uncover.The present study indicates that power to detect genetic effects influencing diabetic ocular phenotypes can be improved by categorizing DR cases by sub-phenotype prior to genetic analysis.
3 using samples from phase 3 of the 1000 Genomes Project as reference samples labeled by their superpopulation (S1 Fig) (AFR = African [from the African continent and African ancestry from the US and Caribbean]; AMR = AdMixed American [from North and South America excluding US of African Ancestry and Caribbean of African ancestry]; EAS = East Asian [from China, Japan and Vietnam]; EUR = European [from the United States with Northern and Western European ancestry, Italy, Finland, England and Scotland]).

Fig 1 .
Fig 1. Manhattan plot of the discovery study population containing 1,195 DME cases and 4,703 non-DME controls.Gene names in peaks containing SNPs with a P<5X10 -8 are labeled.The red line signifies the threshold for genome-wide significance (P<5x10 -8 ) and the blue line indicates the threshold for suggestive significance (P<1x10 -5 ).