Genetics of osteopontin in patients with chronic kidney disease: The German Chronic Kidney Disease study

Osteopontin (OPN), encoded by SPP1, is a phosphorylated glycoprotein predominantly synthesized in kidney tissue. Increased OPN mRNA and protein expression correlates with proteinuria, reduced creatinine clearance, and kidney fibrosis in animal models of kidney disease. But its genetic underpinnings are incompletely understood. We therefore conducted a genome-wide association study (GWAS) of OPN in a European chronic kidney disease (CKD) population. Using data from participants of the German Chronic Kidney Disease (GCKD) study (N = 4,897), a GWAS (minor allele frequency [MAF]≥1%) and aggregated variant testing (AVT, MAF<1%) of ELISA-quantified serum OPN, adjusted for age, sex, estimated glomerular filtration rate (eGFR), and urinary albumin-to-creatinine ratio (UACR) was conducted. In the project, GCKD participants had a mean age of 60 years (SD 12), median eGFR of 46 mL/min/1.73m2 (p25: 37, p75: 57) and median UACR of 50 mg/g (p25: 9, p75: 383). GWAS revealed 3 loci (p<5.0E-08), two of which replicated in the population-based Young Finns Study (YFS) cohort (p<1.67E-03): rs10011284, upstream of SPP1 encoding the OPN protein and related to OPN production, and rs4253311, mapping into KLKB1 encoding prekallikrein (PK), which is processed to kallikrein (KAL) implicated through the kinin-kallikrein system (KKS) in blood pressure control, inflammation, blood coagulation, cancer, and cardiovascular disease. The SPP1 gene was also identified by AVT (p = 2.5E-8), comprising 7 splice-site and missense variants. Among others, downstream analyses revealed colocalization of the OPN association signal at SPP1 with expression in pancreas tissue, and at KLKB1 with various plasma proteins in trans, and with phenotypes (bone disorder, deep venous thrombosis) in human tissue. In summary, this GWAS of OPN levels revealed two replicated associations. The KLKB1 locus connects the function of OPN with PK, suggestive of possible further post-translation processing of OPN. Further studies are needed to elucidate the complex role of OPN within human (patho)physiology.

Introduction Osteopontin (OPN) encoded by the SPP1 gene was first described as a glycoprotein belonging to the SIBLING (Small Integrin-Binding LIgand N-linked Glycoprotein) family in 1985 [1]. OPN is expressed in a multitude of tissues like osteoblasts, osteocytes, odontoblasts (playing a role in mineralization and bone resorption [2,3]) macrophages, smooth muscle cells, and endothelial cells, but can also be found in the inner ear, the central nervous system, and the placenta [1,2]. Although, OPN can be detected in many cell types it is predominantly synthesized and expressed in kidney tissue. OPN production is stimulated by many factors including parathyroid hormone, calcitriol, calcium, phosphate, and cytokines. The protein is able to bind integrins through a specific peptide sequence, the arginine-glycine-aspartic acid (RGD) motif, making interaction with various cell types possible (via the nuclear factor kappa B pathway, [4,5]). In the kidney, integrins can be found in the Bowman's capsule, glomerular epithelium, and vascular epithelium [6,7]. OPN is synthesized in the thick ascending limb of Henle's loop and in the distal tubule [1,8].
In a review by Kaleta (2019), known (patho)physiological roles of OPN have been discussed [1]. Based on this review, the physiological role of OPN in the kidney is not fully understood yet, but it has been suggested as being essential for tubulogenesis [1]. SPP1 mRNA as well as OPN protein expression were elevated in mostly rat models of kidney diseases and high OPN expression correlated with proteinuria, reduced kidney function, and fibrosis [1]. One study identified various polymorphisms in the SPP1 promoter region affecting its transcriptional activity [9]. In the past several specific SPP1 gene variants have been associated with the pathogenesis and progression of different kidney diseases. Other case-control studies reported on specific variants in the SPP1 gene being associated with different kidney disease patients in comparison to a (healthy) control group: For example, rs1126616 was repeatedly reported as a marker for lupus nephritis and immunoglobulin A nephropathy [10][11][12][13][14]. In connection with diabetic nephropathy, the two SNPs in SPP1, rs11730582 and rs17524488, have been reported [15,16]. We therefore reasoned that the presence of reduced kidney function may represent a good study setting to further establish our understanding of the genetic underpinnings of OPN levels in kidney disease, as some biologic mechanisms might be upregulated and thus be easier to detect, which has been shown before [17][18][19]. In Jing et al. [19], for example, the magnitude of effects for known loci identified in a GWAS of serum urate in CKD patients were of similar or higher magnitude than those reported from population-based studies. The German Chronic Kidney Disease (GCKD) study comprises a large cohort of CKD patients [20]. Besides demographic and clinical data, genetic data are available as well as baseline measurements of serum OPN, providing an ideal setting to explore the genetics of OPN. For this purpose, we performed a GWAS of serum OPN levels in the GCKD study. Table 1 gives an overview of baseline characteristics of a selected number of variables for the complete GCKD study cohort and the GWAS analysis set in which participants with complete data on genetics, OPN measurements as well as estimated glomerular filtration rate (eGFR) and urinary albumin-to-creatinine ratio (UACR) are included (S1 Fig). There were no major discrepancies between the complete cohort and the analysis set.

Genome-wide association study and fine-mapping
We conducted a GWAS for serum OPN levels (log 2 -transformed) using~7.7 million highquality autosomal bi-allelic variants of the GCKD study with a minor allele frequency (MAF) of �0.01 (S1 Table). The quantile-quantile plot comparing observed and expected p-values from the OPN GWAS did not indicate inflation (inflation factor λ = 1.01), consistent with the absence of systematic errors (S3 Fig).
Overall, the Manhattan plot revealed three genome-wide significant regions associated with OPN levels (p-value <5.0E-08; S4 Fig). Besides the three identified regions, conditional analysis did not reveal additional independent signals (S5 Fig). The respective association results for the three index SNPs (= SNP with the lowest p-value in the respective region) are presented in Table 2. For all three SNPs the coded allele was present frequently (allele frequency range 0.5-0.75). The respective coded allele in our cohort decreased OPN levels on average (S6 Fig) with effect estimates per copy of the coded allele ranging from -0.10 to -0.18 (SE: 0.01-0.02; Table 2). One of the index SNPs on chromosome 4 (rs10011284, 4:88833389) is located upstream of SPP1, which encodes the protein OPN itself (Fig 1A). The other index SNP on chromosome 4 (rs4253311, 4:187174683) maps into the KLKB1 gene (intronic variant), encoding the protein prekallikrein (PK) that is converted to kallikrein (KAL); a protease implicated in the surface-dependent activation of coagulation, bradykinin (BK) release, and potentially the renin angiotensin aldosterone system (Fig 2A). The index SNP on chromosome 5 (rs2731673, 5:176839898) maps closest to the F12 gene, which encodes coagulation factor XII, a serine protease that cleaves KLKB1-encoded PK to KAL, among other functions and is also related to blood coagulation, fibrinolysis, and the generation of BK (S7 Fig). A summary of Table 2. Association results for the 3 index SNPs genome-wide-significantly associated with serum osteopontin levels in the GWAS discovery of the GCKD study (N = 4,897) and in the replication cohort of the YFS (N = 1,979).

PLOS GENETICS
Osteopontin, genetics and chronic kidney disease annotations combined from different publicly accessible data bases and related to all three SNPs is provided in S2 Table. We next tested whether these three index SNPs were associated with serum OPN levels in the Young Finns Study (YFS) cohort, a population-based study with a mean age of 38 years (SD: 5.0) and a mean eGFR of 92.6 mL/min/1.73m 2 (SD: 20.7; S1 Methods). Both SNPs on chromosome 4, rs10011284 and rs4253311, were significantly associated with OPN levels showing also direction consistency ( Table 2 and Figs 1B and 2B). In contrast, rs2731673 closest to the F12 gene did not replicate in the YFS cohort (S7 Fig).
The two replicated SNPs on chromosome 4 explained 1% of OPN levels each within the GCKD study and did not show a non-additive effect on OPN levels ( S8 Fig). Moreover, statistical fine-mapping was performed for the two replicated loci to resolve associated loci into potentially causal variants by constructing credible sets that collectively accounted for 99% posterior probability of containing the variant or variants that cause the association signal (PPA; Material and methods, [21]). However, fine-mapping results are inconclusive as both constructed sets are large and single variants included only exhibit low PPA estimates (S3 Table).

Colocalization analyses
In order to learn more about the molecular mechanisms and associated phenotypes underlying the identified association signals for OPN, we compared patterns of OPN GWAS results in predefined regions to respective GWAS summary statistics from three other sources using human data (see Material and methods for details). Comparable patterns may indicate a common biological basis.
Gene expression. First, we performed colocalization analyses of OPN GWAS summary statistics related to the two replicated loci with the corresponding GWAS summary statistics of gene expression in cis using data from the GTEx project and the NEPTUNE study (Material and methods). Colocalization (posterior probability of H4 [p12] >0.8) of the OPN association signals were detected with the expression of MEPE in lung (Fig 3A) and of SPP1 in pancreas (Fig 3B and S4 Table). Furthermore, colocalization was found between the OPN association signal and expression of F11 in six other tissues (in descending order of H4): tibial artery (Fig  3C), brain cortex, terminal ileum part of the small intestine, muscularis of the esophagus, transverse colon, and aortic artery (S4 Table). Except for the colocalization of the OPN signal with expression of MEPE in lung, the effect direction of both traits (OPN and gene expression) was concordant (alpha12>0; S4 Table).
Plasma proteome. In addition, colocalization analyses were conducted using GWAS summary statistics of SNP associations in cis and in trans with levels of~3,000 different plasma proteins (pGWAS) reported by Sun et al. (Material and methods, [22]). While no colocalization was present for the summary statistics of the GWAS of OPN and proteins in cis, pGWAS results for 87 proteins from various protein classes were found to trans colocalize with OPN GWAS results for rs4253311 at KLKB1 (S5 Table). For the majority of colocalization results (60/87, 69%), the effect direction of OPN and proteins was concordant, which is in accordance with the fact that the activated KKS is involved in a broad spectrum of processes, like inflammation, cancer, cardiovascular disease, as well as (patho)physiological roles in kidney and the central nervous system. For 86 mapped proteins, a Gene Ontology (GO) enrichment analysis was conducted to assess whether their encoding genes were enriched in terms representing specific cellular components and molecular functions (Material and methods, S6 Table). Because of a hierarchical structure of reference lists, implicated terms are partially dependent on each other showing e.g. overlapping upregulated molecular functions as well as upregulated hormone and receptor activities (S9 Fig). UK Biobank (UKB) diseases. In order to address the interplay between genetic modulation of OPN levels and phenotypes we further conducted a colocalization analysis with binary UKB disease traits that showed a genome-wide significant association in the GeneAtlas resource [23]. Based on marginal association statistics, no positive colocalization was detected between OPN and various disease traits (S7 Table). Still, for four of seven traits with an association signal different from the OPN signal in the region (H3: p1.2>0.8), a second independent association signal for the respective disease trait in UKB was detected. We then performed additional colocalization analyses based on the obtained conditional association statistics, and identified a positive colocalization between the OPN association signal at rs4253311 at the KLKB1 locus and rs1593 for deep venous thrombosis (DVT; H4: p12 = 0.96; S7 Table and S10 Fig). In the GeneAtlas GWAS of DVT, the major allele of rs1593 (A, allele frequency = 0.87) was associated with a higher risk for DVT (OR = 1.2, p-value 2.4e-16), as was the major allele of rs4253311 (major allele: G, allele frequency = 0.51, OR = 1.13, p-value = 2.6e-16).

Rare variant analysis
Based on 4,879 GCKD participants with available exome chip data, we additionally conducted aggregated rare variant testing (S1 Fig). Variants with a MAF <1% and having a major effect on the gene product (nonsynonymous, stop gain/loss, splicing; "qualifying variants") as annotated by dbNSFP v.2.0 were aggregated (Material and methods, [24,25]).
While there was no significant association result when the burden test was used, we found a significant association using the sequence kernel association test (SKAT) for the SPP1 gene ( Table 3, p-value = 2.5E-08), which remained significant after adjustment for the two replicated SNPs from the initial GWAS (p-value = 9.4E-08). Seven variants were aggregated for the analysis of the SPP1 gene. In the single variant analysis, effect estimates of the seven variants ranged from -1.20 to 0.24 with rs139555315 (4,88901197), a splice-site variant (CADD score: 23.7) showing the most significant association (effect estimate = -1.20, SE = 0.19, pvalue = 2.5E-10) and thus likely driving the association. This is supported by the non-significant result (p-value SKAT = 3.5E-01) when rs139555315 was excluded from the variant set.

Discussion
In this study, we focused on characterizing the genetics of OPN within a CKD cohort, because OPN levels are known to be associated with adverse kidney outcomes, but genetic underpinnings of this kidney-enhanced protein are not fully understood. The use of a CKD patient cohort might present an advantageous setting in which the transcription of kidney-specific genes may be altered in comparison to the general population, making identification of specific SNPs possibly easier. We identified three loci, two on chromosome 4 (rs10011284, rs4253311), that could be replicated in an external population-based cohort, and rs2731673 on chromosome 5, which could not be replicated. When aggregating rare variants, the SPP1 gene encoding OPN was detected.
To our knowledge this is the first GWAS of serum OPN levels quantified via ELISA. Other studies like Sun et al. conducted GWAS of plasma proteins (pGWAS) including OPN. Here proteins were quantified differently via an aptamer-based technology (SOMAscan, [22]). While this study was likely too small (N = 3,301) to detect any genetic signals for OPN, a study by Pietzner et al. (N up to 10,708) provided signals on chromosome 3, 4, and 10 associated with SOMAscan-measured OPN [26]. The detected signal on chromosome 4 (rs5860110, 4:88897106, MAF = 0.30) is a common, intronic indel located within SPP1 not in linkage disequilibrium (LD) with common variants identified in our project. In a next step, Pietzner et al. reported association statistics for the SOMAscan detected loci using a different technique to measure proteins (Olink, antibody-based protein panels). Olink measurements were, however, only available for a small fraction of the study population and none of the three SOMAscanmeasured OPN signals could be confirmed. The systematic comparison of protein levels quantified by these two techniques revealed varying correlations (median 0.38, IQR: 0.08-0.64). For OPN, a correlation coefficient of 0.51 was reported. A similar comparison of even more proteomics platforms also reported on a wide range of correlations among measurements [27]. Differences in the detection of genetic signals could thus not only be explained by differences in power but by technical, protein and variant characteristics. Across platforms, any comparison results is thus difficult and meta-analyses could lead to wrong inferences. In turn this may either lead to formation of a premature stop codon and a shortened protein or more likely to a faster mRNA degradation called nonsense mediated decay [28]. The variant driving this association at SPP1, a splice-site variant (rs139555315, 4:88901197, MAF = 1.54E-03), was reported to be associated with pediatric systemic lupus erythematosus [29]. SPP1 is made up of 7 exons containing 942 transcribed nucleotides from the start codon in exon 2 to the stop codon (within exon 7, [30]). OPN belongs to the SIBLING glycoprotein family of secreted phosphoproteins; other members of this family comprise dentin matrix protein 1, dentin-sialophosphoprotein, statherin, bone sialoprotein, and matrix extracellular phosphoglycoprotein (MEPE). Results from our colocalization analysis using GeneAtlas are pointing towards a connection of SPP1 with bone disorders. Fitting with these results, one of OPN's main physiological functions in the body is the regulation of biomineralization processes [31]. Conflicting results have been reported on OPN as well as SPP1 polymorphisms and susceptibility to nephrolithiasis in the past [32][33][34][35][36]. From in vitro studies it may be inferred, that OPN inhibits nucleation, growth, and aggregation of calcium oxalate crystals [37], but clinical studies draw a more unclear picture. Some researchers report on a protective role of OPN, where others do not [38,39]. Nonetheless, a recent study from South Asia found a significant association of the SPP1 rs2853744:G>T polymorphism with urolithiasis [40].
OPN is mostly secreted, but an intracellular form has also been reported [41]. Using reverse-transcription-PCR OPN was found to be expressed in normal human adult kidney, further immunohistochemical analyses and in situ hybridization revealed OPN expression to be restricted to the distal convoluted and straight tubules in kidney cortex and medulla in monkey kidney [42]. Looking at GTEx tissue expression data, a positive colocalization of the OPN association signal at SPP1 with pancreas tissue was detected. This is in line with findings in the literature where OPN has been suggested to have a role in type 2 diabetes. One study performed by Cai et al. investigated a diabetic mouse model SUR1-E1506K+/+ and islets from human donors and was able to demonstrate that in islets from human cadaver donors, OPN gene expression was elevated in diabetic islets, and externally added OPN significantly increased glucose-stimulated insulin secretion from diabetic but not normal glycemic donors [43]. Many other studies have also investigated OPN's role in pancreatic cancer, here, OPN was found to be a prognostic marker associating higher levels with poor overall prognosis in patients [44].
MEPE (Matric Extracellular PhosphoglycoprotEin) is the gene encoding the secreted calcium-binding phosphoprotein MEPE. A common feature of SIBLING proteins is the Acidic Serine Aspartate Rich MEPE associated motif (ASARM), involved in the regulation of mineralization, bone turnover, mechanotransduction, phosphate and energy metabolism [45]. The ASARM motif is also the connecting link between SIBLINGs and FGF23 thereby being part of the physiological bone-kidney link [45]. MEPE is involved in the regulation of the phosphate homeostasis controlled by the kidney and intestine [46,47]. Colocalization of the OPN association signal at MEPE leads to detection of an association with lung tissue. So far a connection between several members of the SIBLING family with lung cancer have been reported, but a definite connection between MEPE and lung has not been made before [48]. Since there are multiple transcript variants known due to alternative splicing a connection between MEPE and lung cannot be ruled out and could offer possible research areas in the future. Results from our colocalization analysis using GeneAtlas are pointing towards a connection of MEPE with bone disorder. Diseases associated with MEPE are osteomalacia and autosomal-dominant hypophosphatemic rickets supporting this connection between MEPE and bone disorders [49]. Another phenotype seemingly related to rs10011284 is gout. This association is most likely driven by ABCG2, which is located in close proximity to SPP1 and MEPE, and is one of the best described uric acid transporter genes to date [19,50].

rs4253311; 4:187174683 (KLKB1 intronic)
The 2 nd replicated index SNP rs4253311 (MAF = 0.50) on chromosome 4 is an intronic variant of the KLKB1 gene. Again, whether this variant is the causal variant responsible for the observed association cannot be answered by this study.
Pathways related to this gene include complement and coagulation cascades, as well as degradation of the extracellular matrix [61,62]. An important paralog of KLKB1 is the gene F11. Since our analysis revealed colocalizations between OPN association signal for rs4253311 and expression in multiple tissues for F11, it could be presumed that rs4253311 is linked to F11 rather than KLKB1. Interestingly, OPN contains several protease cleavage sites that regulate its activity [31]. Some OPN interaction sites require cleavage by thrombin, another serine protease, to become fully functional. In return, OPN has been shown to be a substrate for other proteases, that regulate its activity [31]. One might speculate that inflammatory processes within the kidney of CKD patients bring together, on the one hand, an activated KKS and, on the other hand, higher OPN levels, thus it might be plausible that new bioactive OPN fragments could possibly be generated by KAL.
Conditional colocalization analyses with SNPs located around KLKB1 resulted in positive results for rs1593, mapping intronically into the F11 gene, and DVT. DVT is a serious disease influenced by both genetic and environmental risk factors, but 60% of the variation in risk for DVT has been attributed to genetic risk factors in the past [63]. Genetic studies of DVT have reported several common SNPs in the 4q35.2 locus to be associated with DVT [63]. These common SNPs were localized within KLKB1 and F11 amongst others [63].
Other colocalization results showed association signals between the OPN locus at KLKB1 and 87 plasma proteins. These proteins showed enrichment for proteins of the neuronal cell body in plasma and cerebrospinal fluid of patients. BK, the principal effector of the plasma KKS, is generated systemically and locally (vessel wall) and acts in a paracrine or autocrine way influencing vascular tone and ultrastructure via two G protein-coupled receptors [64,65]. Components of the KKS and in particular BK have been shown to have important functions in the central nervous system by regulating cerebrovascular resistance, vessel capacity and permeability of the blood-brain-barrier. Maintenance of a vascular permeability equilibrium in the central nervous system is critical for maintaining brain integrity. Associations of the KKS in CNS pathology include several disease states among which are neuropsychiatric lupus, Alzheimer's disease, schizophrenia, and epileptic syndrome [66]. These facts in turn explain the enrichment analysis results of the molecular functions: hormone activity via signaling receptor binding as well as receptor regulatory activity.

rs2731673; 5:176839898 (F12/GRK6 intergenic)
Finally, the index SNP rs2731673 (MAF = 0.25) that could not be replicated in the populationbased YFS cohort is located between the genes F12 and GRK6. While this non-confirmation may indicate a false positive result of our GWAS, replication may have failed for some unknown reason such as being a specific result relevant for CKD patients. In the absence of another cohort (whether population-based or CKD cohort) with necessary data on OPN and genetics, we were unable to validate this any further.
The gene nearest to the locus is the F12 gene that encodes coagulation factor XII, which, together with plasma PK, belongs to the contact activation system [67,68]. While KAL can activate factor XII (factor XIIa: active enzyme of factor XII) that, in turn promotes inflammation via the KKS, including PK [69]. Since CKD patients markedly have more inflammation and fibrosis as the joined common final path of kidney disease progression these insights and connection may encourage further validation of this locus. Inflammatory processes also play a major role in CVD and CKD patients, who are well known to suffer from excessive CVD promoting higher morbidity and mortality. In the human cardiovascular system, OPN is primarily expressed in endothelial cells, macrophages, and smooth muscle derived foam cells and can also be detected in human atherosclerotic plaques of the arterial system [70,71]. Higher serum OPN levels were found in patients with acute coronary syndrome vs chronic coronary syndrome. In coronary artery disease (CAD) patients high OPN levels were associated with rapid coronary plaque progression and in-stent restenosis [72]. OPN has been known to be associated with adverse outcomes in patients with CVD [73][74][75], but its function in CVD is diverse. Acute increases of OPN in CVD are associated with wound healing and neovascularization [76,77]. Chronically increased OPN, however, is associated with a poor prognosis of major adverse cardiovascular events [78].
GRK6 on the other hand is involved in blood pressure regulation and was found to have a decreased kidney expression in spontaneously hypertensive rats, providing a similar rationale [79].

Strengths and limitations
The presented analyses have several strengths and limitations: Firstly, our analyses are based on a CKD patient population of European ancestry and mostly CKD stage 3 under regular nephrologist care. While biological mechanisms may be upregulated in impaired kidney function and thus detected more easily in CKD patients, results potentially compromise generalizability to the general population as well as to other ethnicities. Secondly, we could replicate two of three identified loci in a population-based cohort of the YFS, who also applied an ELISA technique to measure OPN, thus confirming the potential to transfer findings from a CKD cohort to the general population. Regarding the third, non-replicated finding, one should await further validation of this result as regarding it as false-positive would be a premature conclusion. Thirdly, serum OPN was measured in the GCKD study from baseline samples using a state-of-the-art ELISA assay. Although serum has been validated for use in the used OPN assay, it is not the recommended sample type, because of proteolytic cleavage by thrombin during the clotting process. In contrast, OPN measurements were obtained from plasma samples in the YFS cohort. While a comparable assay was used, differences between levels in CKD patients and YFS participants might thus be explainable by more than disease status.

Conclusions
In this first GWAS of serum OPN levels in a large CKD cohort two replicated associations on chromosome 4 were detected. One locus closest to the SPP1 gene, as well as a locus mapping into the KLKB1 gene, connecting OPN to its production and the KKS. Further studies are needed to fully explain OPN's role in kidney (patho)physiology and elucidate functions of OPN in connection with the KKS and possibly inflammatory processes during kidney fibrosis.

Ethics statement
The German Chronic Kidney Disease (GCKD) study was approved by all Ethics Committees of participating institutions in Germany that also covers the present project. It was registered in the national registry for clinical studies (DRKS 00003971; S1 Information). Written informed consent was obtained for all participants.

Study population
The GCKD study cohort consists of 5,217 adult CKD patients of European ancestry with (i) an eGFR between 30-60 mL/min per 1.73m 2 or (ii) an eGFR >60 mL/min/1.73 m 2 and 'overt' albuminuria/proteinuria at baseline [20]. At the baseline visit (2010-2012), trained personnel obtained data using a standardized questionnaire and physical examinations. Biosamples were obtained, directly processed and then stored at -80˚C in a central biobank [80]. Study procedures and main baseline findings have been reported before [20,81].

Baseline variables and measurements
A standardized set of biomarkers was measured in a central certified laboratory using standardized protocols [81]. Among others, creatinine and albumin from serum and urine were quantified using an IDMS traceable methodology (Creatinine plus, Roche, Germany) and a turbidimetric method (Tina-quant, Roche, Germany) Roche/Hitachi MODULAR P, respectively.
In 2015, OPN was measured from baseline serum samples of the complete GCKD study cohort using a quantitative sandwich enzyme immunoassay technique (solid-phase ELISA; Quantikine Human OPN Immunoassay DOST00 from R&D Systems (R&D Systems Europe, Abingdon, UK)). Quantification was carried out at the Institute of Clinical Chemistry and Laboratory Medicine, Greifswald, Germany. Coefficients of variation (intra-assay) were 4.5%, 5.3% and 3.5% for low, median and high levels, respectively. The inter-assay coefficient of variation was 6.4%. Reagents and secondary standards were used as recommended by the manufacturer.

Genotyping, quality control and imputation
Detailed information on genotyping and data cleaning in the GCKD study has been described previously [18]. Briefly, DNA was isolated from whole blood and genotyped at 2,612,357 variants for 5,123 GCKD participants using the Illumina HumanOmni2.5 Exome BeadChip array (Illumina, GenomeStudio, Genotyping Module Version 1.9.4) at the Helmholtz Center Munich. Data cleaning was carried out separately for the Omni2.5 content and the exome chip content of the array.
Based on standardized protocols [84], custom written scripts (R, Perl) and Plink1.9 [85] software was used for quality control (QC) of the Omni2.5 content. Sample-based QC steps included checks of call rate, sex, heterozygosity, genetic ancestry and relatedness, leading to the exclusion of 89 samples. On the variant level, single nucleotide polymorphisms (SNPs) were excluded if the call rate was <0.96, and whenever the assumption of the Hardy-Weinberg equilibrium was violated (p-value <1.0E-05). After removing SNPs on duplicate positions, the cleaned dataset contained 5,034 individuals and 2,337,794 SNPs (S1 Fig). Genotypes were then imputed using minimac3 v2.0.1 at the Michigan Imputation Server [86]. The Haplotype Reference Consortium (HRC) haplotypes version r1.1 were used as the reference panel, and Eagle 2.3 was used for phasing. The final dataset contains data of 5,034 participants and 7,750,367 high-quality autosomal bi-allelic variants (imputation quality of R 2 �0.3, MAF �1%).
For the exome chip content, QC was similarly conducted [18]. In addition, checks specific for exome variants were added [87]. In brief, 96 individuals and 3,818 SNPs were removed, the latter of which had a call rate <0.95 and a Hardy-Weinberg equilibrium p-value <1.0E−05. The final exome chip dataset contains 5,027 participants with 226,233 variants (S1 Fig). For the exome chip association analysis, the genotypes were post-processed using zCall with a zscore threshold of six [88]. Genomic positions base on human genome build GRCh37.

Genome-wide association study of common variants
As previously reported [17,18], GWAS was conducted for GCKD participants with complete genotyping (Omni2.5), eGFR, UACR and log 2 (OPN) measurement (N = 4,897) data using linear regression of log 2 (OPN) on SNPs (additive genetic model) with a MAF �1%, adjusted for age, sex, log(eGFR), and log(UACR) (S1 Fig). Association analysis was performed using SNPTEST v2.5 [89]. Summary statistics were checked for quality using GWAtoolbox [90] and for inflation using genomic control [91]. A genomic control correction, however, was not requested (λ = 1.01). Associations with a p-value <5.0E-08 were considered significant. Per chromosome, an index SNP was defined as the SNP with the lowest genome-wide p-value with a 1-Mb interval centered around this SNP. This approach was repeated until no further SNP outside the interval(s) was available passing the genome-wide significance threshold. In order to discover further independent signals, we repeated GWAS analysis for chromosomes with significant results by conditioning on the genotype of the SNP with the lowest association pvalue of the respective chromosome. This procedure was repeated until no further genomewide signal was observed.

Fine-Mapping
Statistical fine-mapping [21] was carried out as previously described [17] for the two replicated SNPs within a region ±500kb. Approximate Bayes factors (ABFs) were then derived from the original GWAS statistics estimates. The SD prior was chosen as 0.61 because 95% of the effect size estimates fell within the −1.2 to 1.2 interval [21]. The ABF of the SNPs were used to calculate the posterior probability for each variant driving the association signal (PPA, 'causal variant'). Credible sets were determined by summing up PPA-ranked variants until the cumulative PPA was >99%.
In order to further understand the molecular mechanisms and associated phenotypes underlying the associations, we performed colocalization analyses of the OPN GWAS summary statistics related to the two replicated OPN loci with GWAS summary statistics from three other sources as outlined below. For all colocalization analyses, we used the 'coloc.fast' function from the R package gtx with default parameters and prior definitions (https://github.com/ tobyjohnson/gtx), an implementation of an adapted version of the colocalization method introduced by Giambartolomei et al. [98]. We consider a positive colocalization when the posterior probability of a shared causal variant at the association locus for both traits (H4, p12) was > 0.8.
The analysis steps of colocalization have been described in detail elsewhere [17]. Firstly, GWAS summaries of GTEx and NephQTL in genomic regions ±100kb of the two OPN SNPs were extracted. The genes in the extracted GWAS are identified and for each gene, a cis window of 500kb flanking the start and end of the gene are defined. Then, for every such cis gene window, with at least one SNP having an association p-value < 0.001, the GWAS summaries of GTEx and NephQTL tissue as well as the OPN GWAS were extracted and used as input for colocalization analysis.
Plasma proteome. In addition, we used GWAS summary statistics of plasma proteins (pGWAS) by Sun et al. [22] to run colocalization analysis to identify consistent association signals between OPN and proteins with effects in cis as well as in trans. In contrast to data from GTEx and NephQTL, the genome-wide available pGWAS summary does allow the assessment of both effects.
In order to detect colocalization with cis-pQTLs, pGWAS summary statistics of any protein-gene region (gene region ±500kb, cis region) were extracted. Per OPN locus and a 100kb region around it, we checked if any of the cis pGWAS extracts overlapped and had a pGWAS association p-value of <0.05/2 (Bonferroni correction for two OPN loci). For all hereby selected proteins, we then extracted the protein-gene-region from the OPN GWAS summary statistics and ran colocalization within the protein-gene-region.
For potential colocalization with trans-pQTLs, we selected all proteins with pGWAS association p-values <0.05/2/3,000 (Bonferroni correction for the two OPN loci and number of proteins evaluated in pGWAS) within a 100kb region around an OPN-associated index SNP. For all hereby selected proteins, colocalization analyses were conducted within the ±500kb region of the OPN-associated index SNP.
For colocalizing proteins, a Gene ontology (GO) enrichment analysis (http://geneontology. org/, [101,102]) in form of a PANTHER [103] overrepresentation test with the two annotation data sets of GO cellular component and GO molecular function (homo sapiens) as references was conducted to assess enriched categories to which identified proteins were assigned to. Overall, 20,595 human genes are mapped to various terms related to cellular component and molecular function. A category is considered enriched if both, the Bonferroni-corrected pvalue of the Fisher's exact test and the false discovery rate based on the Benjamini-Hochberg procedure, are <0.05.
UKB diseases. Finally, we used the GWAS from GeneAtlas database (http://geneatlas. roslin.ed.ac.uk/) to perform colocalization analysis for the two replicated OPN loci (±500kb) and all UKB binary disease traits that showed genome-wide significant associations (p-value <5.0E-08) in at least one of the two replicated OPN loci. Overall, GeneAtlas comprises GWAS results of 660 binary disease traits of~450,000 UKB participants [23].
In addition, we adopted the conditional colocalization analysis approach which was firstly applied in a GWAS of plasma proteome [104]. Performing colocalization on conditionally independent association statistics could reveal true colocalization signals that were missing when using marginal association statistics in the presence of multiple independent association signals. We applied GCTA COJO Slct algorithm to identify independent association signals in the OPN region for the seven traits [105], which showed a trait association signal different from the OPN signal (H3: p1.2>0.8). The cleaned and imputed GCKD genotype dataset mentioned before was used as LD reference by GCTA. We set the collinearity cutoff at 0.1 to be conservative. For loci with more than 1 independent signal, an approximate conditional analysis was conducted by GCTA COJO-Cond algorithm to generate conditional association statistics conditioned on the other independent SNPs in the region [105]. Finally the colocalization analyses were performed as before for each of the independent SNPs using the conditional association statistics as input.

Aggregated rare variant testing
Overall, 4,879 GCKD participants with complete data on genotyping (Exome chip), eGFR, UACR and log 2 (OPN) measurements were included in the analysis of aggregated rare variant testing (S1 Fig). As previously described [106], two types of rare variant aggregation tests (burden test, sequence kernel association test [SKAT]) implemented in the R package seqMeta (v1.6.7, [107]) were conducted using exome chip data and log 2 (OPN) measurements (outcome). Per gene, variants with MAF <1% and having a major effect on the gene product (nonsynonymous, stop gain/loss, splicing; "qualifying variants") as annotated by dbNSFP v.2.0 were aggregated [24,25]. Results were filtered to retain genes with cumulative minor allele count (MAC) �10 and with �2 contributing variants per gene. Analyses were adjusted for age, sex, log(eGFR), and log(UACR). To adjust for multiple testing, the statistical significance level was corrected for the number of assessed genes (N = 17,575) and the two conducted tests: 0.05/(2×17,575) = 1.4E-06. Moreover, analyses were repeated for significantly associated genes additionally adjusted for the two replicated OPN loci.

Replication of identified loci in Young Finns Study
The three OPN loci identified in the GWAS of GCKD participants were tested for replication in the Cardiovascular Risk in Young Finns Study (YFS) cohort. Here, plasma OPN was measured by enzyme-linked immunosorbent assay (Human Osteopontin Quantikine kit, R&D Systems, USA) from samples thawed for the first time for the assay in 2007. Samples of 2,442 participants and 546,677 genotyped SNPs were available for further analysis after QC and imputation. Further details can be found in S1 Methods.
Per selected locus, association analysis of log 2 (OPN) on SNP dosage (additive) was performed by fitting linear regression models adjusted for age, sex, and eGFR by using SNPTEST v2.5.4 [89]. GFR was estimated with the MDRD study equation and log 2 -transformed prior to analysis [108]. Replication was defined by a one-sided association p-value <0.05/3 (Bonferroni correction for three OPN loci).