Bayesian mixed model analysis uncovered 21 risk loci for chronic kidney disease in boxer dogs

Chronic kidney disease (CKD) affects 10% of the human population, with only a small fraction genetically defined. CKD is also common in dogs and has been diagnosed in nearly all breeds, but its genetic basis remains unclear. Here, we performed a Bayesian mixed model genome-wide association analysis for canine CKD in a boxer population of 117 canine cases and 137 controls, and identified 21 genetic regions associated with the disease. At the top markers from each CKD region, the cases carried an average of 20.2 risk alleles, significantly higher than controls (15.6 risk alleles). An ANOVA test showed that the 21 CKD regions together explained 57% of CKD phenotypic variation in the population. Based on whole genome sequencing data of 20 boxers, we identified 5,206 variants in LD with the top 50 BayesR markers. Following comparative analysis with human regulatory data, 17 putative regulatory variants were identified and tested with electrophoretic mobility shift assays. In total four variants, three intronic variants from the MAGI2 and GALNT18 genes, and one variant in an intergenic region on chr28, showed alternative binding ability for the risk and protective alleles in kidney cell lines. Many genes from the 21 CKD regions, RELN, MAGI2, FGFR2 and others, have been implicated in human kidney development or disease. The results from this study provide new information that may enlighten the etiology of CKD in both dogs and humans.


Introduction
Chronic kidney disease (CKD) in humans is comprised of heterogeneous disease pathways that result in structural damage or decreased function presented as reduced glomerular filtration rate (GFR) or other markers of kidney disease for a period of more than three months [1].
In humans, CKD consists of clinically distinct disorders, also including congenital anomalies of both the kidney and urinary tract (CAKUT) and kidney disease secondary to hypertension and diabetes. This group of diseases represents a heavy and increasing disease burden in humans with prevalence estimates >10% and with substantial variation between human populations [2,3].
CKD also commonly occurs in dogs and cats [4,5]. The incidence of CKD in dogs is most likely between 0.5-1.5% and estimates from clinical data indicate that 10% of dogs over the age of 15 years are diagnosed with CKD [4]. Epidemiological analysis based on insurance data has shown an average incidence of kidney disease in general (without distinction between acute and chronic forms) of around 1.6% and significant differences in risk between Swedish dog breeds, with the highest incidence in the Bernese mountain dog, miniature schnauzer and boxer [5].
Although CKD may be initiated by environmental factors, a clear and significant genetic contribution has been shown in humans, strongly supported by heritability estimates of both disease and markers of kidney dysfunction, familial clustering, as well as linkage and genomewide association studies (GWAS) [6][7][8][9]. It has become obvious that hundreds of genes contribute to complex forms of kidney disease as well as to the variation in markers associated with healthy kidneys [9,10]. Estimated glomerular filtration rate (eGFR), which is used as a marker for CKD in humans, was found to have high heritability (~29%) in a large biobank-study and more than 300 loci associated with eGFR-values were identified [11]. To date, more than 600 genes and loci have been implicated in monogenic and complex kidney diseases [12]. Causal variants associated with CKD may include rare monogenic or private variants, or common alleles with smaller effects. Additional variants are associated with kidney function markers, like eGFR and serum creatinine level [13].
A number of reports describe inherited renal disease in different dog breeds: Autosomal recessive hereditary nephropathy is noted in shih tzu [14], cocker spaniel [15], English springer spaniel [16], and Bernese mountain dog [17]; an autosomal dominant hereditary nephropathy in the bull terrier dogs [18]; and an X-linked hereditary nephropathy in Navasota dogs [19]. For decades, there have been concerns about a relatively high incidence of CKD in boxers. Based on pedigree studies breeders have suspected that CKD in young dogs commonly referred to as juvenile kidney disease (JKD), might have a genetic component. In a retrospective juvenile nephropathy study of 37 boxers less than five years old, Chandler [20] reported morphological findings of interstitial fibrosis, cell infiltration, dilated tubules and sclerotic glomeruli, Hoppe and Karlstam [21] described fetal, immature glomeruli and foci with small dysplastic structures surrounded by immature mesenchymal tissue in three CKD boxer puppies. Kolbjørnsen et al [22] studied morphological characteristics in seven related boxer dogs, three males and four females, between two months and five years. All cases had bilaterally small kidneys with segmental scarring and pronounced interstitial fibrosis. Across different studies of boxers, several dogs share these morphologic features of immature glomeruli, atypical tubules, proliferative arterioles and adenomatoid change [21,23,24]. Such features were less prominent in the study by Chandler [20], who also found a high frequency of incontinence and urinary tract infections in the studied boxers. Kolbjørnsen et al [22] classified the morphological features as reflux nephropathy or segmental hypoplasia. One case of juvenile nephropathy/ nephronophthisis in boxers has also been reported [25].
The different studies show variation in morphology and time at onset, indicating that there may be a significant phenotypic and genetic heterogeneity also within the boxer breed. In this study, we aimed to uncover the genetic loci that contribute to canine CKD in a boxer population and explore candidate genes and functional variants within the associated regions. We hope this study will further improve our understanding of the genetic mechanism of CKD in dogs, which may in turn allow us to establish a canine model to facilitate the relevant studies in humans.

CKD sample collection and genotyping
In this study, a total of 362 boxer samples were collected from Australia, Denmark, Finland, Germany, Norway, Sweden, UK and US (S1 Table). Only cases below six years of age and controls above eight years were included in this this study. The diagnosis of CKD in these dogs was supported by clinical characteristics, clinical pathology data, or a combination of them (see Materials and Methods). All individuals were genotyped with the Illumina Canine Bead-Chip. After removing samples for high-inbreeding, missingness, relatedness, and low supported phenotype, a final set of 117 cases and 137 controls with 101,664 markers was used for the association analysis, with UU_Cfam_GSD_1.0 (CanFam4) as the reference assembly.

Candidate regions for CKD
We estimated the effect size of markers for CKD using the BayesR algorithm. BayesR models the effect sizes of all variants simultaneously, provides unbiased estimates of the variants with larger effect sizes, with fewer false negatives and higher rate of true versus false positives, compared to traditional single-SNP GWAS with a linear mixed model [26]. The top 50 markers from Bayesian analysis were selected as disease associated candidates, presenting with an absolute effect size > 0.00047 (8.2 SDs from the mean; Fig 1A and S2 Table). We investigated these top 50 BayesR markers in 75 different dog breeds from a previous study [27], and the frequencies of the risk allele varied greatly among these breeds (S3 Table). Based on whole genome sequencing (WGS) data from 20 Norwegian boxers (S4 Table), we imputed genotypes on autosomes for all samples in the Bayesian set. A total of 5,206 imputed variants (3,924 SNPs + 1,282 Indels) were discovered in the strong linkage disequilibrium (LD, r 2 > 0.9) with the Bayesian candidate markers and defined the CKD candidate regions.
Twenty-one CKD associated regions (Table 1 and S1 Fig) were detected across 15 autosomes with a total size of 15.9 Mb (mean region size of 756 Kb). At each CKD region, marker with the highest effect were selected, and their allele frequencies are illustrated in Fig 1B. The allele load for these 21 markers showed that the cases carried an average of 20.2 risk alleles, significantly higher than the 15.6 in the control group (Fig 1C, p< 2.2x10 -16 ; t-test). With a cutoff of 17 risk alleles, the odds ratio of CKD status was 17.3 (95% CI: 8.5-35.2) between high and low load groups. This risk allele load was counted in the 75 other dog breeds mentioned above (S5 Table). In this boxer population, an ANOVA test indicated that these 21 top markers together explained 57% of phenotypic variation (S6 Table). A total of 146 protein coding genes were identified within or nearby (the closest gene on each side) the CKD regions (Table 1). Gene ontology analysis didn't reveal enriched terms relevant to kidney development or disease. Interestingly, selection signals from dog domestication were found inside the two top CKD regions on chr18 and 28, and within a short distance from the CKD regions (< 300 Kb) on chr2, 11, and 14 (S7 Table).

Functional annotation of variants in CKD regions
Using the UU_Cfam_GSD_1.0 annotation from NCBI, we investigated the genomic location of the 5,206 imputed variants. Most of them were present in intronic or intergenic regions (S8 Table). Fifteen variants occurred in coding regions, and four of these were nonsynonymous  (3) were added for the markers from the same chromosome. (C) Distribution of risk allele load for the cases (red) and controls (blue). On average, the cases carried 20.2 risk alleles, which is 4.6 higher than the controls (grey dash lines indicated the average load for cases and controls).  Table). PROVEAN prediction on the effect suggested only the c.1784G>C (p. Arg595Thr, XM_038560916.1) in ITSN2 as a deleterious mutation (PROVEAN score = -3.11) [28]. According to a canine variant dataset (Dog10K, 1929 individuals), this alternative allele G is rare in purebred dogs (F = 0.002), but common in the tested boxer population, for both cases (F = 0.51) and controls (F = 0.71). Considering its high frequency, we assumed the actual mutation effect could be relatively mild, otherwise it would be rapidly removed through breeding.
The large number of non-coding variants triggered our interest to investigate their potential regulatory function. We analyzed the phyloP constraint score for 3,924 imputed SNPs. 233 SNPs were found with intermediate score (phyloP > 1), of which 38 SNPs were highly constrained (phyloP >2.54, FDR < 5% in dog). The imputed SNPs were lifted to the human genome (hg38) to intersect with regulatory elements from three databases: Candidate cis-Regulatory Elements (cCREs, ENCODE) [29], promoter and enhancer (GeneHancer)[30], and With electrophoretic mobility shift assay (EMSA), we tested the regulatory function of these 17 putative regulatory SNPs (C1-C17; S10 Table and S2 Fig). Four SNPs, two from the introns of MAGI2 (C8 and C9), one from the intron of Polypeptide N-Acetylgalactosaminyltransferase 18 (GALNT18, C13), and one (C14) in the intergenic region on chr28, showed allele-specific binding in HEK293 or/and MDCK cell lines, implying the allele substitution at these positions could lead to change of protein-nucleic acid interaction.

CKD region on chr18
The strongest Bayesian signal was found on chr18. A~5.2 Mb CKD region consisted of two LD blocks and contained 18 Bayesian candidate markers (Fig 2A). The marker with the highest effect (BICF2S23128680; chr18: 16,900,760) was located in LD block 1, in the intron of the Reelin (RELN) gene ( Fig 2B), and the risk allele frequency was 0.62 in cases and 0.38 in controls. RELN encodes a large extracellular glycoprotein that is required for specific biological activities at different times during embryonic development [31]. In the kidney, Reelin is highly expressed in proximal convoluted tubules and distal convoluted tubules in the early fetal stages of development, suggesting its participation in nephrogenesis [32]. Among the 17 putative regulatory SNPs mentioned above, eight of these reside in this CKD region (C3-C10; Fig 2C). The SNP C4 (chr18: 16,949,830) was identified from the intron of RELN, and was constrained among mammals (phyloP = 2.91) with a putative regulatory function (HS signals and cCRE enhancer; Fig 2C). However, EMSA analysis did not detect any protein-nucleic acid interaction on this site. Interestingly, metalloproteinase with thrombospondin motifs-3 (ADAMTS3), which encodes a protease that directly cleaves and inactivates reelin [33], was identified from the other CKD region on chr13. This implies that these two genes could contribute to CKD in the same pathway. Notably, the LD block 1 contained a selection signal (chr18:15.7-16.1 Mb), which was identified as the strongest signal in a demographically-based domestication study [34] and repeatedly reported in a parallel evolution analysis between dogs and humans [35]. From the other LD block 2, MAGI2 is important for kidney barrier function [36]. Loss of MAGI2 in podocytes can disrupt the slit diaphragm and morphologic abnormalities of foot processes in kidney [37]. In humans, MAGI2 is involved in the regulation of cytoskeletal rearrangement in podocytes, with its loss predisposing to proteinuria and CKD [38]. Within introns of the MAGI2 gene, we identified four putative regulatory variants (C7-C10; Fig 2C), and two of these showed allele-specific binding in EMSAs ( Fig 2D). The SNP C8 (chr18:18518972) had a relatively low phyloP score of 1.48, but was located in an enhancer element (EH38E2565734, cCRE) and overlapped with HS signals in 41 cell lines. EMSA exhibited much stronger binding for the C protective allele than the T risk allele in HEK293. According to the Dog10K dataset, the risk allele is rare in wolves (F = 0.07), but has a high frequency in both purebred dogs (F = 0.74) and village dogs (F = 0.51), indicating that the risk allele may be under selection during breeding or domestication. The SNP C9 sits at a highly constrained position (phyloP = 6.2) in an enhancer (EH38E2565751, cCRE). The EMSA binding band was strong for the G protective allele in HEK293 (Fig 2D).

CKD region on chr28
The second strongest association was found in a region at chr28: 30.8-31.4 Mb (Fig 3), with the highest marker effect of 0.002 (BICF2P865971). Most imputed variants in LD (r 2 > 0.9) Annotated genes were identified around the CKD region. One selection signal of dog domestication and a~500 Kb reported genomic duplicates were observed in this region. 2,064 imputed variants were found within the same LD of Bayesian candidate markers. (C) 76 imputed SNPs with phyloP score > 1 were lifted to human genome (hg38), and intersected with the regulatory elements from the candidate cis-Regulatory Elements (cCRE; red bars) and GeneHancer databases (blue bars), as well as the hypersensitivity (HS) signal from 95 cell lines (the height of the bar indicated the number of cell lines with the HS signal). (D) Eight putative regulatory SNPs (C3-C10) from the CKD region were tested with electrophoretic mobility shift assays (EMSAs). SNPs C8 and C9 showed alternative binding ability between the risk and protective alleles in HEK293 cell line. were observed in the intergenic region between WD Repeat Domain 11 (WDR11) and Fibroblast growth factor receptor 2 (FGFR2). Mutations in WDR11 can cause congenital hypogonadotropic hypogonadism (CHH) and Kallmann syndrome (KS) in humans [39], with unilateral or bilateral renal agenesis being a common phenotype in these patients [40]. FGFR2 is critical for early metanephric mesenchyme and ureteric bud formation in kidney [41]. FGFR2 is 197 Kb outside of the CKD region, but resides in the same TAD domain as the top BayesR markers. There is a human FGFR2 enhancer (GH10J121157) annotated in this CKD region, but no imputed variants were detected there. Interestingly, a reported domestication selection signal (chr28:31.1-31.2 Mb) was also found inside the CKD region.

CKD region on chr21
A 510 Kb region on chr21 exhibited the third strongest association with CKD and includes four Bayesian candidate markers, with the highest effect of 0.0016 (BICF2S23054624; Fig 4). This region harbors only one gene, GALNT18, which is expressed ubiquitously in various human tissues including kidney [42]. A longitudinal analysis in lupus patients showed that the demethylation of GALNT18 was associated with the development of active lupus nephritis [43]. Examination of imputed variants in the region revealed two putative regulatory SNPs, C12 and C13 (Fig 4). C12 is present at a constrained position (phyloP = 2.65) in an enhancer element (EH38E1520515; cCRE), 194 Kb downstream of GALNT18. C13 (phyloP = 5.05) is located within an intron of GALNT18, and overlaps with an enhancer element (EH38E1520796; cCRE), which showed HS signals in 17 cell lines. EMSA analysis revealed allele-specific binding of C13 in both HEK293 and MDCK alleles. The risk allele T is rare in wolves (F = 0.009), but had a 16 times higher frequency in purebred dogs (F = 0.14).

Other CKD regions
Eighteen other CKD regions were identified on 14 autosomes, with the top SNP effects ranging from 0.00046 to 0.0015. By screening these CKD regions, we identified and functionally tested seven putative regulatory SNPs.

Discussions
In this study, we used a Bayesian approach and identified 21 genetic regions associated with CKD in boxer dogs. These loci together explain 57% of the phenotypic variation. Meanwhile, we screened variants in LD with the top 50 BayesR markers, and identified 17 putative regulatory SNPs in evolutionary constrained positions. EMSAs confirmed that four SNPs, from the introns of MAGI2 and GALNT18, and an intergenic region of CKD on chr28, exhibited allelespecific binding in kidney cell lines, implying their potential function on CKD in boxers.
The process of domestication may have led to an increase in number and frequency of deleterious genetic variants, due to the artificial selection and population drift [45]. In this study, five CKD regions were identified around domestication selection signals. Additionally, dramatic changes in allele frequency between wolves and purebred dogs were found in two putative regulatory SNPs (C8 and C12). This suggests that selection or drift may be partially responsible for an increased prevalence of CKD in some dog breeds, and may also provide an insight into the origin of the genetic basis of disease.
Estimated LD extends~50-fold greater distances within dog breeds than in humans [46]. In this study, the large region of 5.2 Mb on chr18 showed the strongest association with CKD. Although CKD candidate genes, like RELN and MAGI2, were easily recognized for their relevance to kidney function, it remains unclear if other genes from the locus or the interaction between them participate in CKD pathogenesis. To address the importance of these genes, it would be helpful to examine the CKD region in different breeds. Within the same TAD as the CKD region on chr28, the FGFR2 is an interesting gene due to its function in the development of early embryos [47]. Two major splice variants of FGFR2, FGFR2IIIb, and FGFR2IIIc, were predominantly expressed in distinct tissues with differential ligand affinity [48]. Fgfr2IIIb null mice showed reduction in both kidney size and number of presumptive nephrons [49]. In our study, screening of human-based annotated elements did not reveal any potential regulatory variant in this gene. Thus, investigation of the dog-specific regulatory elements in the region may be helpful.
In addition to the top CKD regions, genes from other CKD regions may also be essential to kidney function. For example, near the CKD region on chr35 (chr35:10.8-11.4 Mb), transcription factor AP-2 alpha (TFAP2A) acts as a gatekeeper of differentiation during kidney development, by activating the terminal differentiation program of distal segments in the pronephros [50]. DNA (cytosine-5)-methyltransferase 3A (DNMT3A) from the CKD region on chr17 (chr17:18.5-19.8 Mb) is responsible for the methylation of gene regulatory regions that act as enhancers during kidney development [51]. Located in the CKD region on chr5 (chr5:65.9-66.0 Mb), Junctophilin 3 (JPH3) was identified in an association study of human CKD in 4,829 Japanese individuals [52]. YTH N6-Methyladenosine RNA Binding Protein 1 (YTHDF1) near the CKD region on chr24 (chr24:47.2-47.7 Mb) was highly expressed in the human fibrotic kidneys as a key contributor for renal fibrosis [53]. Folliculin interacting proteins-1 (FNIP1) was located in the CKD region on chr11 (chr11:19.3-20.2Mb). Disruption of FNIP1 resulted in the enlarged kidney size and significantly increased renal cyst formation [54].
At the 21 identified CKD loci, the load of risk alleles was significantly different between cases and controls ( Fig 1C). Meanwhile, we investigated the risk alleles load of 21 CKD loci in 75 other breeds (S5 Table). It showed some breeds with high prevalence of CKD have a highrisk allele load, e.g., Cavalier King Charles Spaniel (22 alleles) [55], Collie (22 alleles), flat coated retriever (22 alleles) and Shih tzu (20 alleles) [5]. But other high-prevalence breeds were observed with a relatively low load, e.g. miniature schnauzer (18 alleles), boxer (17 alleles) [5] and Labrador retriever (17 alleles) [56]. This finding may indicate genetic heterogeneity, and that different sets of risk loci may play roles in CKD in other breeds. Therefore, the association of 21 risk loci and CKD in other breeds needs to be verified. To fully understand the genetics of CKD in canine, accurate diagnosis and sample collection from a wide range of breeds is required. This will aid cross-breed investigation, but will also increase the power to detect the loci with low effect through meta-analysis, and allow for fine mapping of shared CKD regions across breeds [46]. This analysis also has a potential value in risk estimation. In humans, risk scores based on the high number of identified CKD loci, such as polygenic risk scores (PRS) [57] and genomic risk score (GRS) [58], have been established for early identification of individuals at risk. A similar score system could be developed for canine CKD. We tentatively used the SNP effect from the BayesR analysis to estimate polygenic risk scores on 107 additional dogs, which were excluded from our original material due to the quality control. For this additional cohort predictions of polygenic risk scores yielded an odds ratio of 13.5 for being cases versus controls using risk scores below or above 0 as the decision threshold, where 0 was the average risk score. This finding will open important possibilities for early intervention and preventive measures for young dogs, and additionally provides a unique opportunity for selection of the breeding dogs at the lowest risk to reduce disease incidence. In conclusion, studies of canine CKD may help us understand the pathology of kidney disease in both dogs and human patients, and show an important potential of early identification of patients in predictive medicine.

Ethics statement
All examination and sample collection of involved dogs were performed as part of the necessary diagnostic work-out by certified veterinarians according to ethical guidelines of the Norwegian University of Life Sciences or Swedish University of Agricultural Sciences. Sample collection in Finland was ethically approved by the Animal Ethics Committee of State Provincial Office of Southern Finland (ESAVI/343/04.10.07/2016). Written or verbal consents were obtained from the owners.

CKD diagnosis
There is significant age variation in boxers affected by CKD. Still, because of the high prevalence of CKD in younger dogs, only cases below 6 years of age (average 2.6 years) were included in this study. The support for the diagnosis varied between samples and countries for both cases and controls and was based on a combination of available information and age. The diagnostic support was based on either i) clinical characteristics only. Evaluation of samples includes general clinical evaluation, with urine analysis, clinical chemistry with elevated creatinine, urea and/or SMDA ++; ii) clinics with clinical chemistry. Same as above, and with additional clinic chemistry test of serum sample at Norwegian University of Life Science, or iii) morphology evaluation with clinical data or clinical pathology data (S11 Table). Most samples with kidney tissues available were evaluated pathomorphologically in Norway or Sweden. Controls were collected from healthy elderly dogs (>8 years; average 9,8 years), with no known history of renal failure or urinary tract infections, and often supported with serum biochemistry analyses within reference intervals. For some control dogs, we were able to confirm a healthy kidney by morphology in older dogs euthanized for other reasons than kidney disease.

Macroscopic renal lesions
The typical cases showed bilaterally small, firm, and pale mottled kidneys with irregular surfaces. The cortical surfaces revealed coarse nodular irregularities with numerous segmental fibrotic depressions surrounded by nodular hypertrophic cortical tissue. The renal capsules were frequently adherent to the renal cortical surfaces. On the cut surfaces, the cortices were irregularly thinned with multifocal to coalescing pale radial scars causing depressions of the cortical surfaces (Fig 5A and 5B).

Histological renal lesions
The fibrotic segments of the renal cortex were characterized by variable and often multifocal infiltration of mononuclear inflammatory cells (lymphocytes and plasma cells) and tubular atrophy and paucity of glomeruli combined with enlargement of the interstitial connective tissue (Fig 5C). Tubular microcysts and glomerulocystic atrophy were observed within the fibrotic segments (Fig 5D). Medullary fibrosis and variable multifocal lymphoplasmacytic infiltration were common findings. Both in the juxtamedullary cortex and in the medulla epithelial hyperplasia of collecting ducts was frequently giving the impression of "atypical tubules" also called "adenomatoid tubular change" (Fig 5E). Glomeruli in the surrounding hypertrophic cortex often revealed varying segmental or global sclerosis. Fibrotic thickening around and along the Bowman capsules was a frequent finding.

DNA extraction and genotyping
Blood samples were collected for 362 boxers from Australia, Denmark, Finland, Germany, Norway, Sweden, UK and US (S1 and S12 Tables). DNA mini kit (QIAGEN, Germany) was

Quality control
Of the 362 genotyped samples, we excluded one duplicate sample, and dogs with missing phenotype record (one dog), failed to meet the diagnosis criteria (14 dogs), or age criteria (three dogs). Six samples were identified and removed as outliers in inbreeding analysis with plink (v 1.90b4.9) [59], and 22 dogs were filtered out due to an overall call rate < 90%. The relatedness was tested by the plink2 with KING [60] kingship cutoff of 0.15, and 61 dogs were removed at this step. UU_Cfam_GSD_1.0 (CanFam4; NCBI assembly: GCA_011100685.1) was used reference assembly in this study [61]. Markers with genotyping rate < 95% (55,597 markers) and minor allele frequency < 1% (47,668 markers) were excluded. A final dataset consisting of 254 boxers with 101,664 autosomal markers were used for the analysis.

Population structure
We performed principal components analysis (PCA) with autosomal markers to evaluate population structure with plink (v 1.90b4.9) [59]. The first two components explained 15.5% and 6.5% genetic variation respectively. The PCA plot illustrated that boxers with different country of origin were mixed in one cluster (S4 Fig). Notably, the majority of boxers from the US (25 of 27) were slightly distanced from the others. However, since the controls and cases were evenly distributed across the population with no obvious stratification, all boxers were considered as one single group. We conducted a variance component analysis using WOMBAT (v 17/04/2014) [60], which showed a genomic relationship matrix (GRM)-based heritability of CKD was 0.61 ± 0.09 in this population.

Bayesian association analysis
We used the BayesR (v 01/04/2021) algorithm [26] to perform a genome-wide association analysis of CKD in the boxer population. BayesR assumes that the SNP effects are a priori derived from a mixture of four normal distributions: N(0, 0), N(0, 0.0001 s 2 g ), N(0, 0.001 s 2 g ) or N(0, 0.01 s 2 g ). SNP effects from the four distributions were estimated using the Markov Chain Monte Carlo (MCMC) sampling. BayesR was run with the first principal component as the covariate for a total of 300,000 iterations with a burn-in step of 100,000. The model convergence was assessed from ten BayesR repeats runs, and top 50 markers with the highest absolute effect size were considered as the candidates, according to a previous study [62]. The frequency of these top 50 markers were investigated in other 75 breeds (sample size > = 10) from a previous study [27]. LD block analysis of candidate BayesR markers was performed and visualized with Haploview (v 4.2) [63].

Genome sequencing and variant calling
To discover common variants from the candidate regions, we generated WGS data for 20 Norwegian boxers (12 cases and eight controls, S4 Table). Illumina short reads libraries preparation and sequencing were performed by the Norwegian Sequencing Centre at University of Oslo in Norway. The paired-end reads were mapped to dog UU_Cfam_GSD_1.0 reference [61] using BWA-mem2 (v 2.1) [64]. The alignment was sorted and indexed by SAMtools (v 1.14) [65]. Duplicate reads were detected and marked by the MarkDuplicates module of Picard Toolkit (v 2.27.5; Broad Institute).

Genotype imputation and validation
The WGS genotypes from 20 boxers were phased using SHAPEIT2 (v 2.r904) [67], to generate a reference pool of haplotypes. SNP chip data were pre-phased with SHAPEIT2. Genotype imputation was performed using IMPUTE2 (v 2.3.2) [68] by comparing the chip data haplotype with the reference pool. Imputed genotypes with probability > 0.9 were kept, otherwise they were set as missing. Specifically, a~500 Kb region on chr18 was masked during the imputation, which is known as a genomic duplication caused by the orthologous segments on chr9 [61]. The imputed variants were filtered by requiring a minor allele frequency > 0.01, and call rate > 0.95.
We further cross validated the imputation with seven boxers from previous studies [69][70][71]. The WGS data of seven samples were downloaded and mapped to the canfam4 with coverages ranging from 17-28x (S13 Table). For these boxers, the genotypes were called and filtered (GQ > 30) at 7,264,546 variants (5,601,532 SNPs and 1,663,014 Indels) that were identified in the reference panel of 20 Norwegian boxers (HaplotypeCaller in GATK). The imputation was performed based on a subset of genotypes at 104,839 markers from the illumina chip. Afterward, we validated the imputed genotypes by comparing to the genotypes called from WGS data, with-sample-diff function in plink2 (v alpha-2.3) [59], which revealed an average of imputation rate (successfully imputed; genotype probability > 0.9 in IMPUTE2) of 95%, and imputation accuracy of 96% (correctly imputed, S13 Table).

CKD candidate regions
Imputed variants with high LD (r 2 >0.9) to any of the top 50 BayesR markers were detected with plink (-r2 -ld-window -r2 0.9 -ld-window-kb 1000 -ld-window 5000). LD blocks within 500 Kb were merged, and 21 candidate CKD regions were defined by variant position at the ends of LD block. The pairwise interaction of 21 BayesR markers were screened using "-fastepistasis" function with BOOST method [72] implemented in plink, and no significant interaction was found. BayesR markers with the highest effect size were selected from each of 21 CKD regions. To assess the phenotypic variation explained by the identified CKD regions, we determined the association of phenotype and 21 BayesR markers with analysis of variance (ANOVA) in R (v 4.0.1; aov function) with model "phenotype~SNP1+SNP2+. . .+SNP21". The means of risk alleles load between control and case groups were counted and compared with t-test in R (ttest function). Additionally, we counted the risk allele load individually, then calculated the average load for the 75 breeds described in the Bayesian association analysis.

Gene annotation and gene ontology analysis
Protein-coding genes within the CKD regions and the closest gene on each side were identified based on UU_Cfam_GSD_1.0 annotation from NCBI (GCF_011100685.1). Genes were assigned to the human orthologues for the gene ontology (GO) analysis with Metascape [73].

Overlap of candidate variants with genomic features
To investigate if dog domestication contributes to CKD, we compared CKD regions with selection signals from four studies [34,35,74,75]. Imputed variants in LD with BayesR markers were evaluated by comparing different annotated genomic features: 1) variant effect was annotated using SNPeff (v 4.3t) [76]; 2) location to topologically associating domains (TAD) in CKD regions were extracted from a previous study in dog liver tissue [77]; 3) risk allele frequency of imputed variants was extracted from Dog10K dataset (1,591 purebred dogs, 281 village dogs and 57 wolves; https://kiddlabshare.med.umich.edu/dog10K/) [69]; 4) The phyloP scores from 241 mammals were extracted from the Zoonomia project [78]. To obtain an extended functional annotation, imputed SNPs were lifted to the human hg38 genome using LiftOver (https://genome.ucsc.edu/cgi-bin/hgLiftOver); 5) the lifted SNPs were intersected with the human regulatory elements from cCREs and HS databases from ENCODE [29,79], and with the promoters and enhancers from GeneHancer [30].

Electrophoretic mobility shift assay (EMSA)
Putative regulatory SNPs were selected and tested with EMSA in Madin-Darby Canine Kidney (MDCK) and human embryonic kidney (HEK293) cell lines. All cell lines were cultured in DMEM (Gibco), supplemented with 10% heat inactivated foetal bovine serum (Gibco), 1% penicillin, streptomycin and glutamine (Gibco) and maintained at 37˚C (5% CO 2 ). For EMSAs, nuclear extracts from each cell line were prepared according to the manufacturer's specification (NucBuster Protein Extraction Kit, Merck) and assayed using the appropriate oligo set (S14 Table) and with the Lightshift Electrophoretic Mobility-Shift Assay kit (Thermo Fisher Scientific). The following alterations were made to the manufacturer's protocol: 12-15 μg of appropriate nuclear extract was pre-incubated on ice for 40 minutes in binding buffer (binding buffer supplemented with: 7.5% Glycerol, 0.063% NP-40, 30.1 mM KCl, 2 mM MgCl2, 0.1 mM EDTA, 50 ng/ul Poly (dI�dC)). Biotin labelled ds-oligonucleotides were added at 200 fmol and competed where appropriate with matched 20 pmol unlabelled ds-oligonucleotides. Reactions were incubated on ice for 40 minutes prior resolution on a 5% polyacrylamide gel (BioRad) run in 0.5 × TBE at 100 V for one and half hours. For four variants which showed allele-specific binding in the EMSA, a technical replicate was performed to validate the result.
Supporting information S1