Novel protective and risk loci in hip dysplasia in German Shepherds

Canine hip dysplasia is a common, non-congenital, complex and hereditary disorder. It can inflict severe pain via secondary osteoarthritis and lead to euthanasia. An analogous disorder exists in humans. The genetic background of hip dysplasia in both species has remained ambiguous despite rigorous studies. We aimed to investigate the genetic causes of this disorder in one of the high-risk breeds, the German Shepherd. We performed genetic analyses with carefully phenotyped case-control cohorts comprising 525 German Shepherds. In our genome-wide association studies we identified four suggestive loci on chromosomes 1 and 9. Targeted resequencing of the two loci on chromosome 9 from 24 affected and 24 control German Shepherds revealed deletions of variable sizes in a putative enhancer element of the NOG gene. NOG encodes for noggin, a well-described bone morphogenetic protein inhibitor affecting multiple developmental processes, including joint development. The deletion was associated with the healthy controls and mildly dysplastic dogs suggesting a protective role against canine hip dysplasia. Two enhancer variants displayed a decreased activity in a dual luciferase reporter assay. Our study identifies novel loci and candidate genes for canine hip dysplasia, with potential regulatory variants in the NOG gene. Further research is warranted to elucidate how the identified variants affect the expression of noggin in canine hips, and what the potential effects of the other identified loci are.


Introduction
Many hereditary disorders appear in both humans and dogs with gene variants from common ancestral genes affecting the disease [1].Canine hip dysplasia (CHD) is a non-congenital disease, causing skeletal abnormalities in growing dogs, the first signs appearing at the age of three to four months [2].CHD is defined as the laxity of the joint, resulting in instability and subluxation [2,3].The femoral head is not completely covered by the acetabulum, which then leads to increased force over smaller surface area due to incongruence of the coxofemoral joint [2,3].This in turn causes microfractures in the acetabulum and the femoral head [2,3], detrition of the articular cartilage, inflammation of the synovial membrane and secondary osteoarthritis (OA) [3].CHD can be painful up to a point where it poses a serious welfare problem.Hip dysplasia appears in humans in divergent forms from infancy to adolescence and to adulthood [4].Of these, the adolescent hip dysplasia is clinically and developmentally closest to CHD [5].
Official scoring of CHD varies by country: in Finland it is scored categorically from A (normal) to E (severely affected hip joints with possible OA) using ventrodorsal extension radiographs as standardized by the Finnish Kennel Club under the Fe ´de ´ration Cynologique Internationale (FCI) [6].The scoring is based on the following features: level of congruence between the femoral head and the acetabulum, degree of subluxation of the joint, Norberg angle, shape of the femoral head and neck, shape and depth of the acetabulum, and signs of secondary OA [6,7].Other traits also add into hip morphology, but they are not routinely evaluated [8].In the end, only the categorical score for each hip joint is recorded and made available for later use.
The severity of CHD depends on environmental and genetic factors [2,3,[9][10][11][12][13][14][15][16][17][18][19][20][21][22][23].CHD prevalence varies also by breeds and breed groups [5], but is common for example in German Shepherds with a reported 37% prevalence in Finland between 2000-2017 (6413/16433) [24].The heritability (h 2 ) estimates for the hip score are generally moderate across breeds (0.20-0.38) [12][13][14], although in German Shepherds the h 2 estimates have varied from 0.1 to 0.6 as summarized in [18].In a Finnish German Shepherd population the estimates were 0.31-0.35[25].In earlier studies the h 2 estimates for the traits determining the hip score (e.g.Norberg angle, secondary OA, articular congruence and dorsolateral subluxation) have varied considerably from low h 2 = 0.10 to high h 2 = 0.73 [10,11,14,15,19].Some evidence of major genes affecting CHD exist from studies utilizing variance estimates and Bayesian modeling [16,18,26,27].Also, a recent study investigating quantivative trait locus (QTL) associations with CHD revealed a large effect locus on chromosome 28, which had a 6˚additive effect on Norberg angle values in Golden Retrievers and Labrador Retrievers [28].Fels and Distl suggested QTLs on chromosomes 19, 24, 26 and 34, which associated with CHD in German Shepherds [29].In addition, a number of other small effect loci and potential candidate genes have been found [15,21,23].The reported QTLs and candidate genes are inconsistent between studies, however, as both Sa ´nchez-Molano et al. [15] and Zhu et al. [22] have discussed.Ultimately, different study populations and methods affect the results substantially, which must be recognized when reviewing the data.FBN2 encoding for fibrillin 2 is to our knowledge the only gene in which a mutation has been demonstrated to be associated with CHD using gene expression analysis [30].However, Lavrijsen et al. [23] found no evidence of association between the region harboring FBN2 and CHD, but they suggested this discrepancy may be caused by differences in Dutch and U.S. Labrador Retriever populations.
We have carried out genome-wide association studies (GWAS) in case-control cohorts, revealing a total of four associated loci on two chromosomes.Subsequent sequencing of the underlying region on one chromosome identified putative regulatory variants of NOG, which downregulated a reporter gene expression in vitro, and were associated with the healthy and the mildly dysplastic phenotypes.

Novel CHD loci on chromosomes 1 and 9
To map CHD loci, we originally performed GWAS using the Illumina 173K chip with 160 controls (with A/A hip scores, left/right hip) and 132 cases (D/D, D/E, E/D or E/E), which revealed a suggestive association on canine chromosome 9 (Fig 1).Subsequently, we genotyped 233 more individuals and analyzed the data from all 525 dogs.However, one control was dropped from the subsequent meta-analyses due to a missing genotyping batch covariate.Therefore, the first meta-analysis cohort comprised 277 controls and 132 cases (with D/D, D/E, E/D or E/ E hip scores) (Fig 2 ), and the second, less stringent meta-analysis included the same controls and 247 cases (with C/C, C/D, D/C, D/D, D/E, E/D or E/E hip scores; dogs with C/C are later on referred as mild cases) (Fig 3).None of the top SNPs from the different analyses reached genome-wide significance.The data were analyzed using two different statistical methods within the R package "GenABEL": FASTA and QTSCORE.In the original association analysis and the first meta-analysis QTSCORE was used with environmental residual to be comparable with FASTA.In the second meta-analysis QTSCORE was used with standard genomic control [31].We decided to use FASTA because it is effective in handling highly stratified data of related individuals, while it usually is not as conservative as QTSCORE.However, for the second meta-analysis we used only QTSCORE with standard genomic control, because FASTA ended up losing more power (S1 Fig) .QTSCORE was used to obtain the permuted P-values for genome-wide significance in all analyses, because FASTA cannot be used for this task.Inflation factor lambda, describing possible inflation of test statistics due to population stratification, was estimated as 1.02 for FASTA, deflation of 0.76 for QTSCORE for the original analysis and 1.01 and 0.72 for the first meta-analysis respectively.For the second meta-analysis lambda was 1.43.When lambda was < 1, the deflation was corrected with reverse genomic control [32].The P-values after inflation and deflation factor corrections, after permutation tests with QTSCORE, and genotypic and allelic odds ratios (OR) with the respective 95% confidence intervals (CI) for all of the top SNPs are shown in Table 1 (original GWAS), Table 2 (first meta-analysis), and Table 3 (second meta-analysis).The association on chromosome 9 in the original genome-wide association analysis was 14 times stronger, and in the first metaanalysis over 45 times stronger than in any other loci in the genome (Fig 1 and Fig 2).In the second meta-analysis the association on chromosome 1 was over seven times stronger than what was observed on chromosome 9, and over 14 times stronger than for any other loci in the genome (Fig 3).
The SNPs on chromosome 9 represent two separate loci; there was moderate to high linkage disequilibrium (LD) between the markers within each locus (r 2 = 0.64-0.99),but limited LD between the loci (r 2 = 0.34-0.56)(S1 Table ).Clumping analysis, a tool to estimate the number of independently associated loci, corroborated this by revealing two loci within the targeted region on chromosome 9 (S2 Fig) .OR for the SNPs within the first locus (Chr9: 30993502-32382532) were all less than one (Table 1), and for the markers within the second locus (Chr9: 36543581-36579921) OR varied between 2.25 and 4.90 (Table 1).
The results from the first meta-analysis were similar to the original GWAS (Table 2).We observed a 2.6x difference in the smallest P-values, indicating a small gain in power, when we compared the SNPs with the strongest association in both analyses: 7.8x10-6 in the original GWAS (BICF2P742007) and 3.0x10-6 in the first meta-analysis (BICF2G630837405).The LDstructure of the top SNPs from this analysis (S1 Table) also resembled the respective values seen in the original GWAS, indicating two independent loci.The odds ratios implied a protective role for the top SNPs within the first locus as in the original GWAS.In the second locus, BICF2G630837240 locating at Chr9:36579921 had OR of 2.00-4.78(Table 2), which again was comparable to the values observed in the original GWAS.The other SNPs in this second locus, locating within 36837067-36886621, were only observed in this meta-analysis, although they exhibited the strongest association to the disorder (Table 2).These three SNPs were in moderate LD (0.68-0.71) with BICF2G630837240, but had odds ratios < 1 (Table 2).
As in the original GWAS and the first meta-analysis with the stringent phenotype definition, some of the top SNPs demonstrated notable LD between each other (r2 = 0.80-1.00;S1 Table ) in the second meta-analysis using the relaxed phenotype.The clumping procedure indicated two loci on chromosome 1 and two loci on chromosome 9 (S3 Fig) .The first associated locus on chromosome 1 spans over ~1.1 Mb region, and all of the ORs imply a protective association (OR = 0.26-0.62;Table 3).The second locus is ~367 kb long and ORs for all the three SNPs are also < 1 (Table 3).The two loci on chromosome 9 corresponded to the same regions that have been defined before.However, the association of the second locus on chromosome 9 reached a P-value of 1.1x10-4 at best (BICF2G630837209) and was therefore not included in Table 3.
Table 4 summarizes the four loci and their representative SNPs that displayed the strongest association with the disorder over the analyses.All of the top SNPs in the ~1.1 Mb locus on chromosome 1 were in high LD with each other (S1 Table ).One of the associated SNPs is intronic to NADPH Oxidase 3 (NOX3) (BICF2S23248027), and the rest of the SNPs lie in an  4).The SNPs closest to ARID1B were BICF2P357728 and BICF2P1037296, located 91234 bp and 101945 bp upstream.The second locus on chromosome 1 included many genes (Table 4).The corresponding SNPs were also in high LD with each other (r 2 = 0.84-0.96,S1 Table ) and were located either within MAM Domain Containing 2 (MAMDC2) (BICF2S23329752 and BICF2S23660342) or within Protein Prenyltransferase Alpha Subunit Repeat Containing 1 (PTAR1) (BICF2P1129598).
The top SNPs on chromosome 9 span over a region of ~5.3 Mb.BICF2S23027935 locates to an intron of Ankyrin-Repeat and Fibronectin Type III Domain Containing 1 (ANKFN1) and 153764 bp upstream from NOG, a known bone morphogenetic protein (BMP) inhibitor (Table 4).BICF2P742007 is intergenic and lies close to NOG (66839 bp upstream).The SNP representing the second locus on chromosome 9 (BICF2G630837240) is situated between the genes for Mitochondrial rRNA Methyltransferase 1 (MRM1) and LIM Homeobox 1 (LHX1) (Table 4).
Mild cases resemble controls for the chromosome 9 loci but are more similar to moderate-to-severe cases for the locus on chromosome 1 We compared the genotype frequencies of the top markers (Table 5) between the different phenotype groups to assess if any of the loci behave differently in these comparisons.Here we did the following comparisons: controls (hip scores A/A) to mild cases (hip scores C/C) and mild cases to moderate-to-severe cases (hip scores C/D, D/C or worse), because other comparisons were covered in the in the meta-analyses described above.The allele and genotype frequencies of healthy dogs and mildly dysplastic dogs did not differ significantly on either chromosome (Table 5).Interestingly, the allele and genotype frequencies between mild cases and moderate-to-severe cases were significantly different on chromosome 9, but not on chromosome 1(Table 5).Odds ratios for all the significant comparisons indicated a protective association for the locus near NOG on chromosome 9 (Table 5: BICF2S23027935 and BICF2P742007).The second locus on chromosome 9, near LHX1, increased the odds for hip dysplasia about 2-to 5-fold in all the significant comparisons (Table 5: BICF2G630837240).

Top markers on the first locus on chromosome 9 are in linkage disequilibrium and are differentially associated between mildly and moderately-to-severely dysplastic hips
The three most common genotypes for the locus with the strongest association on chromosome 9 in the original GWAS (BICF2S23027935 and BICF2P742007, Table 1) were the homozygous GA (both SNPs represent the non-risk allele) and AG (both SNPs represent the risk allele), and the heterozygous RR (IUPAC coding) (Table 1: BICF2S23027935 and  � The strand of SNP BICF2P742007 was flipped in the pre-meta-analysis data merge and therefore the alleles are complementary to those in Table 1.All the SNPs in the table showed association with the disorder after both statistical methods (FASTA and QTSCORE) at the P-value threshold of 1.0x10-4.

Identification of additional variants on chromosome 9
To identify additional CHD-associated variants, we resequenced a 7 Mb genomic target (corresponding to bases 30620001-37620000 on chromosome 9) in 24 control and 24 affected dogs representing the most common homozygous SNP genotype combinations (SNPs BICF 2S23027935, BICF2P742007, BICF2G630834826, BICF2G630837209, BICF2G630837240, BICF2G630837405 and BICF2P272135; see Tables 1 and 2 and methods).We used a custom pipeline to systematically screen the target area in comparison with the CanFam3.1 annotation.Altogether we found 30197 unique variants in 21140 positions (S2 Table ) and classified them based on the associated gene, the predicted functional effect of the variant and the phenotype of the individual.The study design, however, does not permit the direct assessment of the association between the phenotype and the genotype as the case and control animals were selected based on both the phenotype and an opposing homozygous combined genotype of seven top SNPs on chromosome 9 (see Tables 1 and 2 and the methods).We therefore screened for variants that segregated completely or nearly completely with either group of dogs.The difference in counts of each variant between the two groups were determined (S2 Table ).61 variants remained after excluding those displaying an absolute difference of 21 or smaller, those with an intergenic or intronic location, and those leading to a synonymous mutation (S3 Table ).An upstream variant was found in the immediate vicinity of SMG8 encoding for a nonsense mediated mRNA decay factor.However, there is a gap in the dog/ human alignment at this position.A DNase I hypersensitivity site and an H3K27Ac signal reside in the human genomic regions homologous to those harboring upstream variants of benzodiazepine receptor (peripheral) associated protein 1 (BZRAP1) and ring finger protein 43 (RNF43).A strong H3K27Ac signal was also seen in an area on the human chromosome 17 corresponding to a variant upstream of RAD51 paralog C (RAD51C).Several potential splicing mutations were seen in RNF43 and testis expressed gene 14 (TEX14).Two variants downstream of ANKFN1 were in close proximity (within 10 kb) to a SNP with a statistically significant association with the phenotype (S3 Table ).See S4 ods for the calculation of the association between the 217 target area SNPs and the phenotype, N = 426.We also assessed how the variants targeted specific genes so that the target genes segregated with the risk or non-risk SNP genotypes leading to various functional effects (Table 7).
A missense variant rs852180586 in Apoptosis antagonizing transcription factor (AATF) is 26.5 kb away of BICF2G630837405 (Tables 2 and 7 and S3 Table ) but is predicted to be tolerated.The two variants downstream ANKFN1 are close to BICF2G630834765 (Table 7, S3 Table ) but there is no evidence for a functional effect.The potentially deleterious missense variant rs24532262 in the myeloperoxidase (MPO) gene was connected with the cases.The mutation does not target the mature protein, however.A missense variant in PCTP corresponds to a location near the carboxy-terminus of the phosphatidylcholine transfer protein that is The threshold for significance to correct for multiple testing was determined as 0.05/8 = 6.25x10-3,where the denominator comes from four independent loci multiplied with two phenotypic tests.The significant P-values and corresponding odds ratios are in bold font. https://doi.org/10.1371/journal.pgen.1008197.t005

NOG variant associates with normal or mildly affected hips but not with moderate-to-severe hip dysplasia
The presence of the deletion variant upstream of NOG (Fig 4) was directly assessed by PCR in the whole population of dogs in this study.The fragment sizes were analyzed by gel electrophoresis for all the samples.PCR failed to give a product in nine samples and the product was ambiguous in one sample.The deletion genotype counts and frequencies for each phenotype category for the remaining 516 dogs are shown in Table 8.

NOG variant correlates with and improves the predictive power of SNP genotypes
The deletion genotype correlated with the genotypes of the SNPs BICF2P742007 and BICF2S23027935 in all three phenotype categories.Spearman's rank correlation coefficient rho [with 95% confidence intervals] was 0.64 [0.56-0.71]for controls, 0.69 [0.56-0.79]for mild cases and 0.67 [0.58-0.75]for moderate-to-severe cases (S6 Table ).The significance of the protective effects of the NOG deletion and GA SNP genotype in various subsets of dogs was next investigated using logistic regression.The odds ratios for the corresponding generalized linear model (GLM) coefficient estimates are presented in Table 9.In contrast to the effect of the GA SNP genotype, the protective effect of the NOG deletion was most significant between the mild and moderate-to-severe cases (Table 9).
To assess the effect of the deletion, we compared the full (with the NOG deletion) and the reduced (without the NOG deletion) GLM models using chi-squared test.There was a statistically significant difference between the full and reduced models on controls and moderateto-severe cases (P < 0.05) and on mild and moderate-to-severe cases (P < 0.001, Table 10).Finally, the receiver operating characteristic curve was used to assess the discrimination potential between the full and reduced models.We argue, based on the results from this comparison, that the full rather than reduced model better discriminates the controls and moderate-to-severe cases (P < 0.01), as well as the mild and moderate-to-severe cases (P < 0.001, Table 10).

The deletions upstream of NOG downregulate reporter gene expression in vitro
We investigated the effects of the deletions on the expression of a luciferase reporter gene in vitro.We designed three constructs (S7 Table ), where the longest construct A with 14 AGGtriplet repeats corresponds to resequencing data from Dog2 (S7 Fig and Fig 4).Construct B had a deletion of eight AGG-triplets, and construct C had a deletion of seven AGG-triplets when compared to construct A (Fig 4).The sequences corresponding to the constructs A and C were common in the cohort, whereas we recovered the sequence corresponding to variant B in only one individual.The constructs were cloned to a plasmid containing a luciferase reporter under the control of a minimal promoter.We used two experimental setups with HEK293 human embryonic kidney and U-2 OS human osteosarcoma cell lines: HEK293 cells with 50 ng and U-2 OS cells with 10 ng DNA.
The results are expressed as mean ± SD of four technical replicates from three independent experiments for each cell line and treatment.The firefly luminescence control was used to normalize the NanoLuc luminescence values.In the first experimental setup with HEK293 cells, construct A had significantly higher luminescence compared to both B and C constructs (Fig 5).Again, in the second setup, with U-2 OS cells, the A construct demonstrated significantly higher luminescence than construct C. All comparisons between the control plasmid and construct luminescence levels were significant.

Genome sequence for the canine NOG locus
The current canine reference genome (CanFam3.1)shows a gap within NOG (Fig 4).We closed the gap by PCR and sequencing (S8 Fig) .The sequence overlapped the 5' NOG sequence from Beagle [35] and corresponded with the variant with three copies of hexanucleotide insertion (GenBank accession AB544074.1).The closure of the gap in the reference genome permits the accurate positioning of the upstream deletion locus in relation with the coding sequence.The sequence corresponding to the 434 bp long gap in the reference is very similar to the corresponding human sequence (S9 Fig) .Alignment introduced six gaps (34, 25, 3, 2, 1 and 1 bp).The nucleotide-level identity was a remarkable 76% (330/434) suggesting conserved function.The NOG promoter (ENSR00000096009) overlaps with the corresponding region in human and spans from 17:56592202 to 17:56594999 with the core promoter at 56592600-56594601.Scanning the human core collection at the JASPAR2018 database [36] with the alignment of dog and human sequences at S9 Fig, uncovered 135 matrix IDs and altogether 1368 putative binding sites for them (S8 Table ).Bonferroni-adjusted P-values were calculated for all sites.Matrix IDs with adjusted P-value less than 0.05 for any site are shown in S9 Table .The corresponding transcription factors are histone 4 transcription factor (HINFP), two E2F-related factors and three AP-2 family members.with a TF binding site (orange box) and TFs that can bind to this site as predicted by the ENCODE ChIP-seq experiments.Green boxes are H3K4Me1 histone mark (commonly associated with enhancers) peaks, and the blue box is a H3K4Me3 histone mark (commonly associated with active promoters) peak from the ENCODE data. https://doi.org/10.1371/journal.pgen.1008197.g004

Discussion
Canine and human hip dysplasia represent one of the most complex and prevalent problems in veterinary and medical sciences.Our GWAS uncovered four novel protective and risk loci on chromosomes 1 and 9.The loci on chromosome 9 differentiated the mild from the moderate-to-severe phenotypes.Alleles upstream of NOG displayed differential enhancer activity in vitro.Three additional candidate genes on chromosomes 1 and 9 were revealed: NOX3, ARID1B and RNF43.
We identified putative regulatory variants of NOG that encodes for a well-known BMP inhibitor, noggin.Noggin is essential for the growth and patterning of the neural tube after neural induction [37,38], but it is also required for embryonic chondrogenesis, osteogenesis and joint formation [38][39][40].Joint formation in Nog knockout mice is defective and most joints are missing from the limbs [40].In humans, NOG missense mutations segregate with proximal symphalangism and multiple synostosis syndrome, both of which are skeletal dysplasias resulting from decreased noggin activity [39,41].Nog is also widely expressed in adult mouse joint cartilage and down-regulated in surgically induced arthritis [42].Nog haploinsufficiency protected mice from arthritis induced by methylated bovine serum albumin [43].Overexpression of murine noggin has been associated with impaired function of osteoblasts, resulting in osteopenia, fractures and decreased bone formation rate [38,44,45].
The affinity of noggin to different BMPs varies.Further, there are other BMP antagonists that can partially compensate for the lack of noggin (e.g.chordin, follistatin, gremlin and sclerostin) [37,40,[46][47][48].However, siRNA-mediated Nog knock-down led to increased BMPmediated osteoblastic differentiation and extracellular matrix mineralization without compensatory induction of gremlin or chordin expression [49].Our in vitro expression data suggests that the variant upstream of NOG has potential gene-regulatory consequences.It is possible that the regulation of noggin expression levels is suboptimal in hip joints of German Shepherds prone to develop moderate-to-severe hip dysplasia.Another study revealed a single-nucleotide  variant affecting the expression of NOG 105 bp downstream of the transcription start site, when the researchers investigated targeted sequencing data of a GWAS locus for human cleft lip, with or without cleft palate [50].
We were not able to close the 434 bp gap upstream of NOG with the targeted resequencing data.The overall coverage was variable, and parts of the target region were not covered at all.This is a general caveat of using probe-enriched genomic DNA templates for sequencing.We finally used PCR and sequencing to close the gap, which enabled the accurate positioning of the upstream deletion locus in relation with the coding sequence.The close proximity with NOG and the high degree of conservation with the corresponding sequence in human NOG promoter suggest that the uncovered new genomic sequence might be involved in the regulation of NOG expression.Together with the discovery of functionally active variant alleles upstream of NOG (Figs 4 and 5), our results suggest more research should be targeted to the characterization of canine NOG and its regulation.
The protective locus on chromosome 1 spans over a 1.1 Mb region and harbors two genes of interest: NOX3 and ARID1B.NOX3 belongs to the family of NADPH oxidases, which catalyse the formation of superoxides and other reactive oxygen species.NADPH oxidase enables the production of hydrogen peroxide (H 2 O 2 ), which is ultimately used in a reaction cascade that participate in the initiation of articular cartilage degradation [51,52].NOX3 is a nonphagocytic member of the NADPH oxidase family and it is mainly expressed in the inner ear and fetal tissues [53].Thus, the role of NOX3 molecule in hip dysplasia remains uncertain, although as shown in S10 Table, an indirect link between NOX3 and TRIO, a protein encoded by another candidate gene for German Shepherd hip dysplasia has been reported in a study by Fels et al. (2014) [54].
The AT-rich interactive domain-containing protein 1B encoded by the second candidate gene (ARID1B) on chromosome 1, functions as a transcriptional activator and repressor via chromatin remodeling [55].Mutations in ARID1B cause Coffin-Siris syndrome (CSS), which is a rare hereditary disorder affecting multiple body systems, for instance the nervous, cardiovascular, and skeletal systems [56,57].As a consequence to this syndrome, ARID1B is associated with joint laxity (66% of the patients) [56,57].However, the dogs with hip dysplasia do not exhibit similar multisystemic symptoms as the CSS patients with causative ARID1B mutations.MAMDC2 is another potential candidate gene on chromosome 1.It encodes a proteoglycan and has been associated with increased intraocular pressure [58].
Other putative candidate genes on chromosome 9 uncovered in the variant analysis (Table 7) include MPO, RNF43, RAD51C.Reactive oxygen species and MPO have been inferred to participate in the regulation of chronic inflammation [59][60][61].Therefore, it was intriguing to discover a potentially deleterious missense variant of MPO (Table 7).The mutated amino acid, however, is not included in the predicted mature protein.Intronic variants close to splice regions in RNF43 are also potentially significant.RNF43 ubiquitin ligase [62] negatively regulates WNT signaling [63].WNT signaling is implicated in osteoarthritis as reviewed in [64,65], and a recent study also suggest it might be affected in CHD [66].RAD51C is a well-known recombination factor [67].
Deciphering polygenic, multifactorial disorders requires large sample sizes.Although dogs have an unique genomic architecture [28,[68][69][70] that facilitates association studies in smaller cohorts than in humans [28], the lack of power is still a regular concern.Our GWAS was unexceptional in this respect.Even after we increased the sample size from 292 to 409 or 524 dogs, and consequently revealed two additional loci on chromosome 1, none of the associations reached genome-wide significance.We observed strong LD in our data (S1 Table ), which was expected due to the genomic architecture of dogs.Therefore, Bonferroni correction threshold could be overly conservative for our data as explained in Methods, part "Genome-wide association analysis".Also, the lack of power may be a consequence of the increasing variation among cases, when we included the mild hip dysplasia phenotypes in the second meta-analysis.We observed significant differences in the allele and genotype frequencies between mild cases (C/C) and the moderate-to-severe cases (C/D, D/C or worse) throughout the loci on chromosome 9 (Table 5), whereas mild cases did not differ from controls in these comparisons.Additionally, the fragment genotype frequencies related to NOG were similar for controls and mild cases but again differed significantly when these two groups were separately compared with the moderate-to-severe cases.These findings corroborate that the dogs with mild hip dysplasia are indeed at lower genetic risk for the disorder.It would be important to find out, if other genetic factors differentiating the dogs in these phenotype groups exist.
In conclusion, using several genetic approaches we have discovered novel variants of a putative NOG enhancer that downregulate reporter gene expression in vitro.The variants are associated with healthy and mildly dysplastic hip joints in German Shepherds.Besides a larger replication study and investigation of the other candidate genes on chromosomes 1 and 9, future research should focus on what kind of biological effects the variants have on the expression of noggin in the canine hips and on the development of hip dysplasia.

Animals
The Finnish Kennel Club (FKC) granted permission to use its data and CHD screening radiographs for our research.All radiographs have been scored by two specialized veterinarians, thus reducing inter-observer bias [7].All hip score results are freely available from the FKC breeding database [71].
Our study cohort consisted altogether 531 German Shepherds (247 cases + 284 controls), born between 1993 and 2013.Cases were dogs with an FCI score C or worse for both hips and controls were dogs with a score A for both hips.We discarded dogs with an FCI score B because their inclusion may lead in a confounded control phenotype.Five control dogs had to be excluded from the analyses due to ambiguous phenotypes.This left us with a total of 526 dogs (247 cases and 279 controls) before quality control.However, one more control had to be removed during quality control due to an outlier genotype, after which we had 525 dogs left for the GWAS.
At least one EDTA blood sample was collected from all the dogs between years 2006 and 2015.The dogs were chosen for our study according to their hip scores and pedigrees, creating a balanced study population of working, mixed and show line dogs (S10 Fig).

Ethics
Guidelines for research ethics and good scientific practices were followed.We hold an ethical license for collecting EDTA blood samples (ESAVI/7482/04.10.07/2015), from ELLA-Animal Experiment Board in Finland under The Regional State Administrative Agency for Southern Finland.The owners signed a form of consent and they were well informed of the project.

DNA preparation and genotyping
The original EDTA-blood samples are stored at the Dog DNA bank at the University of Helsinki.DNA extraction from the EDTA-blood samples was carried out using Chemagic Magnetic Separation Module I (MSMI) with a standard protocol by Chemagen (Chemagen Biopolymer-Technologie AG, Baeswieler, Germany), after which the samples were sent to Geneseek (Lincoln, NE, US) to be genotyped using the high density 173K canine SNP array from Illumina (San Diego, CA, USA).Genotyping was executed in several batches as collection of the original EDTA-samples took place over several years.Batch effect was accounted for as a covariate in our meta-analyses.

Population structure
Our German Shepherd population was divided into five (the original GWAS, S11 Fig) or four (meta-analyses, S12 Fig) subpopulation clusters according to their genomic relationships.This was achieved by first calculating the appropriate number of clusters from a genomic relationship matrix with a package "mclust" [72] in R [73], which uses covariance parametrization and selects appropriate clusters via Bayesian information criterion.A covariate vector was created according to the clustering data, so that each individual belongs to one of the clusters.This covariate was used in our model to account for any differences in disease association between the clusters.

Quality control (QC)
Original GWAS.Initial merging of the genotype sets and a genotype missingness test was performed with PLINK [74].6058 SNPs failed the missingness test with a threshold of 0.05.In total, 166309 SNPs and 293 samples were transferred from PLINK to R. We performed the final QC with the following thresholds: minor allele frequency = 0.05, per sample call rate = 0.90 and per SNP call rate = 0.95, p-value cut-off level < 0.00001 to test for deviations from Hardy-Weinberg equilibrium (HWE).The HWE-check was executed on controls as cases may show deviation from HWE in association with the disease [75].The QC resulted in the final data of 92315 autosomal SNPs and 293 samples.However, one sample was manually removed after the check.marker-function due to an outlier genotype, which left 292 samples for our association analysis.The position map for our SNPs was CanFam3.1.After the GWAS, we checked the genotype call quality of the best SNPs to verify that the associations were not due to genotype-calling errors.

Meta-analyses
The same genotype data was used for both meta-analyses, but the number of included dogs in each analysis was determined by the stringency of the phenotypes.The quality control for this data was carried out in two steps.First, before merging the original data (292 dogs) with the new genotypes (233 dogs), initial quality controls were executed separately on them with PLINK.The following thresholds were used for each data set: per sample call rate 0.90, per SNP call rate 0.95, minor allele frequency 0.05, and p-value cut-off level < 0.00001 for the HWE check.Also, strand had to be flipped for 59980 SNPs in the original data set due to strand inconsistencies with the new genotype data.This was done with the --flip command in PLINK.Second, after merging of the data sets, the data was imported to R, where the QC was repeated with GenABEL for the whole data with the same QC thresholds.This left 88499 SNPs for the meta-analyses.After the meta-analyses, we checked the genotype call quality of the best SNPs to verify that the associations were not due to genotype-calling or other such errors.One SNP on chromosome 4 (BICF2P491963) was observed to show false association in the first meta-analysis due to a batch-specific calling error.The error was not resolved by the use of batch covariates, and the SNP was therefore removed.Also, the genotyping batch of one dog was missing.This dog was therefore removed leaving 524 dogs for the meta-analyses.

Genome-wide association analysis
We performed a case-control GWAS to identify SNPs associated with canine hip dysplasia.The original association study included 160 controls and 132 cases.The GWAS was implemented in R with the package GenABEL [31].The covariates were sex and the genomic cluster of the animal.In the meta-analyses we also used the genotyping batch as a covariate.We used FASTA [76] and QTSCORE [32], in GenABEL to calculate the association test statistics.When used with a binary trait FASTA corresponds to the Cochran-Armitage trend test [77].
FASTA is an efficient tool for association analysis in family-based data sets.However, FASTA has the disadvantage of not being able to compute a genome-wide significance with permutation analysis, because the data structure of the test statistics is not exchangeable.This is due to incorporating the relationship matrix ϕ in the computation of the test statistics [76].QTSCORE does not suffer from this, as the test statistics derive from the environmental residuals that are not correlated with each other.Thus, the data structure is exchangeable and permutation analysis can be used to calculate empirical experiment-wise genome-wide significance levels for the analyzed SNPs [32].
Bonferroni correction threshold for genome-wide significance was determined as (P-value/ Number of SNPs) = 0.05/92315 = 5.42x10-7 for the original GWAS, and 0.05/88499 = 5.65x10-7 for the meta-analyses.However, Bonferroni correction is problematic in genetic association studies, because it expects independence between the comparisons, which does not hold for SNPs due to LD [78].Consequently, when type I error is controlled with overly conservative Bonferroni adjustment, type II error rate might be inflated if the sample size is small, and some QTL with real effects may be ruled insignificant [78].Therefore, we estimated the effective number of independent tests using simpleM [79] for use in permutation analysis for genome-wide significance as 24159 for the original GWAS and 26323 for the meta-analyses.We also used these values to calculate thresholds for significance that rely on more accurate estimates of independent tests: 0.05/24159 = 2.07x10-6 for the original GWAS and 0.05/ 26323 = 1.90x10-6 for the meta-analyses.

Assessment of linkage disequilibrium and number of independently associated loci
We used the function "r2fast" [80] from the GenABEL-package in R to estimate the r 2 values between the top SNPs from the genome-wide association analyses.For one SNP in the first meta-analysis (BICF2P272135), we re-calculated the r 2 values with the RSQ-function in excel, because of a batch specific allele flip that affected the LD-estimation in R. To estimate the number of independently associated loci within the target regions on chromosomes 1 and 9, we used a SNP clumping procedure.This was executed with the "clump.markers"function from the R-package cgmisc [81].The threshold for forming the clumps were as follows.The physical distance cut-off for clumping was set to 7.5 Mb to cover all of the associated loci on both targeted chromosomes, so as not to create any clumps due to distance, but only due to association with the trait (P-value threshold = 5.0x10-5-5.0x10-6),and due to high enough correlation between the SNPs (r 2 threshold = 0.70).

Targeted sequencing
A targeted sequencing of a 7 Mb region on canine chromosome 9 (bases 30620001 to 37620000 from NC_006591.3)was executed by the DNA Sequencing and Genomics lab at the University of Helsinki.The study included 24 cases and 24 controls that were chosen by the combinations of their genotypes for the following markers: BICF2S23027935, BICF2P742007, BICF2G630834826, BICF2G630837209, BICF2G630837240, BICF2G630837405 and BICF2P272135 (See also Tables 1 and 2).SNP genotype combinations for 24 controls and 23 cases were GAGAGCG and AGAGATC, respectively.In addition, one case had the combination AGARRYS.An indexed Illumina library was created for all 48 samples.Briefly, DNA was sheared using a Bioruptor NGS sonicator (Diagenode, Denville, NJ, US) and the obtained fragments were end-repaired, A-tailed and truncated Illumina Y-adapters ligated.In a PCR step (20 cycles) full-length P5 and indexed P7 adapters were introduced using KAPA Hifi DNA Polymerase (KAPA Biosystems, Wilmington, MA, US).Pools containing four samples each were made for sequence capture with custom SeqCapEZ probes (Nimblegen/Roche, Madison, WI, US) targeting the 7 Mb area from the genome.The sequence capture was performed according to the manufacturer's protocols (Nimblegen/Roche, Madison, WI, US).The captured fragments were amplified (20 cycles) using Illumina adapters P5 and P7 as described above.The PCR products were purified, and size selected using AMPure XP beads (Beckman Coulter Inc., Brea, CA, US).The obtained final libraries were paired-end (300 bp + 300 bp) sequenced on a MiSeq Sequencer (Illumina, San Diego, CA, US).The adapter sequences were removed and the raw reads were filtered using PRINSEQ [82].After quality control, the remaining 47272947 (94.6%) reads were mapped to the reference sequence CanFam3.1 using Burrows-Wheeler Alignment tool [83].The aligned reads were visualized in Tablet and Integrative Genomics Viewer [84,85].

Variant analysis
We implemented a targeted re-sequencing analysis pipeline to screen for coding variants in comparison with CanFam3.1 reference genome.FASTX was used to perform base quality check of the raw reads and Burrows-Wheeler Aligner (BWA) version 0.5.9 [83] was used to map the reads to the reference genome.Picard tools (http://broadinstitute.github.io/picard/) was used to sort and mark possible PCR duplicates.Re-alignment around indels and base quality score recalibration was done using GATK.The variant calling was carried out using the Genome Analysis Tool Kit (GATK) version 3.5 [86] and SAMtools version 1.2 [87,88].The detected variants were annotated to Ensembl and NCBI gene annotation databases using ANNOVAR [89].
Using 258 controls (hip scores A/A, including the sequenced 24 controls) and 168 moderate-to-severe-cases (hip scores C/D, D/C or worse, including the 24 sequenced cases), we determined the statistical association between the phenotype and SNP variants in the target area (S4 Table ).The Cochran-Mantel-Haenszel test variable M 2 for the independence of variants and the phenotype could be determined for 217 SNPs (S5 Table ).The null distribution of maximum M 2 from 10000 permutations had a mean value of 8.25 and with 95% confidence interval ranging between 4.06 and 15.01.Using the null distribution as a reference, 28 of the 217 SNPs were statistically associated with the phenotype (Bonferroni-corrected, adjusted pvalue < 0.05, N = 217) (S5

Fragment analysis
We performed a PCR with a region of 400 bp encasing the deletion revealed in the targeted sequencing.We designed the primers for this with the NCBI Primer-BLAST tool [90].The primer sequences are in the supporting information (S11 Table ).Basic and 5'-FAM-labeled primers were from Oligomer (Helsinki, Finland).The annealing temperatures were calculated with Thermo Fisher Scientific Tm calculator for Phusion DNA polymerase [91].The PCR was run with a T100 Thermal Cycler (Bio-Rad, California, US) with a standard 3-step protocol for Phusion reaction.Standard 1.2% and 2% agarose gels were used (A9539; Sigma Aldrich, St. Louis, MO, US), with 1 x TBE buffer and ethidiumbromide staining.Sample and ladder volume were 5 μl in all lanes.We used GeneRuler 100 bp (SM0242) and 100 bp Plus (SM0321), from Thermo Fischer Scientific (Waltham, MA, US) as the DNA ladders.The gelimaging was performed with AlphaImager (Alpha Innotech, Kasendorf, Germany).The PCR amplicon was validated with sequencing.PCR products from 18 dogs were ambiguous on gels and were sent for fragment analysis.Nine samples did not yield a product with either method and one sample remained ambiguous leaving us with 516 fragment genotypes.The DNA Sequencing and Genomics lab at the University of Helsinki carried out both the sequencing and the fragment analysis.They used capillary electrophoresis to analyze the fragments, with a GeneScan 500 ROX dye (4310361; Thermo Fisher Scientific, Waltham, MA, US) size standard.Subsequently, we analyzed the data with Peak Scanner v1.0 (Applied Biosystems, Foster City, CA, US).

Logistic regression models
Logistic regression models with or without the NOG regulatory variants were computed in R [73].The odds ratios corresponding with the GLM coefficients were calculated using R package 'oddsratio' [92].AUC calculations and comparisons were done using R package 'pROC' [93].

Assembly of the resequencing data
A reference sequence was assembled using CSC computational hub based on the targeted sequencing reads from a case that did not exhibit the deletion upstream of NOG.The adapters were removed and quality of the fastq files was assessed using FastQC [94].The de novo assembly was done using the Spades assembler [95].Assembly was done for the following k-mer values (21,33,55,77,99,127); the Spades assembler then generates a combined assembly (i.e.scaffolds) based on the kmers used.The assembly QC for the scaffolds was done using 'Quast' [96].

Closure and characterization of the gap upstream of NOG in the CanFam3.1. reference genome sequence
Genomic DNA from Dog6 was amplified using primers CanNOG-F1 and CanNOG-R1 from Ishii et al. [35].The PCR products were sequenced, low quality sequences were discarded and a consensus sequence was derived.The alignments between Dog6, CanFam3.1 chr9 and GRCh38 chr17 were done using MAFFT [97].The human core JASPAR2018 database [36] was queried with the alignment in S9 Fig using TFBSTools [98].Bonferroni correction was used to adjust the P-values for each putative binding site for all the matrix ID's.The matrix ID specific prediction was considered significant if the bonferroni-corrected P-value for any of its binding sites was less than 0.05.

Dual luciferase reporter assay
According to the findings from the targeted sequencing we designed three different sequence variant constructs: A, B and C, where A is our German Shepherd reference sequence, and B and C are variants with deletion of eight or seven AGG-triplets.The construct sequences are shown in the supporting information (S7 Table ).The longest construct (construct A) was designed based on the Dog2 scaffolds generated from the resequencing data (S7 Fig) .The NOG enhancer sequence variants were cloned into the pNL3.1[Nluc/minP]NanoLuc luciferase vector (Promega, Madison, WI, US).pGL4.54[luc2/TK]firefly luciferase was used as a constitutively expressed control plasmid.24 h prior to transfection, 2 x 10 4 HEK293 or 8 x 10 3 U-2 OS cells were plated to 96 well plates in DMEM medium supplemented with 10% FBS and without antibiotics.The HEK293 cells were transfected with 50 ng of each plasmid DNA and 50 ug carrier DNA / well and the U-2 OS cells with 10 ng of each plasmid and 80 ug carrier DNA/well using Fugene HD transfection reagent (Promega, Madison, WI, US).Luciferase activities were measured after 24 h using the Nano-Glo Promega Dual-Luciferase reporter assay system according to the manufacturer's instructions.The NanoLuc luminescence values were normalized by division with the control firefly luminescence.The data for every setup (three transfection experiments each with four technical replicates) was analyzed in R using the Kruskal-Wallis rank sum test followed by Dunn's test for multiple pairwise comparisons with Bonferroni adjustment for P-values.P-value < 0.05 was considered significant.help and continuous support.We are grateful to all the dog owners who have donated samples from their dogs for the study.

Fig 1 .
Fig 1. Genome-wide association for canine hip dysplasia: FASTA and QTSCORE, N(cases) = 132, N(controls) = 160.For the upper two figure segments the red horizontal lines are the thresholds for Bonferroni correction for significance.The blue horizontal line is the threshold for significance for independent tests.In the undermost segment, where the permuted P-values are shown, the single red line represents the threshold for genome-wide significance level of 0.05.https://doi.org/10.1371/journal.pgen.1008197.g001

Fig 2 .
Fig 2. Genome-wide association for canine hip dysplasia: First meta-analysis, N(cases) = 132, N(controls) = 277.For the upper two figure segments the red horizontal line is the threshold for Bonferroni correction for significance.The blue horizontal line is the threshold for significance for independent tests.In the undermost segment, where the permuted P-values are shown, the single red line represents the threshold for genome-wide significance level of 0.05.https://doi.org/10.1371/journal.pgen.1008197.g002

Fig 3 .
Fig 3. Genome-wide association for canine hip dysplasia: Second meta-analysis, N(cases) = 247, N(controls) = 277.For the upper figure segment the red horizontal line is the threshold for Bonferroni correction for significance.The blue horizontal line is the threshold for significance for independent tests.In the undermost segment, where the permuted P-values are shown, the single red line represents the threshold for genome-wide significance level of 0.05.https://doi.org/10.1371/journal.pgen.1008197.g003 no or low H3K27Ac signal in corresponding human genomic region on chr.17; rs850849831 (H->R, tolerated), rs851033426 (M->T, tolerated), rs850562371 (N->S, tolerated), rs851547031 (W->L, tolerated), rs850850258 (Q->E, tolerated); splicing mutations in the introns overlap with Ensembl Regulatory features or with Ensembl Motif featuresThe numbers indicate the difference between the numbers of cases and controls carrying variants of a particular functional effect regardless of the position within the indicated gene.Completely or nearly completely segregating variants are in bold.Negative numbers indicate that the variants were more prevalent among the controls.The various variants are commented in the last column.SD = splice donor.SA = splice acceptor.Single letter IUPAC amino acid coding is used.Functional consequences of coding variants have been estimated with SIFT (sift.jcvi.org)and provided by the Ensembl genome browser (www.ensembl.org).The H3K27Ac chromatin marks are from the ENCODE project (www.encodeproject.org)and provided by UCSC Genome Browser (genome.ucsc.edu).https://doi.org/10.1371/journal.pgen.1008197.t007Identification of a regulatory variant upstream of NOGTwenty-eight SNPs in the resequenced target region associated with the phenotype (S5 Table).These SNPs concentrated on two loci (S5 Fig,S5 Table) corresponding to those found in the LD-analysis (S1 Table, S3Fig).As the sequencing depth was variable we combined the reads from cases and controls to separate pools.Visual inspection of two pools of sequences revealed a deletion variant at chr9:31453837-31453860 in the first locus (vertical black line in S5 Fig).This 24-bp deletion variant at resided within an AGG-triplet repeat region in close proximity to the NOG gene in eight control dogs.In addition, one dog had a 27 bp deletion.NOG and its upstream sequence are conserved across species [33] (S6 Fig).The corresponding region on the human chromosome 17 is placed within a putative gene regulatory element upstream of NOG gene with binding sites for several transcription factors [34] (Fig 4).Additionally, there are H3K4Me1 (mono-methylation of lysine 4 of the H3 histone protein) and H3K4Me3 (tri-methylation of lysine 4 of the H3 histone protein) histone mark peaks linked to this region; H3K4Me1 marks associate with enhancers and H3K4Me3 with active promoters [34] (Fig 4).The corresponding region on mouse chromosome 11 overlaps with binding sites for Suz12 (OREG1916695) and Mtf2 (OREG1828914) transcription factor binding sites.

Fig 4 .
Fig 4. Human NOG enhancer region compared to the related region in the dog genome.At the top of the image there is a sequence comparison between human and dog.hs = human genomic (GRCh38.p7)region at chr17:56592775-56592815. cf = canine genomic (CanFam3.1)region at chr9:31453829-31453899. d1-d4 = the corresponding sequences of Dog1-Dog4.Below the sequence comparison is first the canine genomic region including the deletion (blue triangle), gap region marked with Ns, and the complete coding sequence for NOG from a Beagle (GenBank: AB544074.1).At the bottom of the image are the corresponding human genomic regions

Table 9 .
The odds ratios based on logistic regression modeling of the CHD phenotype using the NOG deletion allele and the GA top SNP combination genotype as independent variables.272); moderate-to-severe cases (N = 164) 0.43 [0.

Fig 5 .
Fig 5.The differential effects of three NOG regulatory alleles in a dual luciferase reporter assay.Relative luminisence (NanoLuc/firefly luminisence expressed as median value ± standard deviation, three biological replicates with four technical replicates each per every construct).Left: HEK293 cells transfected with 50 ng plasmid DNA + 50 ng carrier DNA.Right: U-2 OS cells transfected with 10 ng plasmid DNA + 10 ng carrier DNA.pNL: empty control vector.A: construct A, B: construct B, C: construct C. � : P<0.05 relative to construct A. The difference between relative luminescence of the control plasmid and each of the constructs was always statistically significant.https://doi.org/10.1371/journal.pgen.1008197.g005

Table 4 . Protein coding genes near the top SNPs from the genome-wide association analyses. SNP ID (CanFam 3.1) Location Protein coding genes within ± 500 kb from the SNP Relation to the closest gene(s)
Genes closest to the SNPs are in bold.https://doi.org/10.1371/journal.pgen.1008197.t004

Table 6 . Cross-tabulated frequencies and counts of the most common genotypes of the two top SNPs on chromosome 9 (BICF2S23027935, BICF2P742007) with the hip phenotypes.
All other coding variants were predicted to be tolerated.A potentially regulatory variant was discovered 364 bp upstream of RAD51C.This variant was found in 22 cases (16 homozygous, 6 heterozygous animals) and none of the controls.The corresponding site on human chromosome 17 displays a strong H3K27Ac (acetylation of lysine 27 of the histone H3 protein) chromatin mark signal.No other evidence for gene regulatory variants was found.Intronic variants close to splice regions were discovered in the gene for ring finger protein 43 (RFN43) and TEX14.

Table 10 . Comparison of full (with NOG deletion) and reduced (without NOG deletion) generalized linear models on three subsets of dogs.
Df = degrees of freedom, P = probability value, Z = Z test statistic, AUC = area under the receiver operating characteristic curve https://doi.org/10.1371/journal.pgen.1008197.t010 Table, S4 Fig).