University of Huddersfield Repository Meta-Analysis of Genome-Wide Association Studies and Network Analysis-Based Integration with Gene Expression Data Identify New Suggestive Loci and Unravel a Wnt-Centric Network Associated with Dupuytren’s Disease

Dupuytren´s disease, a fibromatosis of the connective tissue in the palm, is a common complex disease with a strong genetic component. Up to date nine genetic loci have been found to be associated with the disease. Six of these loci contain genes that code


Introduction
Dupuytren's disease (DD; OMIM 126900) is among the most common genetic diseases of connective tissues. DD is a fibroproliferative disorder that affects the palmar aponeurosis and the cutaneous retinacula of the hand. It manifests first through the formation of subcutaneous nodules, fibrotic cords are found afterwards, and flexion contractures of single fingers can then occur. There is no known cure for the disease. Surgical intervention is the most common treatment option when hand function becomes hampered. The prevalence of DD has been estimated as around 4% in England [1] and 2.5% in Germany [2]. It increases significantly with age [3,4] and was determined as 22% in a cross-sectional study of the population aged over 50 years in the northern part of the Netherlands [5] and 30% in the Norwegian population over 60 years of age [6]. Histopathologically, an increased proliferation of fibroblasts and differentiation into myofibroblasts can be associated with a massive deposition of extracellular matrix (ECM). The disease shows a progressive clinical behaviour with frequent local recurrence.
The causes for DD are multifactorial. Several environmental factors have been proposed to contribute to DD development. Smoking and alcohol consumption have been associated with DD [7][8][9]. Elevated blood glucose levels, low body weight, and low body mass index (BMI) have also been shown to contribute to DD [9]. Heavy manual labour and exposure to vibrations probably trigger the manifestation of the disease [10,11]. Importantly, DD has a strong genetic basis. Frequent familial occurrence is observed [12]. Studies have determined a family predisposition in 12.5% [2] and 27% [13] of cases, respectively. The sibling recurrence risk λ s has been estimated to equal 2.9 based on a prevalence of 3.5% in northwestern England [14]. Patients with known family history show a trend to undergo surgery earlier than patients without known family history [15]. DD has a polygenic basis, where different genetic risk loci strongly contribute to disease susceptibility [16]. In a recent population-based twin study the heritability of DD was calculated to be 80% [17].
The first GWAS for this disease [18] identified nine genome-wide significant loci. Six of these loci contain genes known to be involved in Wnt signalling, namely WNT4, SFRP4, WNT2, SULF1, RSPO2, and WNT7B. Notably, these six genes are either Wnt ligands or extracellular modulators of Wnt signalling, and therefore lie upstream of the Wnt signalling cascade. This finding was the first insight that Wnt signalling may be one of the molecular mechanisms that underlie the genetics of DD. The Wnt signalling pathway plays an important role during embryogenesis and is tightly regulated on multiple levels [19]. Considering the late age of onset in DD, which peaks in the 5 th decade of life [5,15], highly damaging mutations in Wnt or other genes are not expected in DD. Accordingly, the majority of SNPs associated with complex diseases map to non-coding regions, and it has been suggested that regulatory effects commonly underlie these GWAS signals [20]. Many of these regulatory effects will be small and difficult to detect individually [21]. On top of that, GWAS for complex diseases usually point to only a proportion of the causal genetic basis of the respective diseases and many signals attributed to causal genetic variants may be hidden in the statistical noise of those GWAS. A systems approach to analyse the pathways and networks involved in DD may overcome at least partially these limitations. Network modelling can help to increase the statistical power to detect susceptibility loci with more subtle effects by including candidate loci and modeling relationships between these loci and previously known, significantly associated loci derived from GWAS. Network modelling is used extensively for gene expression data analysis, and a consensus network interfered from mutable methods was shown to perform best [22].
To identify new loci associated with DD and corroborate the nine previously described GWAS loci we performed a meta-analysis on the imputed genotypes of three genome-wide GWAS data sets comprising 1,580 cases and 4,480 controls. As a first step in a systems genetics approach to understand the pathogenesis of DD, we have integrated the GWAS results with whole-transcriptome data. In a network analysis based on protein-protein interaction data we identified functional modules of genes/proteins with shared characteristics that were overrepresented in our DD case/control data sets.

Study participants
The study was approved by the Ethics Commission of the Faculty of Medicine of the University of Cologne and participants provided written informed consent. All participating subjects were of European origin (see S1 and S3 Tables for details). The German study population is described in detail elsewhere [15]. DNA was extracted from peripheral blood samples. RNA was obtained from primary tissue samples from patients and sex and age matched controls. Control connective tissue samples were obtained during trigger finger or carpal tunnel release.

Data sets
We included 1,580 cases and 4,480 controls in a meta-analysis on the imputed genotypes of three genome-wide GWAS data sets (S1 Table). Samples were genotyped on three different genotyping platforms. The first data set comprised 186 cases from Germany and Switzerland typed on the Affymetrix Genome-Wide Human SNP Array 6.0 (809,858 SNPs) and 447 controls typed on the same array from the KORA study (KORA [Cooperative Health Research in the Region of Augsburg], Helmholtz Center Munich). The second data set consisted of 538 German and Swiss cases and 1197 controls from the Popgen study (Popgen, University of Kiel) genotyped for 587,326 markers on the Affymetrix Axiom Genome-Wide CEU 1 Array. The third data set comprised 856 cases (University Medical Center Groningen) and 2,836 controls (LifeLines) genotyped for 234,758 markers on the Illumina HumanCytoSNP-12. This data set was the discovery data set from Dolmans et al. [18], the first described GWAS for DD. The other two data sets were partly used for replication of 34 discovery SNPs [18] but not analysed for genome-wide association before. We performed sample imputation and phenotypic association testing separately for each of the three platforms. Subsequently, the data from the three GWAS were combined in a genome-wide meta-analysis. See also S1 Fig for an overview of the study design.

Quality control
The same quality control parameters were applied separately to each data set. Following Dolmans et al. [18], we only considered bi-allelic non-ambiguous SNPs on autosomes and excluded those with a marker call rate <95%, a minor allele frequency <1% or evidence for deviations from Hardy Weinberg equilibrium in controls (p < 10 −4 ). Moreover, we removed samples with a call rate <99%, excess homo-or heterozygosity, an ambiguous sex assignment or indication of being an outlier with respect to the genetic composition of the other samples.
In addition, for pairs of duplicates or close relatives (average allele sharing identical-by-descent >0.4), the sample with the lower call rate was excluded. The final data sets for analysis comprised 619 (180 cases, 439 controls), 1,530 (441, 1,089) and 3,654 (852, 2,802) samples, respectively.

Genome-wide association testing
We used SNPTEST v2.5 [26] (https://mathgen.stats.ox.ac.uk/genetics_software/snptest/ snptest.html) for phenotypic association testing under an additive genetic model, using the predicted allele dosage from the imputation. Top-ranking components obtained from a multidimensional scaling (MDS) analysis based on genome-wide data of allele sharing identical-bystate were included as fixed effects in the regression-based association test in order to adjust for population stratification. The number of MDS components was determined by minimizing genomic inflation factor (see S1 Table).

Meta-analysis
We performed the genome-wide meta-analysis using GWAMA v2.1 [27] (http://www.well.ox. ac.uk/gwama/index.shtml). The analysis was restricted to bi-allelic markers that had been studied for an association with DD in at least two cohorts (n = 5,894,478 SNPs). Effect sizes were synthesized using a fixed-effects regression model, thereby hypothesizing that the studies differed only in their statistical power to detect the outcome of interest. Both Cochran's Q statistic and I 2 index were used to evaluate statistical heterogeneity between studies [28].
Only SNPs that passed imputation post-hoc quality control and that were studied for an association with Dupuytren's disease in at least two cohorts were considered in the meta-analysis (n = 5,894,478). SNPs for which Cochran's Q test (p < 0.05) or the I² index (I² > 25) indicated heterogeneity between studies were excluded (n = 1,543,737). In particular, the global Manhattan plot and QQ plot, both generated via R-script provided by GWAMA, were finally based upon 4,350,741 SNPs. 4,199,404 associations were studied in all three cohorts (96.52%).

Gene and pathway analysis
Search for overrepresented GO terms using Alligator. ALLIGATOR [29] (http://x004. psycm.uwcm.ac.uk/~peter) was used to test for Gene Ontology (GO) categories that were overrepresented in a list of significantly associated SNPs from the meta-analysis. More specifically, out of 4,350,741 SNPs that indicated no heterogeneity between studies, those 185,986 SNPs that were nominally significant (p<0.05) and showed the same effect direction in all three studies were considered for the pathway enrichment analysis. Alligator assigns SNPs to genes based on genomic location. More specifically, a SNP is assigned to a gene if it falls within the gene plus 20kb upstream or downstream of the first and last exon.
Gene-based test using VEGAS2. The offline version of VEGAS2 [30] () was used to obtain gene-based p-values from single-SNP p-values. Based on SNP association p-values the software calculates empirical gene-based p-values by a simulation procedure. We based our analysis on those 4,350,741 SNPs that did not show indication of heterogeneity between studies. Estimation of inter-marker linkage disequilibrium was based on the EUR population from the 1000 Genomes Project phase I. Gene boundaries were set to ±50 kb of each gene. Up to 10 6 simulations were performed per gene. 24,596 genes were tested and genes with p < 2.03x10 -6 (Bonferroni correction for multiple testing, i. e. 0.05/24,596) were considered to be significantly associated with DD.
Connecting genes from associated loci. We used three different software tools to search for relationships between genes from those genetic loci that harbored at least one index SNP with p < 10 −5 in our meta-analysis: GRAIL [31] (https://www.broadinstitute.org/mpg/grail/) looks for co-citations of genes from the associated regions in scientific publications. Since GRAIL uses the hg18 genome assembly and CEU HapMap SNPs, genomic regions instead of SNPs were used as query input because most top-ranking SNPs from the meta-analysis were not present in HapMap. Regions +/-50kb around the top-ranking SNP of each region were used as query regions. Chromosomal positions were translated to hg18 coordinates with the UCSC LiftOver tool (http://genome. ucsc.edu/cgi-bin/hgLiftOver) and used as query/seed regions for GRAIL analysis.
DAPPLE2 [32] (http://www.broadinstitute.org/mpg/dapple/dappleTMP.php) searches for connections between query genes in a high confidence protein-protein interaction (PPI) dataset (InWeb). We used SNPs, genomic regions, and genes as queries. Direct and indirect connections are given by the software and parameters calculated by random permutation to access the significance of the found connections.
In PrixFixe [33] (http://llama.mshri.on.ca/~mtasan/GranPrixFixe/html/) SNPs reported by GWAS act as seeds for LD-based query regions. We assessed connections between genes based on shared function using PrixFixe. Functional similarities between genes across distinct candidate sets were then located and genes scored based on the frequency and strength of identifying such functional connections.
Network-based pathway analysis. We performed a network-assisted search for enriched protein-protein interactions (PPIs) based on all gene-based p-values from the VEGAS2 analysis. We used two different software tools for this approach, namely dmGWAS in R and Cytoscape (see below). For both tools, we used the 2014 version of the Reactome FI (http://www. reactome.org/) as PPI reference dataset. This dataset contains 11,879 nodes and 217,249 edges and is by design enriched for true biologically functional relationships [34].
The R package dmGWAS 3.0 [35] (http://bioinfo.mc.vanderbilt.edu/dmGWAS) was used to search for enriched modules in Dupuytren´s disease using gene-based p-values as node weights and differential co-expression of genes as edge weights (based on the whole transcriptome data set). The EW_dmGWAS algorithm implemented in dmGWAS 3.0 integrates GWAS signals and gene expression profiles to extract dense modules from the background PPI network. Node weights are derived from GWAS gene-based p-values and edge weights are derived from gene expression profiling. The resulting module score is a combination of node weight and edge weight.
PINBPA [36] Cytoscape (http://www.cytoscape.org/) also searches for PPI in GWAS datasets. Genes with a p-value < 0.05, 0.1 and 0.2 were used as seed genes for a greedy search algorithm-based search for connected modules. Genes from the gene-based test (VEGAS2) were mapped to the Reactome FI network. All mapped genes with a p-value < 0.05 (or 0.1 and 0.2, respectively) were used as seed genes in the greedy search for enriched modules. We considered modules with a z-score > 5 and a network-size 5.

Transcriptome analysis
The Illumina HumanHT-12 v3 Expression BeadChip was used for whole genome expression analysis of disease samples and controls. Twelve primary tissue samples from patients were compared with 12 sex and age matched controls. Total RNA was extracted from 150mg RNAlater stabilised tissue by phenol/chloroform extraction. The tissue was cut into very small pieces and homogenised in a beadmill (Tissue Lyser 2, Qiagen, Germany) for 2-6 min at 30s -1 and room temperature. The phenol/chloroform purified RNA was precipitated with ethanol; dried and resuspended in RNase free water. RNA concentration and integrity were assessed with a 2100 Bioanalyzer (Agilent, Germany). For the whole-genome gene expression analysis only RNA samples with RNA integrity number (RIN) 6.9 were used. The Illumina TotalPrep RNA Amplification Kit (Ambion, USA) was applied for generating biotinylated, amplified RNA for hybridisation and samples were hybridised with the Illumina HumanHT-12 v3 Expression BeadChip according to the manufactures' protocol. Data were analysed with the Illumina GenomeStudio Data Analysis Software and normalised, the background was subtracted, the Illumina custom error model applied and p-values were adjusted for multiple testing (Benjamini-Hochberg correction). Ingenuity Pathways Analysis (IPA) software was used to identify altered pathways and protein networks in DD. Microarray data were submitted to the Gene Expression Omnibus (GEO) database (accession number: GSE75152).

GWAS meta-analysis in DD
Here we performed genome-wide genotype imputation and meta-analysis for three GWAS data sets for DD, which comprise 5,803 samples after quality control, from Germany, Switzerland, and the Netherlands (S1 Fig). Manhattan and QQ plots were generated based on those 4,350,741 SNPs from the meta-analysis that did not indicate heterogeneity between studies (Fig 1). In the subsequent analysis only SNPs that showed the same direction of effect in all three data sets were considered (n = 4,199,404 SNPs) (S2 Table). Overall, the meta-analysis yielded 910 SNPs showing suggestive association with DD (p <10 −5 ), located in 23 chromosomal regions. Out of these, 371 SNPs reached genome-wide significance (p <5×10 −8 ). These SNPs are located on chromosomes 7p14.1 (n = 162), 8q13.2 (n = 79), 8q23.1 (n = 29), 9p24.3 (n = 25), 19q13.43 (n = 6), and 22q13.31 (n = 70). The top-ranking SNP from each of these loci is given in Table 1. We confirmed six of the nine loci previously shown to be associated with DD [18] on a genome-wide significant level. As in Dolmans et al. [18], we observed the strongest association signal on chromosome 7p14.1. Here the imputed SNP rs17171229 (p = 1.11x10 -28 , OR 2.02) gave a slightly stronger signal in the meta-analysis than the previous GWAS top SNP, rs16879765 (p = 1.52x10 -27 , OR 2.02). SNPs rs17171229 and rs16879765 are in close linkage disequilibrium (r 2 = 0.93; based on 1000 Genomes Project EUR phase 1 population). For one locus that previously showed genome-wide significance, on chromosome 20q12, the top SNP in our meta-analysis, rs3577 located in the 5' UTR of MAFB reached a pvalue of 1.12x10 -7 . We did not observe suggestive association signals for two previously genome-wide significant loci, on chromosomes 1p36.12 (including WNT4) and 7q31.2 (WNT2). On chromosome 1p36.12, rs7524102 associated with DD in Dolmans et al. [18], showed a p-value of 4.88x10 -5 (OR 1.28). The SNP with the lowest p-value in this region was rs7537281 (p = 1.48x10 -5 ; OR 1.30). SNP rs4730775 (within WNT2) on chromosome 7q31.2 showed a p-value of 0.000791 (OR 0.85). The SNP with the lowest p-value in this region was rs56145820 (p = 5.81x10 -5 ; OR 1.28).
Of the remaining 16 regions that contained SNPs with p-values <10 −5 two have been previously identified: one on chromosome 6q25.1 near TAB2 (rs11155615, top-ranking SNP from the meta-analysis, located intronic of UST), and the other one on chromosome 15q26.1 with the topranking SNP from the meta-analysis, rs4932195, located between the genes ISG20 and ACAN. Moreover, we identified 14 new suggestive loci on chromosomes 1q32.  (Table 1). These loci require replication in an independent sample set.

Search for connections between potentially associated loci
In total, we identified 23 loci with suggestive evidence for association with DD. All these were used for pathway analysis with GRAIL, DAPPLE2 and PrixFixe to identify the most likely candidate gene(s) from each region based on connectivity. The candidate gene(s) with the best scores/lowest p-values for each region are given in Table 2. In order to allow comparison, Table 2 also contains the gene(s) with the lowest p-values for each region from the VEGAS2 analysis. All three software tools unanimously identified SFRP4, KDR and WNT7B as the best candidate genes in their respective loci. SFRP4 and WNT7B locate to the two regions with the lowest p-values in our study. For one region, on 10q23.1 (rs10786118), no candidate genes were identified by any software tool. Most interestingly, additional Wnt signalling components were identified in regions with suggestive p-values, among them WNT5A on chromosome 5p14.3 and AXIN1 on chromosome 16p13.3. AXIN1 locates to a gene rich region with more than thirty potential candidate genes.
GRAIL identified one or more candidate genes for 17 out of 23 regions, with four genes, SFRP4, WNT7B, AXIN1 and RSPO2, having corrected p-values <0.01. When using candidate SNPs as input, DAPPLE2 identified genes in 10 out of 23 regions. Connectivity among these genes was not higher than expected by chance (and no single gene with p-values < 0.05). When regions +/-50kb around the top-ranking SNP were used as input/seeds in the DAPPLE2 analysis (similar to GRAIL input), DAPPLE2 considered 13 out of 23 regions in the analysis and identified two direct connections, between WNT7B and SFRP4 and between SMAD3 and AXIN1. Again, the identified genes were not more connected than expected by chance (i. e. no significant p-values for overall connectivity and for single candidate genes). PrixFixe identified candidate genes in 19 regions. The software does not give p-values but a normalised scores between 0 and 1, where a higher score indicates higher connectivity. Three genes, WNT5A, SULF1 and TAB2, were uniquely identified by PrixFixe.

Search for connections between GWAS loci
While we first restricted our search for functionally connected genes to the 23 regions showing at least suggestive association in our meta-analysis, we subsequently used the software tool Alligator to identify overrepresented GO terms in the whole meta-analysis data set. For 141 out of 2,631 analysed GO categories, the category-specific p-value for overrepresentation was <0.05 when using a cutoff of p<0.01 for defining significant SNPs (S4 Table). The number of all functional types of GO categories (i.e. cellular components, molecular functions, and biological processes) that reached significance levels for overrepresentation of 0.05, 0.01, and 0.001 are shown in S5 Table. Significance of the number of overrepresented categories was observed when using less stringent p-values for defining significant SNPs (i.e. p<0.05 or p<0.01). This suggests that the genetic susceptibility to DD may involve risk alleles with rather small individual effects. In addition, excess of overrepresented categories was most significant when using 0.01 as the significance criterion defining overrepresentation. Associations for DD are thus supposed to be concentrated in a moderate number of categories, with each of them showing strong evidence of overrepresentation. The three highest ranking overrepresented pathways were "positive regulation of cellular protein metabolic process" (GO:0032270, p-value 0), "positive regulation of protein metabolic process" (GO:0051247, p-value 0) and "response to drug" (GO:0042493, p-value 2x10 -4 ).

Differential gene expression and pathway analysis
Next we compared transcriptome levels in 12 primary disease tissue samples with that of 12 control tissue samples (S1 Fig). Cases and controls clearly clustered into two distinct groups based on whole transcriptome expression patterns. We considered only transcripts with a detection p-value of 0 in either one or both sample sets. 782 transcripts were upregulated with a fold change 2 (S6 Table) and 865 transcripts were downregulated with a fold change of -2 (S7 Table). Among the most upregulated transcripts with fold changes >10 (N = 75) were extracellular matrix components including THBS4, TCN, COL11A1, COL13A1, POSTN, but also other relevant genes such as TGFB2, ITGA10 and WISP1. Among the most downregulated transcripts with a fold change <10 (N = 80) were genes related to muscle function/contraction, notably ACTA1 and others. Considering the importance of Wnt signalling for the development of DD, we specifically searched for differential expression of the genes involved. From all Wnt genes covered by the array, however, only WNT2, WNT5B and WNT11 passed our quality criteria. WNT2 showed no differential expression, WNT5B was 2.6-fold upregulated and WNT11 was 3.5-fold downregulated in DD cases.
Genes with a fold change cutoff of 1.4 were further investigated in the IPA core analysis (N = 4,062; S8-S12 Tables). The top canonical pathway was the Wnt/β-catenin signalling pathway (ratio 60/166 (0.361), p-value 1.57x10 -6 , z-score 0.302). A slightly positive z-score indicates upregulation of this pathway in DD. The second highest ranking pathway was "LPS/IL-1 Mediated Inhibition of RXR Function" (ratio 70/208 (0.337), p-value 4.40x10 -6 , z-score 1.80) and the third highest ranking pathway was "Hepatic Fibrosis/Hepatic Stellate Cell Activation" (ratio 66/196 (0.337), p-value 7.97x10 -6 , z-score NA). The top network in the IPA analysis contained 68 nodes and a score of 42, with the seed gene CTNNB1 encoding β-catenin. Functional annotation of those 68 genes with DAVID and Enrichr [38] revealed Wnt signalling as the best enriched signalling pathway for these genes. The top upstream regulator identified in the IPA core analysis was TGFB1, predicted to be activated with a p-value of 4.96x10 -13 and an activation z-score of 3.75. TGFB1 had 110 target molecules in the analyzed data set. The top biological function predicted to be activated with a z-score of 2.89 and a p-value of 4.81x10 -25 was "proliferation of cells".

Network-based pathway analysis
We further used two software tools for whole-genome pathway analysis, dmGWAS and the cytoscape app iPINBPA. Both tools apply a greedy search algorithm to search for enriched modules.
For dmGWAS we considered 24,596 genes with p-values and 3,873 genes with expression values (fold change cutoff 1.4). The final background network had 2,165 nodes and 7,737 edges. The default lambda calculated by the software was 0.47. The analysis resulted in 943 modules with normalised module scores (Sn) between 18.91 and 0.76. We selected the top 5% modules for further analysis to have sufficient numbers of genes for functional annotation clustering. Within these 121 genes there were two genes from genomic regions with SNP-based pvalues <10 −5 , MAFB and ISG20. A full list of the genes is given in S7 Table. Functional annotation for the 121 genes with DAVID [37] yielded 79 enrichment clusters. The best cluster had an enrichment score of 6.76 and consisted of sixteen GO terms concerning fiber contraction and the (actin) cytoskeleton. Other clusters concerned functions related to contraction, integrin, adhesion, and the ECM (S13 and S14 Tables).
In contrast to dmGWAS, iPINBPA can restrict the overlap between modules resulting in bigger top modules with a default of 20% maximal overlap. The top module (S15 Table) from the iPINBPA analysis contained 851 nodes and had a z-score of 44.9. The start gene for this module was MYC. The mean gene-based p-value from the top modules was 0.12±0.08 and the mean SNP-based p-value was 0.01±0.02. Five genes from the top GWAS meta-analysis regions were included in this network, namely AXIN1, EIF3E, ISG20, MAFB, SFRP4, and RBBP5.

Discussion
We performed a genome-wide meta-analysis as well as enrichment and functional analysis for Dupuytren's disease (DD), based on three data sets comprising in total 1,580 cases and 4,480 control samples from Germany, Switzerland and the Netherlands. We only considered those SNPs that showed the same direction of effect in all three data sets in order to increase the robustness of our results, restricting the analysis to 4,199,404 SNPs in total. Because of the still relatively small sample size, we opted for including all samples in one meta-analysis and did not split the samples into discovery and replication data sets. A weakness of this study is, thus, the current lack of replication in an independent data set. Nevertheless, our approach of a genome-wide meta-analysis after stringent quality procedure application in all data sets is likely to have revealed robust findings of suggestive evidence for new susceptibility loci and important insight into disease mechanisms involved in the pathogenesis of DD.
First, we have confirmed six of the nine previously described loci associated with DD on genome-wide significance level in an enlarged data set. Second, we have identified 14 new suggestive loci, with p-values ranging between 1x10 -7 and 9x10 -6 . In any case, these loci require confirmation and replication in an independent data set. Intriguingly, however, several new loci of suggestive evidence again contain genes related to Wnt signalling. These genes include WNT5A on chromosome 3p14.3, which is a ligand of frizzled receptors; EBF2 on 8p21.2, which acts in synergy with the Wnt-responsive LEF-1/β-catenin pathway; SMAD3 on 15q22.33, which is involved in positive regulation of the canonical Wnt signalling pathway; and AXIN1 on 16p13.3, which is a component of the β-catenin destruction complex. These observations clearly support the notion that the suggestive loci identified in this study may be indeed true signals, they corroborate the outstanding importance of Wnt signalling in the susceptibility for DD and meet the expectation that Wnt signalling pathway related genes were also located in genomic regions with weaker association signals. Our results provide evidence in addition to previous observations that Wnt signalling is involved in the genetic basis of the disease.
The observed distribution of small p-values derivates early from the expected distribution as seen in the QQ plot, both for the SNP-based and gene-based data. Even after stringent correction for genomic inflation (population stratification) a large number of SNPs (and genes) with small p-values remain. Although data from a larger sample set strictly corrected for population stratification are necessary to substantiate this observation regarding Dupuytren's disease, a broad genetic basis is common in truly complex genetic diseases [39,40]. About 6% of all tested genes had p-values <0.05 in the gene-based association test conducted with VEGAS2. This is similar to the percentages reported recently, for instance, for three schizophrenia GWAS data sets described by Chang et al [41]. Assuming many loci involved in the susceptibility for the disease, genome-wide network analysis may help to identify true genetic associations with small effect sizes in the increasing noise of false positives with larger p-values. Predicting the importance of a functional module is often easier than that of single genes [42].
In this study, we have performed a comprehensive pathway and network analysis for DD using data from GWAS meta-analysis mainly with two different software tools. Both tools use a greedy search algorithm to search for densely connected modules that are enriched for small pvalues. We increased the power to detect functionally relevant PPI networks by integrating GWAS with differential gene expression data. Indeed, our results indicate a broad functional spectrum for the genetic basis of DD, covering many small effect size variants supposed to play a role. It is apparent from our analysis that in addition to the Wnt signalling, other signalling pathways and biological processes, notably focal adhesions and Notch signalling, may play a role. This is in concordance with the observations that complex diseases, with the probable exception of immune diseases, may be genetically based on a number of different (signalling) pathways [43,44]. The Wnt and Notch signalling pathways are both highly conserved, and heavy crosstalk is known to take place between them during development and disease [45]. Knowledge of the interplay of different pathways will ultimately help to understand pathogenic mechanisms and design treatment options for a disease [46]. Notably, functional annotation of gene lists is only the first step to functional characterization especially in the context of disease and affected tissues.
In our network analysis many factors point to the importance of ECM and focal adhesion, especially the results from dmGWAS, where the network analysis was based on GWAS and gene expression data. The expression of many ECM component genes was found to be upregulated in DD tissues compared to control tissues in our and other studies [47][48][49][50][51]. This network analysis may help to identify those ECM components deregulated in DD that are critically influenced by genetic factors. Of note, some of the genes from the suggestive GWAS regions code for ECM components or secreted proteins, such as ACAN and TPSD1.
Whole-transcriptome analysis correlated with the GWAS data results in that it highlighted the Wnt signalling pathway to be differentially expressed in disease compared to control tissue. Our transcriptome results support the GWAS findings, highlighting the canonical Wnt/β-catenin pathway to be significantly altered in disease tissues compared to control tissues. Although there are numerous expression studies published in DD [47][48][49][50][51] none of them reported upregulation of Wnt/β-catenin signalling so far, though upregulation and the accumulation of nuclear β-catenin levels in DD tissues have been described [52,53]. Moreover, micro-RNA profiling in DD showed deregulation of β-catenin in DD recently [54]. Other deregulated pathways concerned immune responses and fibrotic processes, which might fit well to the proposed disease mechanisms. The observed downregulation of WNT11 in patient tissues is in accordance with previous findings [55].

Conclusion
We replicated for the first time previously identified susceptibility loci for DD in a meta-analysis of three imputed GWAS data sets. Our findings further corroborate the strong genetic basis and the wide importance of Wnt signalling genes in the susceptibility for DD. In addition, the results demonstrate that genome-wide imputation increases the power of association studies for finding new risk loci for DD. Moreover, we performed whole transcriptome analysis of DD tissues compared with control tissues and found the Wnt signalling pathway significantly deregulated in DD. In summary, our pathway and network analyses, integrating GWAS and transcriptome data, indicate that DD may have a broad genetic basis. The pathogenesis involves Wnt signalling and several other biological processes including the regulation of focal adhesions and the deposition of the ECM.  Table. Summary of the three data sets used for association testing before and after quality control.  Table. IPA top regulator effects. (XLSX) S10 Table. IPA top upstream analysis. (XLSX) S11 Table. IPA top disease and function. (XLSX) S12 Table. IPA top pathway-Wnt signalling. (XLSX) S13 Table. Genes from dmGWAS top 5% modules. (XLSX) S14 Table. Functional annotation with DAVID for the 121 genes from the dmGWAS top 5% modules. (XLSX) S15 Table. Genes from iPINBPA top subnetwork. (XLSX) S16 Table. DAVID functional enrichment analysis of the top network from iPINBPA. (XLSX) S17 Table. Overlap between dmGWAS 5% and iPINBPA top module. (XLSX)