Combining network-guided GWAS to discover susceptibility mechanisms for breast cancer

Systems biology provides a comprehensive approach to biomarker discovery and biological hypothesis building. Indeed, it allows to jointly consider the statistical association between gene variation and a phenotype, and the biological context of each gene, represented as a network. In this work, we study six network methods which identify subnetworks with high association scores to a phenotype. Specifically, we examine their utility to discover new biomarkers for breast cancer susceptibility by interrogating a genome-wide association study (GWAS) focused on French women with a family history of breast cancer and tested negative for pathogenic variants in BRCA1 and BRCA2. We perform an in-depth benchmarking of the methods with regards to size of the solution subnetwork, their utility as biomarkers, and the stability and the runtime of the methods. By trading statistical stringency for biological meaningfulness, most network methods give more compelling results than standard SNPand gene-level analyses, recovering causal subnetworks tightly related to cancer susceptibility. For instance, we show a general alteration of the neighborhood of COPS5, a gene related to multiple hallmarks of cancer. Importantly, we find a significantly large overlap between the genes in the solution networks and the genes significantly associated in the largest GWAS on susceptibility to breast cancer. Yet, network methods are notably unstable, producing di erent results when the input data changes slightly. To account for that, we produce a stable consensus subnetwork, formed by the most consistently selected genes. The stable consensus is composed of 68 genes, enriched in known breast cancer susceptibility genes (BLM, CASP8, CASP10, DNAJC1, FGFR2, MRPS30, and SLC4A7, Fisher’s exact test P-value = 3 ◊ 10≠4) and occupying more central positions in the network than average. The network seems organized around CUL3, encoding an ubiquitin ligase related protein that regulates the protein levels of


Introduction
In human health, genome-wide association studies (GWAS) aim at quantifying how singlenucleotide polymorphisms (SNPs) predispose to complex diseases, like diabetes or some forms of cancer [12]. To that end, in a typical GWAS thousands of unrelated samples are genotyped: the cases, su ering the disease of interest, and the controls from the general population. Then, a per-SNP statistical association test is conducted (e.g. based on logistic regression). Those SNPs with a P-value lower than a conservative Bonferroni threshold are candidates to further studies in an independent cohort. Once the risk SNPs have been discovered, they can be used for risk assessment, and to deepen our understanding of the disease.
GWAS have successfully identified thousands of variants underlying many common diseases [11]. However, this experimental setting also presents intrinsic challenges. Some of them stem from the high dimensionality of the problem, as every GWAS to date studies more variants than samples are genotyped. This limits the statistical power of the experiment, as only variants with larger e ects can be detected [57]. This is particularly problematic since the prevailing view is that most genetic architectures involve many variants with small e ects [57]. Additionally, to avoid false positives, a conservative multiple test correction is applied, typically the previously mentioned Bonferroni correction. However, Bonferroni correction is overly conservative when the statistical tests are correlated, as is the case in GWAS [59]. Another open issue is the interpretation of the results, as the functional consequences of most common variants are not well understood. On top of that, recent large-sampled studies suggest that numerous loci spread all along the genome contribute to a degree to any complex trait, in accordance with the infinitesimal model [5]. The recently proposed omnigenic model [8] o ers an explanation: genes are very functionally inter-related and influence each other's behavior, which allows alterations in most genes to impact the subset of "core" genes directly involved in the disease's mechanism. Hence, a comprehensive statistical framework which includes the structure of biological data might address the aforementioned issues.
For this reason, many authors turn to network biology to handle the complex interplay of biomolecules that lead to disease [22]. As its name suggests, network biology models biology as a network, where the biomolecules under study, often genes, are nodes, and selected functional relationships are edges that link them. These relationships come from evidence that the genes jointly contribute to a biological function; for instance, their expression is correlated, or their products establish a protein-protein interaction. Under this view, complex diseases are not the consequence of a single altered gene, but of the interaction of multiple interdependent molecules [4]. In fact, an examination of biological networks shows that disease genes have di erential prop-erties [4,47]. This is particularly true for cancer driver genes, which tend to be key players in connecting di erent, densely-connected communities of genes. Therefore, studying the neighborhood of disease-associated genes is e ective at identifying new ones that are involved in the same biological functions [27].
Network-based biomarker discovery methods exploit this relatedness to identify disease genes on GWAS data [2]. In essence, each SNP has a score of association with the disease, given by the experiment, and functionally biological relationships, given by a network built on prior knowledge. Then, the problem becomes finding a functionally-related set of highly-scoring genes. Di erent solutions have been proposed to this problem, often stemming from divergent mathematical frameworks and considerations of what the optimal solution looks like. Some methods strongly constrain the problem to certain kinds of subnetworks. Such is the extreme case of LEAN [25], which focuses on "star" subnetworks, i.e. instances were both a gene and its direct interactors are associated with the disease. Other algorithms, like dmGWAS [31] and heinz [20], focus on larger subnetworks interconnecting genes with high association scores. However, they di er in their tolerance to the inclusion of low-scoring nodes, and the topology of the solution. Lastly, other methods also consider the topology of the network, favoring groups of nodes that are not only high-scoring, but also densely interconnected; such is the case of HotNet2 [32], SConES [3], and SigMod [36].
In this work, we analyze the e ectiveness of these six network methods for biomarker discovery on GWAS data. While all of them capture susceptibility mechanisms resembling that postulated by the omnigenic model, they do so in di erent ways, and provide a representative view of the field. We worked on the GENESIS dataset [54], a study on familial breast cancer conducted in the French population. After a classical GWAS approach, we use these network methods to recover additional breast cancer biomarkers. Lastly, we carry out a comparison of the solutions obtained by the di erent methods, and aggregate them to obtain a consensus network of predisposition to familial breast cancer.

GENESIS
The GENE Sisters (GENESIS) study was designed to investigate risk factors for familial breast cancer in the French population [54]. Index cases are patients with infiltrating mammary or ductal adenocarcinoma, who had a sister with breast cancer, and who have been tested negative for BRCA1 and BRCA2 pathogenic variants. Controls are una ected colleagues and/or friends of the cases, born around the year of birth of their corresponding case (± 3 years). We focused on the 2 577 samples of European ancestry, of which 1 279 are controls and 1 298 are cases. The genotyping was performed using the iCOGS array, a custom Illumina array designed to study genetic susceptibility of hormone-related cancers [51]. It contains 211 155 SNPs, including SNPs putatively associated with breast, ovarian, and prostate cancers, SNPs associated with survival after diagnosis, and SNPs associated to other cancer-related traits, as well as candidate functional variants in selected genes and pathways.

Preprocessing and quality control
We discarded SNPs with a minor allele frequency lower than 0.1%, those not in Hardy-Weinberg equilibrium in controls (P-value < 0.001), and those with genotyping data missing on more than 10% of the samples. A subset of 20 duplicated SNPs in FGFR2 were also removed. In addition, we removed the samples with more than 10% missing genotypes. After controlling for relatedness, 17 additional samples were removed (6 for sample identity error, 6 controls related to other samples, 2 cases being related to an index case, and 3 additional controls having a high relatedness score). Lastly, based on study selection criteria, 11 other samples were removed (1 control having cancer, 4 index cases with no a ected sister, 3 half-sisters, 1 sister with lobular carcinoma in situ, 1 with a BRCA1 or BRCA2 pathogenic variant detected in the family, 1 with unknown molecular diagnosis). The final dataset included 1 271 controls and 1 280 cases, genotyped over 197 083 SNPs.
We looked for population structure that could produce spurious associations. A principal component analysis revealed no visual di erential population structure between cases and controls (Supplementary Figure 1). Independently, we did not find evidence of genomic inflation (⁄ = 1.05) either, further confirming the absence of confounding population structure.

SNP and gene association
To measure association between a genotype and the phenotype, we performed a per-SNP 1 d.f. ‰ 2 allelic test using PLINK v1.90 [14]. Then, we used VEGAS2 [41] to compute the gene-level association score from the P-values of the SNPs mapped to them. Specifically, for each gene we only used the 10% of SNPs mapped to it with lowest P-values. We mapped SNPs to genes through their genomic coordinates: all SNPs located within the boundaries of a gene, ±50 kb, were mapped to that gene. We used the 62 193 genes described in GENCODE 31 [21], although only 54 612 could be mapped to at least one SNP. Out of those, we focused exclusively on the 32 767 that had a gene symbol. Out of the 197 083 SNPs remaining after quality control, 164 037 were mapped to at least one of these genes.
We used such mapping to compare the outputs of methods that produce SNP-lists to those that produce gene-lists, and vice versa. For the former, we considered any gene that can be mapped to any of the selected SNPs as selected as well. For the latter, we considered all the SNPs that can be mapped to that gene as selected by the method.

Mathematical notations
In this article, we use undirected, vertex-weighted networks, or graphs, G = (V, E, w). V = {v 1 , . . . , v n } refers to the vertices, with weights w : V ae R. Equivalently, E ™ {{x, y}|x, y oe V · x " = y} refers to the edges. When referring to a subnetwork S, V S is the set of nodes in S and E S is the set of edges in S. A special case of subgraphs are connected subgraphs, which occur when every node in the subgraph can be reached from any other node.
In addition to a weight, nodes have other properties, provided by the topology of the graph. In this article we focus on two of those: degree centrality, and betweenness centrality. The degree centrality, or degree, is the number of edges that a node has. The betweenness centrality, or betweenness, is the number of times a node participates in the shortest paths between two other nodes.
In addition, we use two matrices that describe di erent properties of a graph. The described matrices are square, and have as many rows and columns as nodes are in the network. The element (i, j) represents a selected relationship between v i and v j . The adjacency matrix W G contains a 1 when the corresponding nodes are connected, and 0 otherwise; the diagonal is zero. The degree matrix D G is a diagonal matrix which contains the degree of the di erent nodes.
GM network, we used the mapping described in Section 2.3.1. In all three the node scores are the association scores of the individual SNPs with the phenotype (1 d.f. ‰ 2 ). The properties of these three subnetworks are available in Supplementary Table 1. Genes that contribute to the same function are nearby in the PPIN, and can be topologically related to each other in diverse ways (densely interconnected modules, nodes around a hub, a path, etc.). But this is not the only aspect to model when developing a network method: how to score the nodes, whether the a ected mechanisms form a single connected component or several, how to frame the problem in a computationally e cient fashion, which network to use, etc. Unsurprisingly, multiple solutions have been proposed. We examined six of them: five that explore the PPIN, and one which explores SNP networks. We selected methods that were open source, had an implementation available, and an accessible documentation. Their main di erences are summarized in Table 1.

Network methods
dmGWAS dmGWAS seeks the subgraph with the highest local density in low P-values [31].
To that end it searches candidate subnetwork solutions using a greedy, "seed and extend", heuristic: 1. Select a seed node i and form the subnetwork S i = {i}.
2. Compute Stou er's Z-score Z m for S i as HotNet2 HotNet2 was developed to find connected subgraphs of genes frequently mutated in cancer [32]. To that end, it considers both the local topology of the network and the scores of the nodes. The former is captured by an insulated heat di usion process: at initialization, the score of the node determines its initial heat; iteratively each node yields heat to its "colder" neighbors, and receives heat from its "hotter" neighbors, while retaining part of its own (hence, insulated). This process continues until equilibrium is reached, and results in a di usion matrix F . F is used to compute the similarity matrix E that models exchanged heat as where diag(w(V )) is a diagonal matrix with the node scores in its diagonal. For any two nodes i and j, E ij models the amount of heat that di uses from node j to node i, which can be interpreted as a (non-symmetric) similarity between those two nodes. To obtain densely connected subnetworks, HotNet2 prunes E, only preserving edges such that w(E) > ". Lastly, HotNet2 evaluates the statistical significance of the subnetworks by comparing their size to the size of networks obtained by permuting the node scores. We assigned initial node scores as in Nakka et al. [43], assigning a score of 0 for the genes with low probability of being associated to the disease, and -log 10 (P-value) to those likely to be. In this dataset, the threshold separating both was a P-value of 0.125, which was obtained using a local FDR approach [52]. HotNet2 has two parameters: the restart probability -, and the threshold heat ". Both parameters are set automatically by the algorithm, which is robust to their values [32]. HotNet2 is implemented in Python [33].
LEAN LEAN searches altered "star" subnetworks, that is, subnetworks composed by one central node and all its interactors [25]. By imposing this restriction, LEAN is able to exhaustively test all such subnetworks (one per node). For a particular subnetwork of size m, the P-values corresponding to the involved nodes are ranked as p 1 AE . . . AE p m . Then, k binomial tests are conducted, to compute the probability of having k out of m P-values lower or equal to p k under the null hypothesis. The minimum of these k P-values is the score of the subnetwork. This score is transformed into a P-value through an empirical distribution obtained via a subsampling scheme, where gene sets of the same size are selected randomly, and their score computed. Lastly, P-values are corrected for multiple testing through a Benjamini-Hochberg correction. We used the implementation of LEAN from the LEANR R package [24].
SConES SConES searches the minimal, modular, and maximally associated subnetwork in a SNP graph [3]. Specifically, it solves the problem where ⁄ and ÷ are hyperparameters that control the sparsity and the connectivity of the model. The connectivity term penalizes disconnected solutions, with many edges between nodes that are selected and nodes that are not. Given a ⁄ and an ÷, the aforementioned problem has a unique solution, that SConES finds using a graph min-cut procedure. As in Azencott et al. [3], we selected ⁄ and ÷ by cross-validation, choosing the values that produce the most stable solution across folds. In this case, the selected hyperparameters were ÷ = 3.51, ⁄ = 210.29 for SConES GS; ÷ = 3.51, ⁄ = 97.61 for SConES GM; and ÷ = 3.51, ⁄ = 45.31 for SConES GI. We used the version on SConES implemented in the R package martini [16].
SigMod SigMod aims at identifying the highest-scoring, most densely connected gene subnetwork [36]. It addresses an optimization problem similar to that of SConES (Equation 1), but the connectivity term encourages connected solutions by favoring solutions where many edges connect two selected nodes, rather than penalizing disconnected ones.

arg max
As SConES, this optimization problem can also be solved by a graph min-cut approach.
SigMod presents three important di erences with SConES. First it is designed for gene-gene networks. Second, it favors subnetworks containing many edges. SConES, instead, penalizes connections between the selected and unselected nodes. Third, it explores the grid of hyperparameters di erently, and processes their respective solutions. Specifically, for the range of ⁄ = ⁄ min , . . . , ⁄ max for the same ÷, it prioritizes the solution with the largest change in size from ⁄ n to ⁄ n+1 . Such a large change implies that the network is densely interconnected. This results in one candidate solution for each ÷, which are processed by removing any node not connected to any other. A score is assigned to each candidate solution by summing their node scores and normalizing by size. The candidate solution with the highest standardized score is the chosen solution. SigMod is implemented in an R package [35].

Consensus
We built a consensus network by retaining the nodes that were selected by at least two of the six methods (using SConES GI for SConES).

Pathway enrichment analysis
We searched for pathways enriched in the gene subnetworks produced by the above methods. We conducted an hypergeometric test on pathways from Reactome [30] using the R package Reac-tomePA [63]. The universe of genes included any gene that we could map to a SNP in the iCOGS array (Section 2.3.1). We adjusted the P-values for multiple testing as in Benjamini and Hochberg [7] (BH).Pathways with an BH adjusted P-value < 0.05 were deemed significant. As the significant pathways are often overlapping and redundant, the results were manually curated afterwards.

Evaluation of methods
We evaluated multiple properties (described below) of the di erent methods through a 5-fold subsampling setting. We applied each method to 5 random subsets of the original dataset containing 80% of the samples (train set). When pertinent, we evaluated the solution on the remaining 20% (test set). We used the 5 repetitions to estimate the average and the standard deviation of the di erent measures.

Properties ot the solution
We compared the runtime, the number of selected genes/SNPs, and the stability (sensitivity of the result to small changes in the input, here, using di erent train sets) of the di erent network methods. The stability was quantified using the Pearson correlation to measure the overlap between di erent runs as suggested by Nogueira and Brown [46].

Classification accuracy of selected SNPs
A desirable solution o ers good predictive power on unseen test samples. We evaluated the predicting power of the SNPs selected by the di erent methods through the performance of an L1penalized logistic regression classifier, a machine learning algorithm that searches a small subset of SNPs which provide good classification accuracy. We trained the classifier exclusively on those selected SNPs to predict the outcome (case/control). The L1 penalty helps to account for linkage disequilibrium by reducing the number of SNPs included in the model (active set), while improving the generalization of the classifier. This penalty was set by cross-validation, choosing the value that minimized misclassification error. We applied each network method to each train set, and trained the classifier on it as well using only on the selected SNPs. When the method retrieved a list of genes (all of them except SConES), we considered as selected all the SNPs mapped to any of those genes. Then we evaluated the sensitivity and the specificity on the test set. The active set gave an estimate of a plausible, more sparse solution with a comparable predictive power to the original solution. To obtain a baseline, we also trained the classifier on all the SNPs. We do not expect a linear model on selected SNPs to be able to separate cases from controls well. Indeed, the lifetime incidence of breast cancer among women with a family history of breast or ovarian cancer, and no BRCA1/2 mutations, is only 3.9 times more than in the general population [39]. However, classification accuracy may be one additional informative criterion on which to evaluate solutions.

Comparison to state-of-the-art
An alternative way to evaluate the results is comparing our results to an external dataset. For that purpose, we recovered a list of 153 genes associated to familial breast cancer from DisGeNET [48]. Across this article we refer to these genes as breast cancer susceptibility genes.
Additionally, we used the summary statistics from the Breast Cancer Association Consortium (BCAC), a meta-analysis of case-control studies conducted in multiple countries which included 13 250 642 SNPs genotyped or imputed on 228 951 women of European ancestry mostly from the general population [40]. Hence, a high proportion of breast cancer cases investigated in BCAC are sporadic (not selected according to family history), while GENESIS is a homogeneous dataset not included in BCAC and which focus on the French high-risk population attending the family cancer clinics. Despite these di erences, we expect some degree of shared genetic architecture, especially at the gene level. For that purpose, we searched associated genes as in Section 2.3.1. We provided VEGAS2 with both the summary statistics of all available SNPs, and the genotypes from European samples from the 1000 Genomes Project [56] to compute the LD patterns.

Code availability
We developed computational pipelines for several steps of GWAS analyses, such as physically mapping SNPs to genes, computing gene scores, and performing six di erent network analyses. For each of those processes, we created a pipeline with a clear interface that should work on any GWAS dataset. They are compiled in https://github.com/hclimente/gwas-tools. Although the GENESIS data is not public, the code to apply the pipelines to this data, as well as the code that reproduces all the analyses in this article are available at https://github.com/hclimente/genewa. We deposited all the produced gene subnetworks on NDEx (http://www.ndexbio.org), under the UUID e9b0e22a-e9b0-11e9-bb65-0ac135e8bacf.

Conventional SNP-and gene-based analyses confirm that FGFR2 locus is associated with familial breast cancer
We conducted association analyses in the GENESIS dataset at both SNP and gene levels (Section 2.3.1). Two genomic regions have a P-value lower than the Bonferroni threshold on chromosomes 10 and 16 (Supplementary Figure 2A). The former overlaps with gene FGFR2 ; the latter with CASC16, and it is located near the protein-coding gene TOX3. Variants in both FGFR2 and TOX3 have been repeatedly associated with breast cancer susceptibility in other case-control studies [40], BRCA1 and BRCA2 carrier studies [42], and in hereditary breast and ovarian cancer families negative for mutations in BRCA1 and BRCA2 [50]. In our studied population, only FGFR2 was significantly associated with breast cancer at the gene-level (Supplementary Figure 2B). Closer examination reveals two regions (3p24 and 8q24) having low, albeit not genome-wide significant, P-values. Both of them have been associated to breast cancer susceptibility in the past [10,53]. We applied an L1-penalized logistic regression on the whole dataset (Section 2.5.2). It selected 100 SNPs, both from all aforementioned regions and new ones (Supplementary Figure 2C). Yet, it is unclear why those SNPs were selected, as emphasized by the high P-value of some of them, which further complicates the biological interpretation. Moreover, and in opposition to what would be expected under the omnigenic model, the genes to which these SNPs map to (Section 2.3.1) are not interconnected in the PPIN (Section 2.3.3). In addition, the classification performance of the method is very low, and L1-penalized methods select only one of several correlated variables and are prone to instability, which further complicates interpretation. This motivates exploring network methods, which trade statistical significance for biological relevance to find susceptibility subnetworks. In fact, such methods provided comparably (poor) classification performance to L1-penalized logistic regression ( Figure 3B), while providing more interpretable solutions. We applied six network methods to the GENESIS dataset (Section 2.3.4). As none of the networks examined by LEAN was significant (BH adjusted P-value < 0.05), we obtained six solutions ( Figure 1): one for each of the remaining four gene-based methods, one for SConES GI (which works at the SNP level), and the consensus. These solutions di er in many aspects, making it hard to draw joint conclusions. For starters, the overlap between the genes featured in each solution is quite small ( Figure 1B). Another prominent di erence is their size: the largest solution, produced by HotNet2, contains 440 genes, while heinz's contained only 4 genes. While SConES GI failed to recover any protein coding gene, by dealing with SNP networks it retrieved four subnetworks in intergenic regions, and another one overlapping an RNA gene (RNU6-420P). Their topologies also di er, as measured by the median centrality (Table 2) and the number of connected com- ponents ( Figure 1D). Only two methods have more than one connected component: SConES, as described above, and HotNet2. HotNet2 produced 135 subnetworks, 115 of which have less than five genes. The second largest subnetwork (13 nodes) contains the two breast cancer susceptibility genes CASP8 and BLM. Lastly, a pathway enrichment analysis (Section 2.4) also showed similarities and di erences in the underlying mechanisms. It linked di erent parts of SigMod's solution network to four processes: protein translation (including mitochondrial), mRNA splicing, protein misfolding, and keratinization (BH adjusted P-values < 0.03). Interestingly, dmGWAS solution is also related to protein misfolding (attenuation phase, BH adjusted P-value = 0.01). But, additionally, it includes submodules of proteins related to mitosis, DNA damage, and regulation of TP53 (BH adjusted P-values < 0.05), which match previously known mechanisms of breast cancer susceptibility [44]. As with SigMod, the genes in HotNet2's solution are involved in mitochondrial translation (BH adjusted P-value = 1.87 ◊ 10 -4 ), but also in glycogen metabolism and transcription of nuclear receptors (BH adjusted P-value < 0.04).

Network methods successfully identify genes associated with breast cancer
Despite their di erences, there are additional common themes. All obtained solution subnetworks have lower association P-values than the whole PPIN (median P-value π 0.46, Table 2), despite containing genes with higher P-values as well ( Figure 1A). This exemplifies the trade-o between statistical significance and biological relevance. However, there are nuances between solutions in this regard: heinz strongly favored genes with lower P-values, while dmGWAS was less conservative (median P-values 0.0012 and 0.19, respectively); SConES tended to select whole LD-blocks; and HotNet2 and SigMod were less likely to select low scoring genes. Additionally, the solution subnetworks presented other desirable properties. First, five of them were enriched in known breast cancer susceptibility genes (consensus, dmGWAS, heinz, HotNet2, and SigMod, Fisher's exact test one-sided P-value < 0.03). Second, the genes in four solution subnetworks displayed on average a higher betweenness centrality than the rest of the genes, a di erence that is significant in four solutions (consensus, dmGWAS, HotNet2, and SigMod, Wilcoxon rank-sum test P-value < 1.4 ◊ 10 -21 ). This agrees with the notion that disease genes are more central than other non-essential genes [47], an observation that holds in breast cancer (one-tailed Wilcoxon rank-sum test P-value = 2.64 ◊ 10 -5 when comparing the betweenness of known susceptibility genes versus the rest). Interestingly, SConES selected SNPs that are also more central than the average SNP (Supplementary table 1), suggesting that causal SNPs are also more central than non associated SNPs.

A case study: the consensus network
Despite the heterogeneity of the solutions, their shared properties suggest that each method captures di erent aspects of cancer susceptibility. Indeed, only 20 genes are common to more than two solutions (Supplementary Figure 5A), but encouragingly, the more methods selected a gene, the higher its association score to the phenotype (Supplementary Figure 5B). To leverage on their strengths and compensate their respective weaknesses, we built a consensus subnetwork that captures the mechanisms most shared among the solution subnetworks (Section 2. . Each node is represented by a pie chart, which shows the methods that selected it. We labeled (and enlarged) the two most central genes (COPS5 and OFD1 ) and those genes that are known breast cancer susceptibility genes and/or significantly associated with breast cancer susceptibility in the BCAC dataset. The latter ones are also colored in pink. All gene names are indicated in Supplementary Figure 4. solutions: enrichment in breast cancer susceptibility genes and higher betweenness centrality than the rest of the genes.
A pathway enrichment analysis of the genes in the consensus network also shows similar pathways as the individual solutions. We found two involved mechanisms: mitochondrial translation and attenuation phase. The former is supported by genes like MRPS30 (VEGAS2 P-value = 0.001), which encode a mitochondrial ribosomal protein and was also linked to breast cancer susceptibility [49]. Interestingly, increased mitochondrial translation has been found in cancer cells [64], and its inhibition proposed as a therapeutic target. With regards to attenuation phase of heat shock response, it involves three Hsp70 chaperones: HSPA1A, HSPA1B, and HSPA1L. The genes encoding these proteins are all near each other at 6p21, in the region known as HLA. In fact, out of the 22 SNPs that map to any of these three genes, 9 map to all of them, and 4 to two, making hard to disentangle their e ects. HSPA1A was the most strongly associated gene (VEGAS2 P-value = 8.37 ◊ 10 -4 ).
Topologically the consensus consists of a connected component composed of 49 genes, and multiple smaller subnetworks. Among the latter, 19 genes are in subnetworks containing a single gene or two connected nodes, implying that they do not have a consistently altered neighborhood, but are strongly associated themselves and hence picked by at least two methods. The opposite would be the case of highly central genes in the PPIN, a property which is weakly anti-correlated with the P-value of association to the disease (Pearson correlation coe cient = -0.26, Supplementary Figure 5D). This suggests that they were selected because they were on the shortest path between two highly associated genes. In view of this, we hypothesize that highly central genes might contribute to the heritability through alterations of their neighborhood, consistently with the omnigenic model of disease [8]. For instance, the most central node in the consensus network is COPS5, a component of the COP9 signalosome which regulates multiple signalling pathways. COPS5 is related to multiple hallmarks of cancer and is overexpressed in multiple tumors, including breast and ovarian cancer [34]. Despite its lack of association in GENESIS or BCAC (VEGAS2 P-value of 0.22 and 0.14 respectively), its neighbors in the consensus subnetwork have consistently low P-values (median VEGAS2 P-value = 0.006).

Network methods boost biomarker discovery
We compared the results of di erent network methods to the European sample of the Breast Cancer Association Consortium (BCAC) [40], the largest GWAS to date on breast cancer (Section 2.5.3). Although BCAC case-control studies do not necessarily target cases with a family history of breast cancer, this comparison is pertinent since we expect a shared genetic architecture at the gene level, at which most network metods operate. This shared genetic architecture, together with BCAC's scale (90 times more samples than GENESIS) provides a reasonable counterfactual of what we would expect if GENESIS had a larger sample size. We computed a gene association score on BCAC, in an equivalent way to the one described in Section 2.3.1. The solutions provided by the di erent network approaches overlap significantly with BCAC findings (Fisher's exact test P-value < 0.019). The gene-based network methods achieve comparable precision (2%-25%) and recall (1.3-12.1%) at recovering BCAC-significant genes ( Figure 1C). Interestingly, while SConES GI at the SNP-level achieves a similar recall (8.6%), it shows a much higher precision (47.3%).

Network methods share limitations
We compared the six network methods in a 5-fold subsampling setting (Section 2.5). Specifically, we measured five properties (Figure 3): size of the solution subnetwork; sensitivity and specificity of an L1-penalized logistic regression classifier on the selected SNPs; stability; and computational runtime. The solution size varies greatly between the di erent methods ( Figure 3A). Heinz produced the smallest solutions, with an average of 182 selected SNPs. The largest solutions came from SConES GI (6 256.6 SNPs), and dmGWAS (4 255.0 SNPs). To determine whether the selected SNPs could be used for patient classification, we computed the performance of the classifier on the test dataset ( Figure 3B). The di erent classifiers displayed similarly poor sensitivities and specificities, all in the 0.52 -0.56 range. Interestingly, the classifier trained on all the SNPs had a similar performance, despite being the only method aiming only at minimizing prediction error. It should be considered that, although these performances are low, we do not expect to separate cases from controls well using exclusively genetic data.
Another desirable quality of an algorithm is the stability of the solution with regards to small changes in the input (Section 2.5.1). Both heinz and LEAN displayed a high stability in our benchmark, consistently selecting the same genes and no genes over the 5 subsamples, respectively ( Figure 3C). Conversely, the other methods displayed similarly low stabilities.
In terms of computational runtime, the fastest method was heinz ( Figure 3D), which returned a solution in a few seconds. HotNet2 was the slowest (3 days and 14 hours on average). Including the time required to compute the gene scores, however, slows down considerably gene-based methods; on this benchmark, that step took on average 1 day and 9.33 hours. Considering that, it took 5 days on average for HotNet2 to produce a result.

Network topology matters, and might lead to ambiguous results
As shown above, and despite their similarities, the network methods produced remarkably di erent solutions. This is due to the particularities of each methods, and directly or indirectly provide information about the dataset. For instance, the fact that LEAN did not return any biomarker implies that there is no gene such that both itself and its environment are on average strongly associated with the disease.
In this dataset, heinz's solution is very conservative, providing a small solution with the lowest median P-value for the subnetwork ( Table 2). Due to this parsimonious and highly associated solution, it was the best method to stably select a set of biomarkers ( Figure 3C). Its conservativeness stems from its preprocessing step, which models the gene P-values as a mixture model of a beta distribution and a uniform distribution, controlled by an FDR parameter. Due to the limited . For gene network-based methods, inverted triangles represent the runtime of the algorithm itself, and circles the total time, which includes the algorithm themselves and the additional 119 980 seconds (1 day and 9.33 hours) that VEGAS2 took on average to compute the gene scores from SNP summary statistics. We highlighted the genes selected by each method, and the ones selected by more than one ("Consensus"). We labeled the three most central genes that were picked by any method.
(C) Overlap between the solutions of SConES GS, GM or GI in the di erent genomic regions. SNPs that were not selected in the studied network, but were selected in another one, are displayed in background color.
signal at the gene level in this dataset (Supplementary Figure 2B), only 36 of all the genes retain a positive score after that transformation. Yet, this small solution does not provide much insight into the susceptibility mechanisms to cancer. Importantly, it ignores genes that are associated to cancer in this dataset like FGFR2.
On the other end of the spectrum, dmGWAS, HotNet2, and SigMod produced large solutions. dmGWAS' subnetwork is the least associated subnetwork on average. This is due to the greedy framework it uses, which has a bias for larger solutions [45]. It considered all nodes at distance 2 of the examined subnetwork, and accepted a weakly associated genes if it was linked to another, strongly associated one. This is exacerbated when the results of successive greedy searches are aggregated, leading to a large, tightly connected cluster of unassociated genes ( Figure 4A). This relatively low signal-to-noise ratio combined with the large solution requires additional analyses to draw conclusions, such as enrichment analyses. In the same line, HotNet2's subnetwork is even harder to interpret, being composed of 440 genes divided into 135 subnetworks. Lastly, SigMod misses some of the most strongly associated, breast cancer susceptibility genes in the dataset, like FGFR2 and TOX3.
Another peculiarity of network methods is their relationship to degree centrality. On the one hand we observed that highly central genes often had no association to disease ( Figure 4B). On the other, network methods favor central genes, as they often connect high scoring nodes. This was specially the case of SigMod, which selected three highly central, unassociated genes: COPS5, CUL3 and FN1. As we showed in Section 3.3, and will show in 3.7, there is evidence in the literature of the contribution of the first two to breast cancer susceptibility. With regards to FN1, it encodes a fibronectin, a protein of the extracellular matrix involved in cell adhesion and migration. Overexpression of FN1 has been observed in breast cancer [28], and it is negatively correlated with poor prognosis in other cancer types [62,55].
By virtue of using a SNP subnetwork, SConES analyzes each SNP in their functional context. It therefore can select SNPs in genes without any associated interactor, as well as SNPs in noncoding regions or in non-interacting genes. In fact, due to linkage disequilibrium, SConES favors such genes, as selecting SNPs in an LD-block which overlaps with a gene favors selecting the rest of the gene. This might explain why SConES produces similar results on the GS and GM networks, heavily a ected by linkage disequilibrium (Supplementary Figure 3). On the other hand, SConES penalizes selecting SNPs and not their neighbors. This makes it conservative regarding SNPs with many interactions, like those mapped to hub genes in the PPIN. For this reason, SConES GI did not select any protein coding gene, despite selecting similar regions as SConES GS ( Figure 4C). In fact SConES GS and SConES GM select regions related to breast cancer, like 3p24 (SLC4A7 /NEK10 [1]), 5p12 (FGF10, MRPS30 [49]), 10q26 (FGFR2 ), and 16q12 (TOX3 ). On top of those SConES GS selects region 8q24 (POU5F1B [9]). We hypothesize that the lack of results on the PPIN network of SConES GI and LEAN are due to the same cause: the absence of joint association of a gene and a majority of its neighbors. Although in the case of SConES other hyperparameters could lead to a more informative solution (e.g. a lower ⁄ in Equation 1), it is unclear what the best strategy to find them is. In addition, due to the design of the iCOGS array, the genome of GENESIS participants has not been unbiasedly surveyed: some regions are fine-mapped -which might distort gene structure in GM and GI networks -while others are under studied -hindering the accuracy with which the GS network captures the genome structure. . Each node is represented by a pie chart, which shows the methods that selected it. We labeled (and enlarged) the most central genes (CUL3 ) and those genes that are known breast cancer susceptibility genes and/or significantly associated with breast cancer susceptibility in the BCAC dataset. The latter ones are also colored in pink. All gene names are indicated in Supplementary  Figure 6.

Adjusting for instability preserves global network properties
Most of the network methods, including the consensus, were highly unstable, raising questions about the reliability of the results. We built a new, stable consensus network using the genes selected most often across the 30 solutions obtained by running the 6 methods on 5 di erent splits of the data (Section 2.5). Such a network is expected to capture the subnetworks more often found altered, and hence should be more resistant to noise. We used only genes selected in at least 7 of the solutions, which corresponded to 1% of all genes selected at least once. The resulting stablilitybased consensus was composed of 68 genes ( Figure 5). This network shares most of the proterties of the consensus: breast cancer susceptibility genes are overrepresented (P-value = 3 ◊ 10 -4 ), as well as genes involved in mitochondrial translation and the attenuation phase (adjusted P-values 0.001 and 3 ◊ 10 -5 respectively); the selected genes are more central than average (P-value = 1.1 ◊ 10 -14 ); and a considerable number of nodes (19) are isolated.
However, although this new network exhibits similar global properties as the previous one, the lack of stability results in di erent genes being selected. In this case, the most central gene is CUL3, which is absent from the previous consensus network and has a low association score in both GENESIS and BCAC (P-values of 0.04 and 0.26, respectively). This gene is a component of Cullin-RING ubiquitin ligases. Encouragingly, it impacts the protein levels of multiple genes relevant for cancer progression [15], and its overexpression was also linked to increased sensitivity to carcinogens [38].

Discussion
In recent years, the ability of GWAS to unravel the mechanisms leading to complex diseases has been called into question [8]. On the one hand, the omnigenic model proposes that gene functions are interwoven with each other in a dense co-function network. The practical consequence is that larger and larger GWAS will lead to the discovery of an uninformative wide-spread pleiotropy. On the other hand, discovery in GWAS is hindered by a conservative statistical framework. Network methods tackle these two issues by using both the association score and an interaction network to take into consideration the biological context of each of the genes and SNPs. Based on what could be considered diverse interpretations of the omnigenic model, several methods for network-guided biomarker discovery have been proposed in recent years. In this article we evaluated the relevance of six of them by examining a GWAS dataset on familial breast cancer.
Most of the network methods produced a relevant subset of biomarkers, recapitulating known breast cancer susceptibility genes. In general, the selected genes and SNPs were more central than average, in accordance with the observation that disease genes are more relatively central [47]. However, very central nodes are also more likely to be connecting any given random pair of nodes, making them more likely to be selected by these network methods. Across this article we show that highly central genes that were selected (COPS5, CUL3 and FN1 ) could plausibly be involved in breast cancer susceptibility. Yet, further work is needed to characterize the impact of centrality on network methods' outputs. Despite these similarities, the solutions were notably di erent. At one end of the spectrum, SConES and heinz preferred small, highly associated solutions, providing a conveniently short list of biomarkers, at the expense of not shedding much light on the etiology of the disease. On the other end, SigMod and dmGWAS gravitate towards larger, less associated solutions which provide a wide overview of the biological context. While this deepens our understanding of the disease and provide biological hypotheses, they require further analyses, which might deter unexperienced practitioners. HotNet2 balances both approaches at the expense of producing the largest solution: a constellation of many, highly associated, small subnetworks. Additionally, all solutions share two drawbacks. First, they are all equally bad at discriminating cases from controls.
Yet, the classification accuracy of network methods is similar to that of a machine learning classifier, which suggests that cases and controls are di cult to separate in this dataset. This might be due to unaccounted for environmental factors, and limited statistical power, which reduces the ability to identify relevant SNPs. Second, all methods are remarkably unstable, yielding di erent solutions for slightly di erent inputs. This might partly be caused by the instability of the P-values themselves in low statistical power settings [26]. Hence, heinz's conservative transformation of P-values, which favors only the most extreme ones, leads to improved stability. Another source of instability might be the redundancy inherent to biological networks.
To overcome the limitations of the individual methods while exploiting their strengths, we proposed combining them into a consensus subnetwork. We use a straightforward strategy of including any node that was recovered by multiple methods. We proposed two networks: a consensus network that tackled the heterogeneity of the solutions in the full dataset, and a stable consensus network, that addressed the instability of the methods. They both synthesized the altered mechanisms: they both included the majority of the strongly associated smaller solutions and captured genes and broader mechanisms related to cancer. Thanks to their smaller size and their network structure, they provided compelling hypotheses on genes like COPS5 and CUL3, which lack genome-wide association with the disease, but who are related to cancer at the expression level and whose neighborhood has consistent high association scores. Crucially, the consensus was as unstable as the tested methods, while the stable consensus shared these properties while accounting for instability. This supports that instability might be caused by redundant but equivalent biological mechanisms, and hence validates the conclusions obtained on the individual solutions and the consensus.
The strength of network-based analyses comes from leveraging prior knowledge to boost discovery. In consequence, they show their shortcomings with respect to understudied genes, especially those not in the network. Out of the 32 767 genes that we can map the genotyped SNPs to, 60.7% (19 887) are not in the protein-protein interaction network. The majority of those (14 660) are non-coding genes, mainly lncRNA, miRNA, and snRNA (Supplementary Figure 7). Yet, RNA genes like CASC16 are associated to breast cancer (Section 3.1), reminding us of the importance of using networks beyond coding genes. In addition, even protein-coding genes linked to breast cancer susceptibility [1], like NEK10 (P-value 1.6 ◊ 10 -5 , located near SLC4A7 ) or POU5F1B, were absent from the network. However, on average protein-coding genes absent from the PPIN are less associated with this phenotype (Wilcoxon rank-sum P-value = 2.79 ◊ 10 -8 , median P-values of 0.43 and 0.47). As we are using interactions from high-throughput experiments, such di erence cannot be due to well-known genes having more known interactions. As disease genes tend to be more central [47], we hypothesize that it is due to interactions between central genes being more likely. It is worth noting that network approaches that do not use PPIs, like SConES GS and GM, did recover SNPs in NEK10 and CASC16. Moreover, both SConES GM and GI recovered intergenic regions, which might contain key regulatory elements [23] and, yet, are excluded from gene-centric approaches. This shows the potential of SNP networks, in which SNPs are linked when there is evidence of co-function, to perform network-guided GWAS even in the absence of gene-level interactions. Lastly, all the methods are heavily a ected by how SNPs are mapped to genes. In Section 3.3 we highlight ambiguities that appear when genes overlap or are in linkage disequilibrium. In fact, the presented case is paradigmatic, since the genes are in the most genedense region of the genome [61]. Network methods are prone to selecting such genes when they are functionally related, and hence interconnected in the network, but might be more resilient to them when the overlapping genes are unrelated. Making use of more targeted mappings of SNPs to genes (e.g. eQTLs, SNPs associated to the expression of a gene), altogether with a stringent LD pruning, might address such problems.
As not all databases compile the same interactions, the choice of the PPIN determines the final output. In this work we used exclusively interactions from HINT from high-throughput experiments. This responds to concerns about adding interactions identified in targeted studies and prone to a "rich getting richer" phenomenon: popular genes have a higher proportion of their interactions described [13,17], and they might bias discovery by reducing the average shortest path length between two random nodes. On the other hand, Huang et al. [27] found that the best predictor of the performance of a network for disease gene discovery is the size of the network, which supports using the largest amount of interactions. When we compared the impact of using a larger network containing interactions from both high-throughput experiments and the literature (Section 2.3.3), we found that for most of the methods it did not greatly change the size or the stability of the solution, the classification accuracy, or the runtime (Supplementary Figure 8). This supports using only interactions from high-throughput experiments, which produces apparently similar solutions and avoids falling into "circular reasonings", where the best known genes are artificially pushed into the solutions.
A crucial step for the gene based methods is the computation of the gene score. In this work we used VEGAS2 [41] due to the flexibility it o ers to use user-specified gene annotations. However, it presents known problems (selection of an appropriate percentage of top SNPs, long runtimes and P-value precision limited to the number of permutations [43]), and other algorithms [43,29,58] might have more statistical power. Another important decision is how to handle LD in a GWAS. VEGAS2 accounts for LD patterns, and hence an LD pruning step would not impact gene-based network methods, although it would speed up VEGAS2's computation time. With regards to SConES, fewer SNPs would lead to simpler SNP networks and, possibly, shorter runtimes. However, as mentioned in Section 3.6, LD patterns seem paramount to SConES' solutions, and an LD pruning step could potentially alter them.
In order to produce the consensus networks, we faced the di erent interfaces, preprocessing steps, and unexpected behaviors of the various methods. To facilitate that other authors apply them to new datasets and aggregate their solutions, we built six nextflow pipelines [18] with a consistent interface and, whenever possible, parallelized computation. They are available on GitHub: https: //github.com/hclimente/gwas-tools. Importantly, those methods that had a permissive license were compiled into a Docker image for easier use, which is available on Docker Hub hclimente/gwastools.