Biological networks and GWAS: comparing and combining network methods to understand the genetics of familial breast cancer susceptibility in the GENESIS study

Network approaches to disease use biological networks, which model functional relationships between the molecules in a cell, to generate hypotheses about the genetics of complex diseases. Several among them jointly consider gene scores, representing the association between each gene and the disease, and the biological context of each gene, modeled by a network. Here, we study six such network methods using gene scores from GENESIS, a genome-wide association study (GWAS) on French women with non-BRCA familial breast cancer. We provide a critical comparison of these six methods, discussing the impact of their mathematical formulation and parameters. Using a biological network yields more compelling results than standard GWAS analyses. Indeed, we find significant overlaps between our solutions and the genes identified in the largest GWAS on breast cancer susceptibility. We further propose to combine these solutions into a consensus network, which brings further insights. The consensus network contains COPS5, a gene related to multiple hallmarks of cancer, and 14 of its neighbors. The main drawback of network methods is that they are not robust to small perturbations in their inputs. Therefore, we propose a stable consensus solution, formed by the most consistently selected genes in multiple subsamples of the data. In GENESIS, it is composed of 68 genes, enriched in known breast cancer susceptibility genes (BLM, CASP8, CASP10, DNAJC1, FGFR2, MRPS30, and SLC4A7, P-value = 3 × 10 4) and occupying more central positions in the network than most genes. The network is organized around CUL3, which is involved in the regulation of several genes linked to cancer progression. In conclusion, we showed how network methods help overcome the lack of statistical power of GWAS and improve their interpretation. Project-agnostic implementations of all methods are available at https://github.com/hclimente/gwas-tools. Author summary Genome-wide association studies (GWAS) scan thousands of genomes to identify variants associated with a complex trait. Over the last 15 years, GWAS have advanced our understanding of the genetics of complex diseases, and in particular of hereditary cancers. However, they have led to an apparent paradox: the more we perform such studies, the more it seems that the entire genome is involved in every disease. The omnigenic model offers an appealing explanation: only a limited number of core genes are directly involved in the disease, but gene functions are deeply interrelated, and so many other genes can alter the function of the core genes. These interrelations are often modeled as networks, and multiple algorithms have been proposed to use these networks to identify the subset of core genes involved in a specific trait. This study applies and compares six such network methods on GENESIS, a GWAS dataset for familial breast cancer in the French population. Combining these approaches allows us to identify potentially novel breast cancer susceptibility genes and provides a mechanistic explanation for their role in the development of the disease. We provide ready-to-use implementations of all the examined methods.


Introduction
In human health, genome-wide association studies (GWAS) aim at quantifying how 17 single-nucleotide polymorphisms (SNPs) predispose to complex diseases, like diabetes or 18 some forms of cancer [1]. To that end, in a typical GWAS, thousands of unrelated 19 samples are genotyped: the cases, suffering from the disease of interest, and the controls, 20 taken from the general population. Then, a statistical test of association (e.g., based on 21 logistic regression) is conducted between each SNP and the phenotype. Those SNPs 22 with a P-value lower than a conservative Bonferroni threshold are candidates to further 23 studies in an independent cohort. Once the risk SNPs have been discovered, they can be 24 used for risk assessment and deepening our understanding of the disease. 25 GWAS have successfully identified thousands of variants underlying many common 26 diseases [2]. However, this experimental setting also presents inherent challenges. Some 27 of them stem from the high dimensionality of the problem, as every GWAS to date 28 studies more variants than samples are genotyped. This limits the statistical power of 29 the experiment, as it can only detect variants with larger effects [3]. This is particularly 30 problematic since the prevailing view is that most genetic architectures involve many 31 variants with small effects [3]. Additionally, to avoid false positives, most GWAS apply 32 a conservative multiple test correction, typically the previously mentioned Bonferroni 33 correction. However, Bonferroni correction is overly conservative when the statistical 34 tests correlate, as happens in GWAS [4]. Another open issue is the interpretation of the 35 results, as the functional consequences of most common variants are unknown. On top 36 of that, recent large-sampled studies suggest that numerous loci spread all along the 37 genome contribute to a degree to any complex trait, in accordance with the infinitesimal 38 model [5]. The recently proposed omnigenic model [6] offers an explanation: genes are 39 strongly interrelated and influence each other's function, which allows alterations in 40 most genes to impact the subset of "core" genes directly involved in the disease's 41 mechanism. Hence, a comprehensive statistical framework that includes the structure of 42 biological data might help alleviate the issues above. 43 For this reason, many authors turn to network biology to handle the complex 44 interplay of biomolecules that lead to disease [7,8]. As its name suggests, network applying these six network methods to GWAS data. They use different interpretations 73 of the omnigenic model and provide a representative view of the field. We worked on 74 the GENESIS dataset [19], a study on familial breast cancer conducted in the French 75 population. After a classical GWAS approach, we used these network methods to 76 identify additional breast cancer susceptibility genes. Lastly, we compared the solutions 77 obtained by the different methods and aggregated them to obtain consensus solutions of 78 predisposition to familial breast cancer that addressed their shortcomings. 79 2 Materials and methods 80 2.1 GENESIS dataset, preprocessing, and quality control 81 The GENE Sisters (GENESIS) study investigated risk factors for familial breast cancer 82 in the French population [19]. Index cases were patients with infiltrating mammary or 83 ductal adenocarcinoma, who had a sister with breast cancer, and tested negative for 84 BRCA1 and BRCA2 pathogenic variants. Controls were unaffected colleagues or friends 85 of the cases born around the year of birth of their corresponding case (± 3 years). We 86 focused on the 2 577 samples of European ancestry, of which 1 279 were controls, and 87 1 298 were cases. The genotyping platform was the iCOGS array, a custom Illumina 88 array designed to study the genetic susceptibility to hormone-related cancers [20]. It 89 contained 211 155 SNPs, including SNPs putatively associated with breast, ovarian, and 90 prostate cancers, SNPs associated with survival after diagnosis, and SNPs associated to 91 other cancer-related traits, as well as candidate functional variants in selected genes and 92 pathways.
genotypes. After controlling for relatedness, we excluded 17 additional samples (6 for 98 sample identity error, 6 controls related to other samples, 2 cases related to an index 99 case, and 3 additional controls having a high relatedness score). Lastly, based on study 100 selection criteria, 11 other samples were removed (1 control having cancer, 4 index cases 101 with no affected sister, 3 half-sisters, 1 sister with lobular carcinoma in situ, 1 with a 102 BRCA1 or BRCA2 pathogenic variant detected in the family, 1 with unknown 103 molecular diagnosis). The final dataset included 1 271 controls and 1 280 cases, 104 genotyped over 197 083 SNPs. 105 We looked for population structure that could produce spurious associations. A 106 principal component analysis revealed no visual differential population structure 107 between cases and controls (S1 Fig). Independently, we did not find evidence of 108 genomic inflation (λ = 1.05) either, further confirming the absence of confounding 109 population structure.  111 To measure the association between genotype and susceptibility to breast cancer, we 112 performed a per-SNP 1 d.f. χ 2 allelic test using PLINK v1.90 [21]. To obtain significant 113 SNPs, we performed a Bonferroni correction to keep the family-wise error rate below 114 5%. The threshold used was 0.05 197083 = 2.54 × 10 −7 .

115
Then, we used VEGAS2 [22] to compute the gene-level association score from the 116 P-values of the SNPs mapped to them. More specifically, we mapped SNPs to genes 117 through their genomic coordinates: all SNPs located within the boundaries of a gene,

118
±50 kb, were mapped to that gene. We computed VEGAS2 scores for each gene using 119 only the 10% of SNPs with the lowest P-values among all those mapped to it. We used 120 the 62 193 genes described in GENCODE 31 [23], although only 54 612 mapped to at 121 least one SNP. Out of those, we focused exclusively on the 32 767 that had a gene 122 symbol. Out of the 197 083 SNPs remaining after quality control, 164 037 mapped to at 123 least one of these genes. We also performed a Bonferroni correction to obtain significant 124 genes; in this case, the threshold of significance was 0. shortest paths between two other nodes. 138 We also used two matrices that describe two different properties of a graph. Both 139 matrices are square and have as many rows and columns as nodes are in the network.

140
The element (i, j) hence represents a relationship between v i and v j . The adjacency 141 matrix W G contains a 1 when the corresponding nodes are connected, and 0 otherwise; 142 its diagonal is zero. The degree matrix D G is a diagonal matrix that contains the degree 143 of the different nodes.  [17] was the only network method designed to handle SNP 164 networks. As in gene networks, two SNPs were connected in a SNP network when there 165 was evidence of shared functionality between them. Azencott et al. [17] proposed three 166 ways of building such networks: connecting the SNPs consecutive in the genomic 167 sequence ("GS network"); interconnecting all the SNPs mapped to the same gene, on top 168 of GS ("GM network"); and interconnecting all SNPs mapped to two genes for which a 169 protein-protein interaction exists, on top of GM ("GI network"). We focused on the GI 170 network using the PPIN described above, as it fitted the scope of this work better.

171
However, at different stages, we also compared GI to GS and GM to understand how 172 including the PPIN affects SConES' output. For the GM network, we used the mapping 173 described in Section 2.3.5. In all three, we scored the nodes using the 1 d.f. χ 2 statistic 174 of association. The properties of these three subnetworks are available in S1  [14]. To that end, it searches candidate solutions using a greedy, "seed 189 and extend", heuristic: where k is the number of genes in S i , z j is the Z score of gene j, computed as 193 φ -1 (1 -P-value j ), and φ -1 is the inverse normal distribution function. 194 3. Identify neighboring nodes of S i , i.e., nodes at distance ≤ d. 4. Add the neighboring nodes whose inclusion increases Z m+1 by more than a 196 threshold Z m × (1 + r).  6. Add S i to the list of subnetworks to return. Normalize its Z-score as where Z m (π) represents a vector containing 100 000 random subsets of the 200 same number of genes.

201
DmGWAS carries out this process on every gene in the PPIN. We used the 202 implementation of dmGWAS in the dmGWAS 3.0 R package [25]. Unless 203 otherwise specified, we used the suggested parameters d = 2 and r = 0.1. We used 204 the function simpleChoose to select the solution, which aggregates the top 1% 205 subnetworks.

206
heinz The goal of heinz is to identify the highest-scored connected subnetwork [15].

207
The authors proposed a transformation of the genes' P-value into a score that is 208 negative under weak association with the phenotype, and positive under a strong 209 one. This transformation is achieved by modeling the distribution of P-values by a 210 beta-uniform model (BUM) parameterized by the desired false discovery rate 211 (FDR). Thus formulated, the problem is NP-complete, and hence solving it would 212 require a prohibitively long computational time. To solve it efficiently, it is re-cast 213 as the Prize-Collecting Steiner Tree Problem, which seeks to select the connected 214 subnetwork S that maximizes the profit p(S), defined as: were al. [26] which, in practice, is often fast and optimal, although neither is 219 guaranteed. We used BioNet's implementation of heinz [27,28].

220
HotNet2 HotNet2 was developed to find connected subgraphs of genes frequently 221 mutated in cancer [16]. To that end, it considers both the local topology of the 222 PPIN and the nodes' scores. An insulated heat diffusion process captures the 223 former: at initialization, the score of the node determines its initial heat; 224 iteratively each node yields heat to its "colder" neighbors and receives heat from 225 its "hotter" neighbors while retaining part of its own (hence, insulated). This 226 process continues until a stationary state is reached, in which the temperature of 227 the nodes does not change anymore, and results in a diffusion matrix F . F is used 228 to compute the similarity matrix E that models exchanged heat as where diag(w(V )) is a diagonal matrix with the node scores in its diagonal. For 230 any two nodes i and j, E ij models the amount of heat that diffuses from node j 231 to node i. Hence, E ij can be interpreted as a (non-symmetric) similarity between 232 those two nodes. To obtain densely connected solutions, HotNet2 prunes E, only 233 preserving edges such that w(E) > δ. Lastly, HotNet2 evaluates the statistical 234 significance of the solutions by comparing their size to the size of PPINs obtained 235 by permuting the node scores. We assigned the initial node scores as in Nakka et 236 al. [29], giving a 0 to the genes with a VEGAS2 P-values of association with the 237 disease, and -log 10 (P-value) to those likely to be. In the GENESIS dataset, the 238 threshold separating both was a P-value of 0.125, which we obtained using a local 239 FDR approach [30]. HotNet2 has two parameters: the restart probability β, and 240 the threshold heat δ. Both parameters are set automatically by the algorithm, 241 which is robust to their values [16]. HotNet2 is implemented in Python [31].

242
LEAN LEAN searches altered "star" subnetworks, that is, subnetworks composed of 243 one central node and all its interactors [13]. By imposing this restriction, LEAN 244 can exhaustively test all such subnetworks (one per node). For a particular star   We adjust these P-values for multiple testing through a Benjamini-Hochberg 254 correction. We used the implementation of LEAN from the LEANR R package [32]. 255 October 27, 2020 7/32 SConES SConES searches the minimal, modular, and maximally associated 256 subnetwork in a SNP graph [17]. Specifically, it solves the problem 257 arg max where λ and η are parameters that control the sparsity and the connectivity of the 258 model. The connectivity term penalizes disconnected solutions, with many edges 259 between selected and unselected nodes. Given a λ and an η, Eq 5 has a unique 260 solution that SConES finds using a graph min-cut procedure. SigMod SigMod searches the highest-scoring, most densely connected subnetwork [18]. 267 It addresses an optimization problem similar to that of SConES (Eq 5), but with 268 a different connectivity term that favors solutions containing many edges: As for SConES, this optimization problem can also be solved by a graph min-cut 270 approach.

271
SigMod presents three important differences with SConES. First, it was designed 272 for PPINs. Second, it favors solutions containing many edges between the selected 273 nodes. SConES, instead, penalizes connections between selected and unselected 274 nodes. Third, it explores the grid of parameters differently, and processes their 275 respective solutions. Specifically, for the range of λ = λ min , . . . , λ max for the same 276 η, it prioritizes the solution with the largest change in size from λ n to λ n+1 .

277
Additionally, that change needs to be larger than a user-specified threshold 278 maxjump. Such a large change implies that the network is densely interconnected. 279 This results in one candidate solution for each η, which is processed by removing 280 any node not connected to any other. A score is assigned to each candidate 281 solution by summing their node scores and normalizing by size. Finally, SigMod 282 chooses the candidate solution with the highest standardized score, and that is not 283 larger than a user-specified threshold (nmax). We used the default parameters 284 maxjump = 10 and nmax = 300. SigMod is implemented in an R package [34].

285
Consensus We built a consensus solution by retaining the genes selected by at least 286 two of the six methods (using SConES GI for SConES). It includes any edge 287 between the selected genes in the PPIN. 288 We performed all the computations in the cluster described in Section 2.8.  290 We used the network methods with the parameters recommended by their authors, or 291 with the default parameters in their absence. Additionally, we explored the parameter 292 space of the different methods to study how they alter the output.

Comparing SNP-methods to gene-methods and vice versa 307
In multiple steps of this article, we compared the outcome of a method that works on 308 genes with the outcome of one that works on SNPs. For this purpose, we used the 309 SNP-gene correspondence described in Section 2.2. To convert a list of SNPs into a list 310 of genes, we included all the genes mapped to any of those SNPs. Conversely, to convert 311 a list of genes into a list of SNPs, we included all the SNPs mapped to any of those 312 genes. 314 We searched for pathways enriched in the gene solutions produced by the above 315 methods. We conducted a hypergeometric test on pathways from Reactome [35] using 316 the function enrichPathway from the ReactomePA R package [36]. The universe of 317 genes included any gene that we could map to a SNP in the iCOGS array (Section 2.2). 318 We adjusted the P-values for multiple testing as in Benjamini and Hochberg [37] (BH): 319 pathways with a BH adjusted P-value < 0.05 were deemed significant.

321
We evaluated multiple properties of the different methods (described in Sections 2.5.1 322 and 2.5.2) through a 5-fold subsampling setting. We applied each method to 5 random 323 subsets of the original GENESIS dataset containing 80% of the samples (train set).

324
When pertinent, we evaluated the solution on the remaining 20% (test set). We used   329 We compared the runtime, the number of selected features (genes or SNPs), and the 330 stability (sensitivity to the choice of train set) of the different network methods. 331 Nogueira and Brown [38] proposed quantifying a method's stability using the Pearson 332 correlation between the genes selected on different subsamples. This correlation was 333 calculated between vectors with the length of the total number of features, containing a 334 0 at position i if feature i was not selected and a 1 if it was. solution. The L1 penalty was set by cross-validation, choosing the value that minimized 344 misclassification error. 345 We applied each network method to each train set. Then, we trained the classifier on 346 the same train set using only the selected SNPs. When the method retrieved a list of 347 genes, we proceeded as explained in Section 2.3.5. Lastly we evaluated the sensitivity 348 and the specificity of the classifier on the test set. To obtain a baseline, we also trained 349 the classifier on all the SNPs of the train set. 350 We did not expect a linear model on selected SNPs to separate cases from controls 351 well. Indeed, the lifetime cumulative incidence of breast cancer among women with a 352 family history of breast or ovarian cancer, and no BRCA1/2 mutations, is only 3.9 353 times more than in the general population [39]. However, classification accuracy may be 354 one additional informative criterion on which to evaluate solutions.  356 An alternative way to evaluate the methods is by comparing their solutions to an 357 external dataset. For that purpose, we used the 153 genes associated to familial breast 358 cancer on DisGeNET [40]. Across this article, we refer to these genes as breast cancer 359 susceptibility genes. 360 Additionally, we used the summary statistics from the Breast Cancer Association 361 Consortium (BCAC), a meta-analysis of case-control studies conducted in multiple 362 countries. BCAC included 13 250 641 SNPs genotyped or imputed on 228 951 women of 363 European ancestry, mostly from the general population [41]. Through imputation, 364 BCAC includes more SNPs than the iCOGS array used for GENESIS (Section 2.1). 365 However, in all the comparisons in this paper we focused on the SNPs that passed 366 quality control in GENESIS. Hence, we used the same Bonferroni threshold as in 367 Section 2.2 to determine the significant SNPs in BCAC. We also computed gene-scores 368 in the BCAC data using VEGAS2, as in Section 2.1. In this case, we did use the 369 summary statistics of all 13 250 641 available SNPs and the genotypes from European 370 samples from the 1000 Genomes Project [42] to compute the LD patterns. Since these 371 genotypes did not include chromosome X, we excluded it from this analysis. All 372 comparisons included only the genes in common between GENESIS and BCAC, so we 373 used a different Bonferroni threshold (1.66 × 10 −6 ) to call gene significance. Rewiring the PPIN while preserving the number of edges of each gene allowed to study 376 the impact of the topology on the output of network methods. Indeed, the edges lose 377 their biological meaning while the topology of the network is conserved. We produced 378 100 such rewirings by randomly swapping edges in the PPIN. We still scored the genes 379 as described in Section 2.3.3. We only applied only four methods on the rewirings: 380 heinz, dmGWAS, LEAN, and SigMod. We excluded HotNet2 and SConES since they 381 took notably longer to run.   406 We conducted association analyses in the GENESIS dataset (Section 2.1) at both SNP 407 and gene levels (Section 2.2). At the SNP level, two genomic regions had a P-value  Closer examination revealed two other regions (3p24 and 8q24) having low, albeit 416 not genome-wide significant, P-values. Both of them have been associated with breast 417 cancer susceptibility in the past [45,46]. We applied an L1-penalized logistic regression 418 using all GENESIS genotypes as input and the phenotype (cancer/healthy) as the outcome (Section 2.5.2). The algorithm selected 100 SNPs, both from all regions (sensitivity = 55%, specificity = 55%, Section 2.5). Together, these issues motivated 427 exploring network methods, which consider not only statistical association but also the 428 location of each gene in a PPIN to find susceptibility genes.  These solutions differed in many aspects, making it hard to draw joint conclusions. 437 For starters, the overlap between the genes featured in each solution was relatively small 438 ( Fig 1A). However, the methods tended to agree on the genes with the strongest signal: 439 genes selected by more methods tended to have lower P-value of association ( Fig 1B). 440 Another major difference was the solution size: the largest solution, produced by 441 HotNet2, contained 440 genes, while heinz's contained only 4 genes. While SConES GI 442 did not recover any protein coding gene, working with SNP networks rather than gene 443 networks allowed it to retrieve four subnetworks in intergenic regions and another 444 subnetwork overlapping an RNA gene (RNU6-420P).

445
The topologies of the five solutions differed as well (Fig 1C), as measured by the 446 median centrality and the number of connected components ( Table 2). Three methods 447 yielded more than one connected component: SConES, as described above, SigMod, and 448 HotNet2. HotNet2 produced 135 subnetworks, 115 of which have fewer than five genes. 449 The second largest subnetwork (13 nodes) contained the two breast cancer susceptibility 450 genes CASP8 and BLM (Section 2.6). number of genes selected out of those that are part of the PPIN; for SConES GI, the total number of genes, including RNA genes, was added in parentheses. # components: number of connected components. Betweenness: median betweenness of the selected genes in the PPIN.P gene : median VEGAS2 P-value of the selected genes. # genes in consensus: number of genes in common between the method's solution and the 93 genes in the consensus solution.  Table): protein translation (including mitochondrial), 454 mRNA splicing, protein misfolding, and keratinization (BH adjusted P-values < 0.03). 455 Interestingly, the dmGWAS solution (S3 Table) was also related to protein misfolding 456 (attenuation phase, BH adjusted P-value = 0.01). However, it additionally included 457 submodules of proteins related to mitosis, DNA damage, and regulation of TP53 (BH 458 adjusted P-values < 0.05), which match previously known mechanisms of breast cancer 459 susceptibility [47]. As with SigMod, the genes in HotNet2's solution (S4 Table) were 460 involved in mitochondrial translation (BH adjusted P-value = 1.87 × 10 -4 ), but also in 461 glycogen metabolism and transcription of nuclear receptors (BH adjusted 462 P-value < 0.04).

463
Despite their differences, there were additional common themes. All obtained 464 solutions had lower association P-values than the whole PPIN (median VEGAS2 465 P-value 0.46, Table 2), despite containing genes with higher P-values as well 466 (Fig 1D). This illustrates the trade-off between controlling for type I error and biological 467 relevance. However, there are nuances between solutions in this regard: heinz strongly 468 favored genes with lower P-values, while dmGWAS was less conservative (median 469 VEGAS2 P-values 0.0012 and 0.19, respectively); SConES tended to select whole 470 LD-blocks; and HotNet2 and SigMod were less likely to select low scoring genes.

471
Additionally, the solutions presented other desirable properties. First, four of them 472 were enriched in known breast cancer susceptibility genes (dmGWAS, heinz, HotNet2, 473 and SigMod, Fisher's exact test one-sided P-value < 0.03). Second, the genes in three 474 solutions displayed, on average, a significantly higher betweenness centrality than the 475 rest of the genes (dmGWAS, HotNet2, and SigMod, Wilcoxon rank-sum test 476 P-value < 1.4 × 10 -21 ). This agrees with the notion that disease genes are more central 477 than other non-essential genes [48], an observation that holds in breast cancer 478 (one-tailed Wilcoxon rank-sum test P-value = 2.64 × 10 -5 when comparing the 479 betweenness of known susceptibility genes versus the rest). Interestingly, the SNPs in 480 SConES' solution were also more central than the average SNP (S1 Table), suggesting 481 that causal SNPs are also more central than non-associated SNPs.  483 Despite their shared properties, the differences between the solutions suggested that 484 each of them captured different aspects of cancer susceptibility. Indeed, out of the 668 485 genes that were selected by at least one method, only 93 were selected by at least two, 486 20 by three, and none by four or more. Encouragingly, the more methods selected a 487 gene, the higher its association score to the phenotype (Fig 1B), a relationship that 488 plateaued at 2. Hence, to leverage their strengths and compensate for their respective 489 weaknesses, we built a consensus solution using the genes shared among at least two  Table). We found two involved 495 mechanisms: mitochondrial translation and attenuation phase. The former is supported 496 by genes like MRPS30 (VEGAS2 P-value = 0.001), which encode a mitochondrial 497 ribosomal protein and was also linked to breast cancer susceptibility [49]. Interestingly, 498 increased mitochondrial translation has been found in cancer cells [50], and its 499 inhibition was proposed as a therapeutic target. With regards to the attenuation phase 500 of heat shock response, it involved three Hsp70 chaperones: HSPA1A, HSPA1B, and 501 HSPA1L. The genes encoding these proteins are all near each other at 6p21, in the 502 region known as HLA. In fact, out of the 22 SNPs mapped to any of these three genes, 9 503 mapped to all three, and 4 to two, which made it hard to disentangle their effects. represented by a pie chart, which shows the methods that selected it. We enlarged the two most central genes (COPS5 and OFD1 ), the known breast cancer susceptibility genes, and the BCAC-significant genes (Section 2.6). (C) The nodes are in the same disposition as in panel B, but we indicated every gene name. We colored in pink the names of known breast cancer susceptibility genes and BCAC-significant genes.

505
Topologically, the consensus consisted of a connected component composed of 49 506 genes and multiple smaller subnetworks (Fig 2B and C). Among the latter, 19 genes 507 were in subnetworks containing a single gene or two connected nodes. This implied that 508 they did not have a consistently altered neighborhood but were strongly associated 509 themselves and hence picked by at least two methods. The large connected component 510 contained genes that are highly central in the PPIN. This property weakly 511 anticorrelated with the P-value of association to the disease (Pearson correlation 512 coefficient = -0.26, S3 Fig). This anticorrelation suggested that these genes were 513 selected because they were on the shortest path between two high scoring genes.

514
Because of this, we hypothesize that highly central genes might contribute to the 515 heritability through alterations of their neighborhood, consistent with the omnigenic 516 model of disease [6]. For instance, the most central node in the consensus solution was 517 COPS5, a component of the COP9 signalosome that regulates multiple signaling 518 pathways. COPS5 is related to multiple hallmarks of cancer and is overexpressed in 519 multiple tumors, including breast and ovarian cancer [51]. Despite its lack of association 520 in GENESIS or in studies conducted by the Breast Cancer Association Consortium 521 (BCAC) [41] (VEGAS2 P-value of 0.22 and 0.14, respectively), its neighbors in the 522 consensus solution had consistently low P-values (median VEGAS2 P-value = 0.006).   538 We compared the six network methods in a 5-fold subsampling setting (Section 2.5). In 539 this comparison we measured properties (Fig 3 and S4 Fig): the size of the solution; the 540 sensitivity and the specificity of an L1-penalized logistic regression classifier on the 541 selected SNPs; the stability of the methods; and their computational runtime. The 542 solution size varies greatly between the different methods ( Fig 3A). Heinz produced the 543 smallest solutions, with an average of 182 selected SNPs (Section 2.3.5) while the largest 544 ones came from SConES GI (6 256.6 SNPs) and dmGWAS (4 255.0 SNPs). LEAN did 545 not produce any solution in any of the subsamples.

546
To determine whether the selected SNPs could predict cancer susceptibility, we  Interestingly, the classifier trained on all the SNPs had a similar performance, despite 550 being the only method aiming to minimize prediction error. Of course, although these 551 performances were low, we did not expect to separate cases from controls well using 552 exclusively genetic data [52].

553
Another desirable quality of a selection algorithm is the stability of the solution with 554 respect to small changes in the input (Section 2.5.1). Heinz was highly stable in our 555 benchmark, while the other methods displayed similarly low stabilities (Fig 3B). 556 In terms of computational runtime, the fastest method was heinz (Fig 3C), which . We used as true positives BCAC-significant SNPs (for SConES and χ 2 + Bonferroni) and genes (for the remaining methods, Section 2.6). We used the whole dataset in this panel.
average). Including the time required to compute the gene scores, however, slowed down 559 considerably gene-based methods; on this benchmark, that step took on average 1 day 560 and 9.33 hours. Including this first step, it took 5 days on average for HotNet2 to 561 produce a result.

562
Using different combinations of parameters (Section 2.3.4), we computed how good 563 each of the methods was at recovering the results of a conventional GWAS on BCAC 564 (Section 2.6, Fig 3D). SConES exhibits the largest area under the curve since, when 565 λ = 0 (i.e., network topology is disregarded), it is equivalent to a Bonferroni correction. 566 The remaining network methods have similar areas under the curve, with heinz having 567 the largest one. assumptions the methods made allowed us to understand the results more in depth. For 573 instance, the fact that LEAN did not return any gene implied that there was no gene 574 such that both itself and its environment were, on average, strongly associated with the 575 disease.

576
In the GENESIS dataset, heinz's solution was very conservative, providing a small 577 solution with the lowest median P-value (Table 2). By repeatedly selecting this compact 578 solution, heinz was the most stable method (Fig 3B). Its conservativeness stemmed from 579 its preprocessing step, which modeled the gene P-values as a mixture model of a beta positive score after that transformation. However, this small solution did not provide 583 much insight into the susceptibility mechanisms to cancer. Importantly, it ignored genes 584 that were associated with cancer in this dataset, like FGFR2. 585 On the other end of the spectrum, dmGWAS, HotNet2, and SigMod produced large 586 solutions. DmGWAS' solution was the lowest scoring solution on average because of its 587 greedy framework, which is biased towards larger solutions [53]. It considered all nodes 588 at distance 2 of the examined subnetwork and accepted a weakly associated gene if it 589 was linked to another, high scoring one. Aggregating the results of successive greedy 590 searches exacerbates this bias, leading to a large, tightly connected cluster of 591 unassociated genes (Fig 4A). This relatively low signal-to-noise ratio combined with the 592 large solution requires additional analyses to draw conclusions, such as enrichment 593 analyses. In the same line, HotNet2's solution was even harder to interpret, being 594 composed of 440 genes divided into 135 subnetworks. Lastly, SigMod missed some of the 595 highest scoring breast cancer susceptibility genes in the dataset, like FGFR2 and TOX3. 596 Another peculiarity of network methods was their relationship to degree centrality. 597 We studied random rewirings of the PPIN that preserved node centrality (Section 2.7). 598 In this setting, network methods favored central genes (Fig 4B) even though highly 599 central genes often had no association to breast cancer susceptibility (Fig 4C). We found 600 this bias especially in SigMod (S6 Fig), which selected three highly central, unassociated 601 genes in both the PPIN and in many of the random rewirings: COPS5, CUL3, and FN1. 602 However, as we showed in Section 3.3 and will show in 3.8, there is evidence in the 603 literature of the contribution of the first two to breast cancer susceptibility. With 604 regards to FN1, it encodes a fibronectin, a protein of the extracellular matrix involved 605 in cell adhesion and migration. Overexpression of FN1 has been observed in breast 606 cancer [54], and it anticorrelates with poor prognosis in other cancer types [55,56].

607
By using a SNP subnetwork, SConES analyzed each SNP in its functional context. 608 Therefore, it could select SNPs located in genes not included in the PPIN and in

634
We computed the Pearson correlation between the different solutions as in 635 Section 2.5.1 to study how the parameters affected which genes and SNPs were selected 636 (S8 Fig). This analysis showed that dmGWAS and SigMod were robust to two 637 parameters: the parameter d determined dmGWAS' output more than r; for SigMod, it 638 was nmax rather than maxjump. and below the best parameters, respectively. This second parameter space was more 645 diverse, which allowed to find more interesting solutions. Most network methods, including the consensus, were highly unstable (Fig 3B), raising 649 questions about the results' reliability. We built a new, stable consensus solution using 650 the genes selected most often across the 30 solutions obtained by running the 6 methods 651 on 5 different splits of the data (Section 2.5). Such a network should capture the 652 subnetworks more often found altered, and hence should be more resistant to noise. We 653 used only genes selected in at least 7 solutions, which corresponded to 1% of all genes 654 selected at least once. The resulting stability-based consensus was composed of 68 genes 655 (Fig 5). This network shared most of the properties of the consensus: breast cancer 656 susceptibility genes were overrepresented (P-value = 3 × 10 -4 ), as well as genes involved 657 in mitochondrial translation and the attenuation phase (adjusted P-values 0.001 and 3 × 658 10 -5 respectively); the selected genes were more central than average (P-value = 1.1 × 659 10 -14 ); and a considerable number of nodes (19) were isolated (Fig 5B and C).

660
Despite these similarities, the consensus and the stable consensus included different 661 genes. In the stable consensus network, the most central gene was CUL3, which was 662 absent from the previous consensus solution and had a low association score in both  [59], and its overexpression was 666 also linked to increased sensitivity to carcinogens [60]. In recent years, the GWAS' ability to unravel the mechanisms leading to complex 669 diseases has been called into question [6]. First, the omnigenic model proposes that gene 670 functions are interwoven in a dense co-function network. The practical consequence is 671 that larger and larger GWAS will lead to discovering an uninformative wide-spread 672 pleiotropy. Second, its conservative statistical framework hinders GWAS discovery. In this article we evaluated six of these methods (Section 2.3.3) by applying them to the 678 GENESIS study, a GWAS dataset on familial breast cancer (Section 2.1).

679
DmGWAS, Heinz, HotNet2, SConES, and SigMod all yielded compelling solutions, 680 which include (but are not limited to) known breast cancer susceptibility genes 681 (Section 3.2). In general, the selected genes and SNPs were more central than most 682 other genes and SNPs, agreeing with the observation that disease genes are more 683 central [48]. However, very central nodes are also more likely to be connecting any given 684 random pair of nodes, making them more likely to be selected by network methods 685 (Section 3.6). However, we found support in the literature for the involvement of the 686 selected highly central genes (COPS5, FN1, and CUL3 ) in breast cancer susceptibility 687 (Sections 3.3, 3.6, and 3.8). Despite these similarities, the methods' solutions were 688 notably different. At one end of the spectrum, SConES and heinz preferred high scoring 689 solutions, which were also small and hence did not shed much light on the disease's 690 etiology. On the other end, dmGWAS, HotNet2 and SigMod gravitated towards lower 691 scoring but larger solutions, which provided a wide overview of the biological context.

692
While this deepened our understanding of breast cancer susceptibility and provided 693 biological hypotheses, they required further analyses. For instance, we examined the 694 centrality of the selected genes to understand how much that property was driving their 695 selection (Section 3.6). However, all solutions shared two drawbacks. First, they were 696 all equally bad at discriminating cases from controls. However, the classification 697 accuracy of network methods was similar to that of a classifier trained on the entire 698 genome (Section 3.5), which suggests that cases and controls are difficult to separate in 699 the GENESIS dataset. This may be due to limited statistical power, which reduces the 700 ability to identify relevant SNPs. However, in any event, we do not expect to separate 701 people who have or will develop cancer from others on the sole basis of their genomes, 702 ignoring all environmental factors and chance events. Hence, network methods were 703 preferable to the logistic regression classifier since they did "no worse" at classification 704 while providing an interpretable solution. Second, all methods were remarkably 705 unstable, yielding different solutions for slightly different inputs. This might partly have 706 been caused by the instability of the P-values themselves in low statistical power 707 settings [61]. Hence, heinz's conservative transformation of P-values, which favored only 708 the most extreme ones, led to improved stability. Another source of instability might 709 have been the redundancy inherent to biological networks, a consequence of an 710 evolutionary pressure to avoid single points of failure [62]. Hence, biological networks 711 will often have multiple paths connecting two high-scoring nodes.

712
To overcome these limitations while exploiting the each method's strengths, we 713 proposed combining them into a consensus solution. We used the straightforward 714 strategy of including any node that was recovered by at least two methods. We thus 715 proposed two solutions (Sections 3.3 and 3.8): a consensus solution, which addressed the 716 heterogeneity of the solutions, and a stable consensus solution, which addresseded the 717 instability of the methods. They both included the majority of the strongly associated 718 smaller solutions and captured genes and broader mechanisms related to cancer, thus 719 synthesizing the mechanisms altered in breast cancer cases. Thanks to their smaller size 720 and network structure, they provided compelling hypotheses on genes like COPS5 and 721 CUL3, which lack genome-wide association with the disease but are related to cancer at 722 the expression level and consistently interact with high scoring genes. Notably, while 723 the consensus approach was as unstable as the individual network-guided methods, the 724 stable consensus network retained the ability to provide compelling hypotheses and had 725 better stability. This supported that redundant but equivalent mechanisms might cause 726 instability and supported the conclusions obtained on the individual solutions.

727
In this work, we have compared our results to significant genes and SNPs in the 728 BCAC study [41]. Network methods showed modest precision but much higher recall at 729 recovering BCAC hits (Section 3.4). While precision might be desirable when a subset 730 of useful markers is required (for instance, for diagnosis), higher recall is desirable in 731 exploratory settings. Nonetheless, BCAC was not an ideal ground truth. First, the 732 studied populations are not entirely overlapping: BCAC focused on a pan-European 733 cohort, while GENESIS targeted the French population. Second, the study designs 734 differed: a high proportion of breast cancer cases investigated in BCAC were sporadic 735 (not selected according to family history), while GENESIS was a homogeneous dataset 736 not included in BCAC focused on the French high-risk population attending the family 737 cancer clinics. Finally, and this is indeed the motivation for this study, GWAS are 738 unlikely to identify all genes relevant for the disease: some might only show up in 739 rare-variant studies; others might have too small effect sizes. Network methods account 740 for this by including genes with low association scores but with relevant topological 741 properties. Hence, network methods and GWAS, even when well-powered, are unlikely 742 to capture exactly the same sets of genes. This might partly excuse the low precisions 743 displayed in Section 3.4 and the low AUC displayed in Section 3.5.

744
As not all PPIN databases compiled the same interactions, the choice of the PPIN 745 determines the final output. In this work, we used only interactions from HINT from 746 high-throughput experiments (Section 2.3.2). This responded to concerns about adding 747 interactions identified in targeted studies and falling into "rich getting richer" problems: 748 since popular genes have a higher proportion of their interactions described [10,24], they 749 might bias discovery towards themselves by reducing the average shortest path length 750 between two random nodes. On the other hand, Huang et al. [12] found that larger 751 networks were more useful than smaller networks to identify disease genes. This would 752 support using the largest networks in our experiments. However, when we compared the 753 impact of using a larger PPIN containing interactions from both high-throughput RNA genes like CASC16 were associated to breast cancer (Section 3.1), reminding us of 766 the importance of using networks beyond coding genes. Besides, even protein-coding 767 genes linked to breast cancer susceptibility [57], like NEK10 (P-value 1.6 × 10 -5 , 768 overlapping with SLC4A7 ) or POU5F1B, were absent from the PPIN. However, on 769 average protein-coding genes absent from the PPIN were less associated with breast 770 cancer susceptibility (Wilcoxon rank-sum P-value = 2.79 × 10 -8 , median P-values of 771 0.43 and 0.47). This could not be due to well-known genes having more known 772 interactions because we only used interactions from high-throughput experiments. As 773 disease genes tend to be more central [48], we hypothesize that it was due to 774 interactions between central genes being more likely. It is worth noting that network 775 methods that do not use PPIs, like SConES GS and GM, recovered SNPs in NEK10 776 and CASC16. Moreover, both SConES GM and GI recovered intergenic regions, which 777 might contain key regulatory elements [63], but are excluded from gene-centric 778 approaches. This shows the potential of SNP networks, in which SNPs are linked when 779 there is evidence of co-function, to perform network-guided GWAS even in the absence 780 of gene-level interactions. Lastly, all the methods are heavily affected by how SNPs are 781 mapped to genes, and other strategies (e.g., eQTLs, SNPs associated to gene expression) 782 might lead to different results.

783
A crucial step for the gene-based methods is the computation of gene scores. In this 784 work, we used VEGAS2 [22] due to the flexibility it offers to use user-specified gene 785 annotations. However, it presents known problems: selection of an appropriate 786 percentage of top SNPs, long runtimes and P-value precision limited to the number of 787 permutations [29]). Additionally, other algorithms like PEGASUS [29], SKAT [64] or 788 COMBAT [65] might have more statistical power.

789
How to handle linkage disequilibrium (LD) is often a concern among GWAS 790 practitioners. Often, the question is whether an LD-based pruning of the genotypes will 791 improve the results. VEGAS2 accounts for LD patterns, and hence an LD pruning step 792 would not impact gene-based network methods, although it would speed up VEGAS2's 793 computation time. In Section 3.3 we highlighted ambiguities that appear when genes 794 overlap or are in LD. The presented case is paradigmatic since all three genes are in the 795 HLA region, the most gene-dense region of the genome [66]. Network methods are prone 796 to selecting such genes when they are functionally related, and hence interconnected in 797 the PPIN. But the opposite case is also true: when genes are not functionally related 798 (and hence disconnected in the PPIN), network methods might disregard them even if 799 they have high association scores. With regards to SConES, fewer SNPs would lead to 800 simpler SNP networks and, possibly, shorter runtimes. However, LD patterns also affect 801 SConES' in other ways, since its formulation penalizes selecting a SNP and not its 802 neighbors, via a nonzero parameter η in Eq 5. Due to LD, nearby SNPs' P-values 803 correlate; since positional information determines SNP networks, nearby SNPs are likely 804 to be connected. Hence, SConES tends to select LD-blocks formed by low P-value SNPs. 805 This might explain why SConES produced similar results on the GS and GM networks, 806 heavily affected by LD (Section 3.6). However, this same behavior raises the burden of 807 proof required to select SNPs with many interactions, like those mapped to hub genes in 808 the PPIN. For this reason, SConES GI did not select any protein coding gene. This 809 could be caused by the absence of joint association of a gene and most of its neighbors, 810 a hypothesis supported by LEAN's lack of results. Yet, a different combination of 811 parameters could lead to a more informative SConES' solution (e.g., a lower λ in Eq 5), 812 although it is unclear how to find it. In addition, due to the design of the iCOGS array 813 (Section 2.1), the genome of GENESIS participants has not been unbiasedly surveyed: 814 some regions are fine-mapped -which might distort gene structure in GM and GI 815 networks -while others are understudied -hindering the accuracy with which the GS 816 network captures the genome structure. A strong LD pruning might address such 817 problems.

818
To produce the two consensus solutions, we faced practical challenges due to the 819 differences in interfaces, preprocessing steps, and unexpected behaviors of the various 820 methods. To make it easier for others to apply these methods to new datasets and 821 aggregate their solutions, we built six nextflow pipelines [67] with a consistent 822 interface and, whenever possible, parallelized computation. They are available on 823 GitHub: hclimente/gwas-tools (Section 2.9). Importantly, we compiled those methods 824 with a permissive license into a Docker image for easier use, available on Docker Hub 825 hclimente/gwas-tools.