GenToS: Use of Orthologous Gene Information to Prioritize Signals from Human GWAS

Genome-wide association studies (GWAS) evaluate associations between genetic variants and a trait or disease of interest free of prior biological hypotheses. GWAS require stringent correction for multiple testing, with genome-wide significance typically defined as an association p-value <5 × 10^-8. This study presents a new tool that uses external information about genes to prioritize SNP associations (GenToS). For a given list of candidate genes, GenToS calculates an appropriate statistical significance threshold and then searches for trait-associated variants in summary statistics from human GWAS. It thereby allows for identifying trait-associated genetic variants that do not meet genome-wide significance. The program additionally tests for enrichment of significant candidate gene associations in the human GWAS data compared to the number expected by chance. As proof of principle, this report used external information from a comprehensive resource of genetically manipulated and systematically phenotyped mice. Based on selected murine phenotypes for which human GWAS data for corresponding traits were publicly available, several candidate gene input lists were derived. Using GenToS for the investigation of candidate genes underlying murine skeletal phenotypes in data from a large human discovery GWAS meta-analysis of bone mineral density resulted in the identification of significantly associated variants in 29 genes. Index variants in 28 of these loci were subsequently replicated in an independent GWAS replication step, highlighting that they are true positive associations. One signal, COL11A1, has not been discovered through GWAS so far and represents a novel human candidate gene for altered bone mineral density. The number of observed genes that contained significant SNP associations in human GWAS based on murine candidate gene input lists was much greater than the number expected by chance across several complex human traits (enrichment p-value as low as 10^-10).
GenToS can be used with any candidate gene list, any GWAS summary file, runs on a desktop computer and is freely available.


Introduction
Genome-wide association studies (GWAS) are an unbiased approach to identify genomic risk loci for complex diseases and to gain insight into underlying pathogenic mechanisms. Over the past decade, GWAS have led to the identification of previously unknown risk loci for hundreds of traits and diseases [1,2]. To reduce the type I error and account for association testing of an estimated one million common independent single nucleotide polymorphisms (SNPs) in the human genome [3], a multiple testing corrected significance level (alpha of 5 × 10^-8 [0.05/1,000,000]) has been adopted in the GWAS community. This rather conservative Bonferroni correction results in an increased type II error: increasingly larger GWAS meta-analyses of the same phenotype have demonstrated that results for a given GWAS meta-analysis contain multiple true positive findings that do not achieve genome-wide significant association p-values. Such associations can then only be identified and replicated at genome-wide significance once sample size is increased in subsequent analyses. However, increasing sample size may not always be feasible due to high costs or because of limited phenotype availability for specific diseases or special populations [4]. Therefore, approaches to identify additional candidate genes among these suggestive but not genome-wide significantly associated loci are needed.
Another challenge in the interpretation of associated loci identified through GWAS is that these loci typically contain several or many genes that each contain associated genetic variants in high linkage disequilibrium, complicating the identification of the causal gene(s) and variant(s) within such loci [5]. Again, additional sources of evidence to aid in the prioritization of association signals would be desirable. Several existing approaches leverage external information for the prioritization of potentially causal genes from GWAS data [6-12]. Many of these previous approaches evaluate enrichment of associated SNPs in gene sets based on pre-defined pathways [13], gene ontology terms [14], tissue expression analysis or functionally similar genes. They integrate information across different cell types and organisms and from sources as heterogeneous as in vitro protein-protein and chemical interactions. Another external source of information is animal models of phenotypes analogous to the human phenotype of interest, because of the conservation of gene function across species. The mouse represents a suitable model organism because of the relatively short evolutionary distance between humans and mice and because of a comprehensive and systematic effort to generate knock-out animals and/or cells for all murine genes [15,16]. Previous approaches that have integrated evidence from GWAS and mouse models have focused on evidence from naturally occurring genetic markers for subsequent use in linkage analysis [17] or genome-wide association testing [18].
We aimed to develop a method that provides complementary information to previous approaches by using a comprehensive resource of genetically manipulated and then systematically phenotyped mice (reverse genetics approach) in order to generate biological candidate gene lists. These genes are then evaluated using summary association statistics from GWAS of a corresponding human disease or phenotype. We validate the method across several human complex traits and diseases including bone mineral density, diabetes, glycemic traits and blood pressure phenotypes, and show that genes causing a specific phenotype in mouse models are significantly enriched for associated SNPs in results from GWAS of a corresponding human phenotype. Finally, we show that the method can identify novel candidate genes not claimed by GWAS so far for future validation.

Results
The GenToS algorithm is built as a three-step procedure. It requires a candidate gene input list that contains gene identifiers of human orthologs of genes causing a specific phenotype in genetically manipulated mice. In a first step, the corresponding genomic coordinates for each gene on the candidate gene input list are obtained (Fig 1A). Next, the number of independent common single nucleotide polymorphisms (SNPs) within each candidate gene region is determined based on a reference population, to subsequently calculate a statistical significance threshold based on the number of independent SNPs across all genes on a list (Fig 1B). Third, all derived gene regions are queried for the presence of SNPs with association p-values below the derived significance threshold in results from a human GWAS of the same or similar phenotype (Fig 1C). In addition to this three-step procedure, a validation step can be performed to examine whether the use of the candidate gene input list leads to the identification of more genes that contain significant associations than expected by chance (enrichment, Fig 1D). Detailed information is provided in the Methods section.
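The threshold calculation in the second step can be illustrated with a small sketch. It is not the GenToS implementation: the function names, the toy SNP positions, and the data layout (a dictionary of sorted positions per chromosome) are hypothetical, and an alpha of 0.05 is assumed for the Bonferroni correction. Regions are assumed to already include any flanking sequence, and overlapping regions would be double-counted in this simplified version.

```python
from bisect import bisect_left, bisect_right

def count_independent_snps(region, snp_positions):
    """Count independent SNPs inside one gene region.

    region: (chromosome, start, end), inclusive coordinates.
    snp_positions: dict mapping chromosome -> sorted list of SNP positions
                   (hypothetical layout of a pre-computed SNP database).
    """
    chrom, start, end = region
    positions = snp_positions.get(chrom, [])
    return bisect_right(positions, end) - bisect_left(positions, start)

def bonferroni_threshold(regions, snp_positions, alpha=0.05):
    """Divide alpha by the total number of independent SNPs across all regions."""
    total = sum(count_independent_snps(r, snp_positions) for r in regions)
    if total == 0:
        raise ValueError("no independent SNPs found in the candidate gene regions")
    return alpha / total, total

# Toy data (invented for illustration only).
independent_snps = {"1": [100, 250, 400, 900], "2": [50, 60, 700]}
candidate_regions = [("1", 200, 500), ("2", 0, 100)]

threshold, n_snps = bonferroni_threshold(candidate_regions, independent_snps)
print(n_snps, threshold)
```

Because the threshold depends only on the independent SNPs within the listed genes, a shorter or more compact candidate gene list yields a less stringent threshold than the genome-wide level.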

Enrichment of the number of genes with significant association signals based on a candidate gene input list
Enrichment of significant GWAS associations based on a candidate gene input list can be assessed compared to the null distribution of significant GWAS associations expected by chance. The null distribution can be derived by a resampling approach where each randomly drawn gene input list contains an equal number of genes as the candidate gene input list. Since this iterative procedure is time consuming, we assessed the properties of this distribution. The test of identifying SNPs below the significance threshold for a given gene can be considered a Bernoulli trial. Thus, the number of genes that contain significant GWAS association signals from an input gene list should follow a binomial distribution.
First, 2,000 iterations of GenToS were carried out for each of several fixed statistical significance thresholds (range 1 × 10^-2 to 1 × 10^-8). For every threshold, each of the 2,000 iterations used an input gene list that contained 1,292 randomly drawn genes, corresponding to the number of genes on the candidate gene input list for abnormal murine skeleton morphology (see next section). The human GWAS summary statistics dataset used to identify significantly associated SNPs was obtained from a meta-analysis of GWAS for bone mineral density (for details, see Methods). For each of the 2,000 iterations, the number of genes from each input list was counted that contained SNPs associated with bone mineral density below the respective significance threshold.
Next, 2,000 iterations of a binomial experiment were carried out to simulate a binomial distribution. In each of these, p was the probability of observing a significant gene association, estimated by the proportion of genes that contained significant SNP associations below the evaluated fixed significance threshold among all 25,230 entries in the human gene database, and the number of Bernoulli trials n was 1,292, the number of genes in the candidate gene list. After 2,000 iterations of the simulated random draw, the number of significant genes was plotted against the number obtained from the iterative random draw using quantile-quantile (QQ)-plots. Fig 2 shows good agreement of the number of significant genes detected by the two approaches across a range of selected significance thresholds. The QQ plots for all evaluated significance thresholds are shown in S1 Fig for input gene lists that contain as many genes as the abnormal skeleton morphology candidate gene list (the longest candidate gene list) and in S2 Fig for input gene lists that contain 134 genes, as many as the abnormal bone mineralization list (the shortest candidate gene list). Spearman rank-correlation coefficients between the number of significant genes for the two approaches ranged from 0.90-1.00 across all QQ plots. We therefore decided to subsequently use the binomial distribution to visually assess and quantify enrichment of human GWAS association signals based on candidate gene input lists. Enrichment p-values were estimated using a complementary cumulative binomial distribution (see Methods).
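The binomial experiment described above can be sketched in a few lines. This is a minimal illustration rather than the GenToS code: the function name, the fixed seed, and the example probability p = 0.01 are hypothetical, whereas n = 1,292 and the 2,000 iterations are taken from the text.

```python
import random

def simulate_binomial_null(n_genes, p_significant, iterations=2000, seed=1):
    """Simulate the number of significant genes under the null hypothesis.

    Each gene on a list of n_genes is treated as one Bernoulli trial that is
    'significant' with probability p_significant; one iteration returns the
    count of successes, and the list of counts approximates the null
    distribution.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_significant for _ in range(n_genes))
        for _ in range(iterations)
    ]

# n = 1,292 genes as for the skeleton morphology list; p = 0.01 is an
# invented value standing in for the proportion estimated from the gene database.
null_counts = simulate_binomial_null(n_genes=1292, p_significant=0.01)
mean_count = sum(null_counts) / len(null_counts)
print(round(mean_count, 2))
```

The sorted counts from such a simulation could then be plotted against the sorted counts obtained from randomly drawn gene lists to produce a QQ plot of the kind shown in Fig 2.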

GWAS of human skeletal phenotypes are enriched for signals in genes causing bone phenotypes in mouse models
Using publicly available summary statistics from the discovery stage of GWAS meta-analyses for femoral neck bone mineral density (FNBMD) and lumbar spine bone mineral density (LSBMD) of the GEFOS Consortium [19,20], GenToS was used to test for enrichment of GWAS association signals in genes that give rise to six different skeletal phenotypes in mouse models. Depending on which of the six candidate gene input lists was used (see Methods), a range of 6-21 significant genes were identified in human GWAS based on the Bonferroni method to derive the significance threshold (see Methods). The number of significant genes was higher than that expected by chance for each candidate gene input list, with enrichment p-values ranging from 2.62 × 10^-3 to 1.71 × 10^-10 depending on the human phenotype (FNBMD or LSBMD) and the mouse candidate gene input list. Fig 3 shows the observed number of genes that contained significant associations compared to 2,000 randomly drawn input gene lists that contained an equal number of genes as the candidate gene input list, as well as the enrichment p-values for each of the six evaluated candidate gene input lists in relation to FNBMD. Results were also significant and very similar for LSBMD (S3 Fig). Across all six candidate gene input lists and the two human phenotypes, 29 unique genes contained significantly associated SNPs (Tables 1 and 2). The greatest number of genes, 21, was found in association with FNBMD using the longest and rather general candidate gene input list, "abnormal skeleton morphology" (enrichment p-value of 1.71 × 10^-10, Fig 3).

GenToS identifies novel gene associations for human skeletal phenotypes
Of the 29 genes that contained SNPs significantly associated with human skeletal phenotypes, 20 were published as genome-wide significant loci by the GEFOS Consortium (Table 1) [19,20]. Of these, only 12 had reached genome-wide significance during the GWAS discovery stage, which is used for GenToS, whereas eight additional genes only achieved genome-wide significance after the replication stage of the study. Further, seven of the 29 genes mapped into significant and subsequently replicated GEFOS loci, but had not been named as the gene underlying the association signal in a given locus (Table 2 and S4 Fig). The remaining genes identified by GenToS had not reached genome-wide significance after discovery and replication at the time of the GEFOS publication. One of them, FGFRL1, was later identified in a bone mineral density study by Zhang et al. [21]. The last gene, COL11A1, has not been identified by bone-related GWAS to date and thus represents a novel human candidate gene for altered bone mineral density. Altogether, index SNPs in 28 of the 29 significant genes (>95%) identified using GenToS with the GEFOS discovery stage data were subsequently replicated, supporting them as true association signals. Among the genes not previously identified through GWAS or not implicated as the index gene in an associated locus, LRP4 and COL11A1 are known to harbor rare mutations that cause monogenic skeletal disease in humans (Table 2). Thus, additional evidence such as Cenani-Lenz syndactyly syndrome or fibrochondrogenesis-1, and the association between the index SNP in LRP4 and LRP4 transcript abundance, strongly supports that the genes identified using GenToS may be the causal ones or represent additional phenotype-associated genes in an associated locus (Table 1).

Significant associations with additional human phenotypes
To assess whether the enrichment of GWAS signals for genes causing corresponding or related phenotypes in mouse models can be generalized to phenotypes other than human bone mineral density, we explored additional human traits for which GWAS summary statistics are publicly available. This evaluation showed that the GenToS approach is generalizable (Table 3), but that the observed enrichment varied depending on the human phenotype and the input candidate gene list.
For type 2 diabetes, studied in 57,000 participants of the DIAGRAM Consortium [22], enrichment of genes that contained significantly associated SNPs was observed for two of the candidate gene input lists (S5 Fig): for the list of candidate genes that when modified cause "hyperglycemia" in mouse models, five significant genes were identified in the DIAGRAM data (enrichment p-value 3.11 × 10^-5, S1 Table). For the candidate gene list "abnormal glucose tolerance", seven significant genes were found (enrichment p-value 6.54 × 10^-6, S1 Table).
Fig 3. For each of the six candidate gene input lists, the number of expected significant genes under the null hypothesis was generated based on iterations of randomly drawn gene lists that contained an equal number of genes as the respective candidate gene input list and is displayed as a histogram. In addition, the binomial density distribution corresponding to the candidate gene input list significance threshold was overlaid (dots connected with lines). The observed number of significant genes based on the use of GenToS with the candidate gene input lists and the human GWAS results for femoral neck bone mineral density is indicated by a vertical black line. The enrichment p-value is computed from the complementary cumulative binomial distribution (see Methods). doi:10.1371/journal.pone.0162466.g003
For systolic blood pressure, human GWAS summary data from the ICBP Consortium were used (n = 74,000 [23,24]), and four different candidate gene input lists were tested (see Methods). None of the tested candidate gene lists showed nominally significant enrichment for association signals in humans (S6 Fig, S1 Table), although the number of genes with significant association signals in the lists "increased systemic arterial blood pressure" and "decreased systemic arterial blood pressure" approached statistical significance.
Finally, glycemic traits studied in the MAGIC Consortium were evaluated. For association with the human trait fasting insulin concentrations (GWAS data based on 38,000 individuals [25]), six different candidate gene input lists ranging from 42 to 385 genes were evaluated (see Methods). Nominally significant enrichment of associated genes was identified for two candidate gene lists (S7 Fig), "abnormal circulating insulin level" (enrichment p-value 3.21 × 10^-2) and "increased circulating insulin level" (enrichment p-value 2.05 × 10^-2), with associated genes listed in S1 Table. All other candidate gene lists did not give rise to any significant association signals in humans. The other human trait evaluated was fasting glucose (GWAS data for 46,000 individuals [25]). Six different candidate gene input lists were evaluated, representing three mouse traits, each in the fasting and non-fasting state. Significant enrichment of the number of genes that contained association signals in humans was only observed for the non-fasting candidate gene input lists (S8 Fig): 6 significant genes were identified for "abnormal circulating glucose level" (enrichment p-value 5.12 × 10^-4), 3 for "decreased circulating glucose level" (enrichment p-value 2.49 × 10^-2), and 6 for "increased circulating glucose level" (enrichment p-value 2.41 × 10^-5), with associated genes shown in S1 Table. Conversely, no enrichment and in fact no significant genes at all were identified for the candidate gene input lists from the fasting counterpart of the murine phenotype.

Discussion
In this study we introduced GenToS, a tool to prioritize genes from GWAS summary statistics using candidate gene information obtained from another species, the mouse. We show across a variety of complex diseases/traits that GenToS identifies significant enrichment of GWAS association signals in the human orthologs of these candidate genes. The potential of the method is illustrated by the fact that, using bone phenotypes as exemplary data, more than 95% of the genes identified by GenToS were replicated as true positives in a replication step or subsequent studies. Our findings underline the high functional conservation of genes between mice and humans and suggest that the incorporation of murine data can be particularly helpful when further increases in sample size for human GWAS cannot easily be achieved.
There are several other tools to prioritize potentially causal genes in associated loci originating from human GWAS [6,11,12,26-28]. An approach taken by programs like DEPICT [11], MAGENTA [28], INRICH [27] and PARIS [12] is to evaluate enrichment of associated SNPs in gene sets based on pathways, tissue expression analysis or functionally similar genes. These gene sets are typically based on pre-existing Gene Ontology terms [14] or KEGG pathways [13], which integrate information across different cell types and organisms and from sources as heterogeneous as in vitro protein-protein and chemical interactions. GenToS on the other hand uses gene sets composed of biological candidate genes based on the systematic generation and grouping of observed phenotypes in the mouse, a widely used model organism to study human disease. Thus, pathway-based analyses and the approach implemented in GenToS provide complementary information.
With respect to using mouse models as the primary source of information for the selection of candidate genes, our approach is complementary to a recently published method by Wang et al. [18]. The approach by Wang et al. used naturally occurring genetic variants in recombinant inbred mouse strains for association testing with multiple murine (endo-) phenotypes, followed by examination of selected, implicated genes across many phenotypes in a human population genotyped only for the coding portion of the genome (exome chip). Our approach on the other hand uses genetically manipulated mice that feature a specific phenotype, followed by combination with results from a genome-wide genetic screen of a corresponding phenotype in humans. Our approach is therefore more focused in that it concentrates on specific and analogous rather than hundreds of phenotypes as well as on genetic manipulations of strong effect (e.g., complete gene knockouts), which can facilitate the interpretation of findings. In addition, the focus on one or a few related phenotypes allows for the derivation of a conservative multiple-testing corrected significance threshold in GenToS, which is difficult to establish in a phenome-wide context, as discussed by the authors [18]. Conversely, the approach by Wang and colleagues allows for discovering novel cross-phenotype associations and for assessing the effects of naturally occurring, hypomorphic genetic variants. The latter should theoretically enable the study of regulatory variants, although the authors chose to study only 12,000 high-impact (missense, nonsense, splice, frameshift, CNVs) out of 5 million discovered genetic variants. For many of these high-impact variants, no associated murine phenotype was observed, which can be explained by mechanisms such as compensation or by incomplete phenotype availability. Finally, the use of GWAS in our approach allows for the identification of associated SNPs that map into introns and gene regulatory regions, whereas the approach by Wang et al. only focused on human genetic variants in the coding portion of the genome (exome chip). Thus, the evidence generated by the two approaches can be considered complementary.
Table 3 note: For each list, <5% of genes were filtered, mostly because they were mapping to human gonosomes and gonosomal GWAS summary statistics were not available. Other reasons for filtering included ambiguous mapping and accounted for <1% of filtered genes for each list. doi:10.1371/journal.pone.0162466.t003
The comparison of GenToS results across different candidate gene input lists and GWAS summary statistics datasets allows for several observations: first, the strength of enrichment did not increase when the murine phenotype was selected as closely as possible to the phenotype for which human GWAS association statistics were available. This is illustrated by the fact that the enrichment for genes on the rather general murine candidate gene list for skeleton morphology was stronger than that for the more specific murine candidate gene list for abnormal bone mineralization, the phenotype studied in humans. Second, findings across related human traits were very similar, as evidenced by the comparison of GWAS of femoral neck and lumbar spine bone mineral density. Third, our observation of significant enrichment was generalizable to non-skeletal phenotypes, as exemplified by significant enrichment for association signals in murine candidate genes for abnormal insulin levels and hyperglycemia in the corresponding human traits.
It is noteworthy that the significance of the observed enrichment varied across the examined phenotypes/diseases. There are several potential explanations for this observation: firstly, the genetic architecture of the examined phenotypes can differ. Whereas susceptibility to one disease may be explained by variants of large effect in relatively few genes, variants of small effect in several hundreds of genes may contribute to other diseases, requiring better-powered, i.e. larger, GWAS for their detection. Secondly, the publicly available data used in this report varied in sample size, thereby preventing a comparison of phenotypes at a fixed GWAS sample size. Thirdly, the phenotypic characterization in mice is not equally easy or complete across phenotypes. For instance, abnormal bone morphology in knockout mice is more easily observed than phenotypes requiring invasive measurements such as the recording of blood pressure, which may in addition be subject to biological variation. Finally, for some traits, humans and mice may be more alike than for others, which can additionally be aggravated by factors such as species-specific compensatory mechanisms or interactions with the environment. Regardless of the differing strength, however, we observed enrichment for a variety of the studied traits, supporting the general applicability of our approach.
Advantages of GenToS include its usefulness in settings where the sample size of subsequent GWAS cannot be increased easily, such as for rare diseases, or when replication studies may not be available. Further, the method can be extended to use additional evidence as input: although we used candidate gene input lists derived from murine phenotypes in this report, in principle any other candidate gene list could be used, such as candidate genes implicated by expression quantitative trait locus studies, candidate genes arising from GWAS carried out in other model organisms such as in the report of Wang et al. [18], or genes underlying monogenic human diseases. In support of the latter, many of the associations found with GenToS were already linked to human monogenic diseases in OMIM, supporting a model in which rare mutations of large effect and common variants of small effect in the same set of genes give rise to a continuum of a given human phenotype.
Some limitations of our approach warrant discussion: firstly, the performance of the method is influenced by the completeness of the candidate gene input lists. Although the work of the Jackson Lab and other groups has resulted in an impressively comprehensive and systematic resource of genetically manipulated and phenotyped mice, animal models were only available for 11,500 out of >25,000 murine genes at the time of our study. Because of issues such as early lethality or structurally complicated genomic regions that contain overlapping genes or are difficult to manipulate, the resource will likely never become complete. Together with the difficulty of quantifying some murine phenotypes, as discussed above, this may introduce misclassification that should bias any observed results towards the null. Another limitation is the inherent restriction to the available data when using posted GWAS summary results. For example, the conduct of approximate conditional analyses using the GWAS summary results would have been desirable to identify the presence of independent bone mineral density-associated SNPs in the HOX gene cluster, because murine phenotypes are observed for several of the genes in this cluster. However, this was not possible because the GEFOS Consortium did not make the estimated effect sizes required for these analyses publicly available. In addition, current GWAS are typically restricted to the evaluation of common genetic variants, and are therefore likely to miss association signals for rare variants of large effect. Future extensions of GWAS efforts and the continuing completion of the underlying murine MGI database will therefore likely result in further improvements of our findings.
In conclusion, GenToS is a flexible, freely available and user-friendly tool to incorporate external information in order to identify trait-associated SNPs in candidate genes that do not necessarily meet genome-wide significance in human GWAS. It allows for performing an analysis within minutes on a standard personal computer without any special requirements.

Generation of candidate gene input lists
Candidate genes, which when impaired cause skeletal phenotypes in mice, were selected by searching the Mouse Genome Informatics (MGI) resource [15]. MGI is the primary international database for laboratory mice. All phenotypes in MGI are categorized based on the Mammalian Phenotype (MP) ontology and emerge as a result of different genetic models, including targeted knockout animals, chemically induced (ENU) and spontaneous mutations. For this project, murine phenotypes were selected for their biomedical relevance regarding the evaluated traits for which GWAS data were publicly available, and downloaded from the MP ontology of MGI (http://www.informatics.jax.org/searches/MP_form.shtml) in March of 2015 for skeletal candidate gene lists and in June of 2015 for the glucose, insulin, systolic blood pressure and diabetes candidate gene lists (Table 3). For genes on each candidate gene list, human orthologs were selected using the Human-Mouse: Disease Connection [http://www.informatics.jax.org/humanDisease.html]. Genes with no ortholog in humans were filtered out; no other filtering criteria were used. The number of genes provided for each candidate gene list in this report represents the number of genes per list after translation to the human ortholog, the entry point for the use of GenToS.

Genome-wide association study datasets
GenToS was applied to different publicly available datasets of GWAS summary statistics:
1. The GEFOS (GEnetic Factors for Osteoporosis) Consortium [19,20] is an international consortium investigating the genetic basis of osteoporosis. The datasets used in this report originated from the discovery step of two meta-analyses of GWAS summary statistics from different studies of European and East Asian ancestry that examined associations between genotyped and HapMap imputed single nucleotide polymorphisms and bone mineral density of the lumbar spine (LSBMD; 32,000 individuals) and femoral neck (FNBMD; 33,000 individuals).
2. In the MAGIC (Meta-Analyses of Glucose and Insulin-related traits Consortium) [25], international investigators study genetic influences on glucose metabolism.

Extraction of significantly associated SNPs from GWAS
As a final step, GenToS searches the specified GWAS summary data file for SNPs within the defined gene regions with association p-values lower than the determined significance threshold. If present, summary statistics for such SNPs are annotated to the gene of interest and written to a results file. Consequently, the results file contains all information present in the input GWAS summary file, along with gene mapping information (Fig 1C).
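The extraction step can be illustrated with a short sketch. It assumes a simplified in-memory representation: the field names ("snp", "chrom", "pos", "p"), the gene names, and the toy values are hypothetical and do not reflect the actual GenToS input format. A linear scan over all regions is used for clarity; an interval index would be preferable for genome-scale data.

```python
def extract_significant_snps(gwas_rows, gene_regions, threshold):
    """Return GWAS rows inside a gene region with p-value below the threshold.

    gwas_rows: iterable of dicts with at least 'chrom', 'pos' and 'p' keys.
    gene_regions: dict mapping gene name -> (chromosome, start, end), inclusive.
    Every hit keeps all original fields and gains a 'gene' annotation.
    """
    hits = []
    for row in gwas_rows:
        if row["p"] >= threshold:  # strictly lower than the threshold is required
            continue
        for gene, (chrom, start, end) in gene_regions.items():
            if row["chrom"] == chrom and start <= row["pos"] <= end:
                hits.append({**row, "gene": gene})
    return hits

# Toy summary statistics and regions (invented for illustration only).
toy_gwas = [
    {"snp": "rs_a", "chrom": "1", "pos": 150, "p": 1e-6},
    {"snp": "rs_b", "chrom": "1", "pos": 160, "p": 0.2},
    {"snp": "rs_c", "chrom": "2", "pos": 150, "p": 1e-7},
    {"snp": "rs_d", "chrom": "1", "pos": 999, "p": 1e-9},
]
toy_regions = {"GENE_X": ("1", 100, 200), "GENE_Y": ("2", 500, 600)}

significant = extract_significant_snps(toy_gwas, toy_regions, threshold=1e-4)
significant_genes = sorted({hit["gene"] for hit in significant})
print(significant)
```

The number of distinct genes among the hits is the quantity that enters the enrichment assessment.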
Subsequent to this three-step procedure, an optional yet recommended step is implemented to evaluate whether there is significant enrichment of the number of detected association signals for the genes contained in the candidate gene input list compared to the number of detected associations expected by chance alone. Assessment of enrichment can be carried out by visual comparison to the null distribution, which is generated based on the number of significant genes identified in GWAS data based on the iterative evaluation of randomly drawn input gene lists (2,000 iterations by default) that contain an equal number of genes as the evaluated candidate gene input. The number of 2,000 iterations was chosen as a compromise between computational time and sufficient precision. Because each of the 2,000 iterations generates an input gene list with the same number of genes but different (i.e. randomly drawn) genes, the calculation of the number of independent SNPs across each list followed by a Bonferroni correction procedure is carried out for each draw. This procedure accounts for the different size and linkage disequilibrium structure of genes within and across lists, and represents a time consuming yet reliable method to derive a null distribution. Another option to assess enrichment is a similar graphical representation based on a binomial distribution, where the probability p of a significant association is estimated by the proportion of the total number of genes with GWAS association signals below the calculated significance threshold for the given candidate gene input list among the total number of genes in the gene database and n is the total number of genes on the candidate gene input list. The probability of observing as many or more significant genes x is then estimated using a complementary cumulative binomial distribution (enrichment p-value).

Genes implicated by GenToS were further investigated by annotating them using the Online Mendelian Inheritance in Man (OMIM) resource as well as the annotation program SNiPA [29].

Pre-computed databases
To run GenToS, two databases were pre-computed: one containing the genes and their positions in the genome, and the other containing independent SNPs across the genome.
For the gene database, all RefSeq genes (table refFlat) were downloaded from the UCSC homepage using build GRCh37/hg19 coordinates [30]. In a subsequent processing step, the longest transcript for each gene was retained. Only genes with unambiguous mapping, whose start and end positions did not map to different chromosomes, were extracted and added to the database, for a total of 25,230 entries.
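The transcript selection can be sketched as follows, assuming tab-separated refFlat rows whose first six columns are gene name, transcript name, chromosome, strand, transcription start, and transcription end. The function name is hypothetical, and treating "unambiguous mapping" as "all transcripts of a gene lie on a single chromosome" is an assumption of this sketch; the exact filter used to build the GenToS database may differ.

```python
import csv
from collections import defaultdict


def longest_transcripts(ref_flat_lines):
    """Map each gene to (chrom, tx_start, tx_end, transcript) of its longest transcript."""
    by_gene = defaultdict(list)
    for fields in csv.reader(ref_flat_lines, delimiter="\t"):
        gene, transcript, chrom = fields[0], fields[1], fields[2]
        tx_start, tx_end = int(fields[4]), int(fields[5])
        by_gene[gene].append((chrom, tx_start, tx_end, transcript))

    selected = {}
    for gene, transcripts in by_gene.items():
        chromosomes = {t[0] for t in transcripts}
        if len(chromosomes) != 1:
            continue  # assumption: genes seen on several chromosomes are ambiguous
        selected[gene] = max(transcripts, key=lambda t: t[2] - t[1])
    return selected


# Hypothetical rows; the remaining refFlat columns are omitted for brevity.
rows = [
    "GENE_A\tNM_demo1\tchr1\t+\t100\t500",
    "GENE_A\tNM_demo2\tchr1\t+\t100\t900",
    "GENE_B\tNM_demo3\tchr2\t-\t50\t80",
    "GENE_B\tNM_demo4\tchrX\t-\t50\t80",
]
print(longest_transcripts(rows))
```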
The independent SNPs for the SNP database were pre-computed based on the 1000 Genomes project phase 1 version 3 data using plink (version 1.90b2) [31] (options --indep-pairwise 50 5 0.2 and --maf 0.01). The computation was carried out chromosome-wise and the results were added to the SNP database, with each chromosome in a separate table.

Fig 1. GenToS principle. (A) First, GenToS extracts for each gene on a given candidate gene input list the region of the gene including a user-defined flanking region. (B) Next, all independent SNPs within each region are identified from a reference population, and a significance threshold based on the number of independent SNPs is calculated. (C) In the final step, SNPs with an association p-value below the calculated significance threshold are extracted from the human GWAS summary results. (D) Enrichment of the number of observed significant genes (vertical line) can be assessed visually compared to the expected number based on a null distribution derived by resampling from a binomial distribution (histogram). doi:10.1371/journal.pone.0162466.g001

Fig 2. QQ-plots of the number of observed significant genes under the null hypothesis comparing random draws of gene input lists and simulated draws. The graph shows that simulated draws based on a binomial experiment approximate the number of significant genes under the null hypothesis derived from iterations of randomly generated input gene lists, while being computationally more efficient. QQ-plots were generated across a range of possible significance thresholds. Spearman correlation coefficients were determined for each setting and found to be in the range of 0.90-1.00.

Fig 3. GenToS identifies significant enrichment of genes containing femoral neck bone mineral density-associated SNPs based on candidate gene input lists for murine bone phenotypes. For each of the six candidate gene input lists, the number of expected significant genes under the null hypothesis was generated based on iterations of randomly drawn gene lists that contained an equal number of genes as the respective candidate gene input list and is displayed as a histogram. In addition, the binomial density distribution corresponding to the candidate gene input list significance threshold was overlaid (dots connected with lines). The observed number of significant genes based on the use of GenToS with the candidate gene input lists and the human GWAS results for femoral neck bone mineral density is indicated by a vertical black line. The enrichment p-value is computed from the complementary cumulative binomial distribution (see Methods).

Table 1 (Continued). The index SNP is defined as the SNP with the lowest association p-value with a given trait. The GWAS Catalog entry refers to results obtained from the NHGRI GWAS catalog upon entry of the given index SNP. Monogenic phenotypes are retrieved from OMIM. Of note, several of these genes only achieved genome-wide significance after the replication step, whereas GenToS is based on data from the discovery step and already implicated the genes at this point. Empty cells for LSBMD and FNBMD p-values and SNP identifiers indicate that no SNP in the gene contained significant associations below any of the six murine candidate gene list-wise thresholds.
*LSBMD and FNBMD entries from the GWAS catalog represent summary estimates from the combined discovery and replication step. LSBMD = Lumbar spine bone mineral density; FNBMD = Femoral neck bone mineral density; OMIM = Online Mendelian Inheritance in Man database. doi:10.1371/journal.pone.0162466.t001

Table 2 . Newly implicated genes identified by GenToS in association with bone mineral density phenotypes.
These genes either mapped to known GWAS-associated regions but were not previously named as the index gene, or were not replicated at genome-wide significance at the time the GWAS data were published.

Table 2 (Continued). The index SNP is defined as the SNP with the lowest association p-value with a given trait. The GWAS Catalog entry refers to results obtained from the NHGRI GWAS catalog upon entry of the given index SNP. Monogenic phenotypes are retrieved from OMIM. Empty cells for LSBMD and FNBMD p-values and SNP identifiers indicate that no SNP in the gene contained significant associations below any of the six murine candidate gene list-wise thresholds.
*LSBMD and FNBMD entries from the GWAS catalog represent summary estimates from the combined discovery and replication step. SNiPA was used to retrieve cis-eQTL evidence from numerous tissues. Evidence is indicated when any tissue showed indication of an eQTL.