plosPLoS Comput BiolploscompPLoS Computational Biology1553734X15537358Public Library of ScienceSan Francisco, USAPCOMPBIOLD110135210.1371/journal.pcbi.1002459Research ArticleBiologyComputational biologyGeneticsSystems biologyGenetics and GenomicsComputational BiologyParameters in Dynamic Models of Complex Traits are Containers of Missing HeritabilityDynamic Model Parameters and Missing HeritabilityWangYunpeng^{1}GjuvslandArne B.^{2}VikJon Olav^{2}SmithNicolas P.^{3}HunterPeter J.^{4}OmholtStig W.^{1}^{*}Centre for Integrative Genetics, Department of Animal and Aquacultural Sciences, Norwegian University of Life Sciences, Ås, NorwayCentre for Integrative Genetics, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Ås, NorwayDepartment of Biomedical Engineering, St Thomas' Hospital, King's College London, London, United KingdomAuckland Bioengineering Institute, The University of Auckland, Auckland, New ZealandRothFrederick P.EditorHarvard Medical School, United States of America* Email: stig.omholt@umb.no
Conceived and designed the experiments: YW ABG SWO. Analyzed the data: YW ABG. Wrote the paper: YW ABG JOV NPS PJH SWO. Established computational pipeline: YW ABG JOV. Contributed to computational pipeline: NPS PJH.
The authors have declared that no competing interests exist.
4201254201284e1002459129201119220122012Wang et alThis is an openaccess article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Polymorphisms identified in genomewide association studies of human traits rarely explain more than a small proportion of the heritable variation, and improving this situation within the current paradigm appears daunting. Given a wellvalidated dynamic model of a complex physiological trait, a substantial part of the underlying genetic variation must manifest as variation in model parameters. These parameters are themselves phenotypic traits. By linking wholecell phenotypic variation to genetic variation in a computational model of a single heart cell, incorporating genotypetoparameter maps, we show that genomewide association studies on parameters reveal much more genetic variation than when using higherlevel cellular phenotypes. The results suggest that letting such studies be guided by computational physiology may facilitate a causal understanding of the genotypetophenotype map of complex traits, with strong implications for the development of phenomics technology.
Author Summary
Despite an everincreasing number of genome locations reported to be associated with complex human diseases or quantitative traits, only a small proportion of phenotypic variations in a typical quantitative trait can be explained by the discovered variants. We argue that this problem can partly be resolved by combining the statistical methods of quantitative genetics with computational biology. We demonstrate this for the in silico genotypetophenotype map of a model heart cell in conjunction with publically accessible genomic data. We show that genome wide association studies (GWAS) on model parameters identify more causal variants and can build better prediction models for the higherlevel phenotypes than by performing GWAS on the higherlevel phenotypes themselves. Since model parameters are in principle measurable physiological phenotypes, our findings suggest that development of future phenotyping technologies could be guided by mathematical models of the biological systems being targeted.
This work was supported by the Research Council of Norway (http://www.forskningsradet.no/) under the eVITA program, project number 178901/V30 and by the Virtual Physiological Rat Project funded through NIH grant P50GM094503. NOTUR, the Norwegian metacenter for computational science, provided computing resources under project nn4653k. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.Introduction
The phenotypic variance cumulatively explained by marker loci found to associate with complex traits in genomewide association studies (GWAS) is usually much less than the narrowsense heritability [1]–[6], the ratio of additive genetic variance to total phenotypic variance. Several explanations have been proposed for this unexplained variance, popularly called the missing heritability [1], including imprecise heritability estimates; insufficient sample size; exclusion of particular types of polymorphisms such as copy number variants and rare SNPs in GWAS; unaccounted epistatic effects; and underestimated effect size of associated SNPs due to incomplete linkage with causal variants [3], [6]. Recently it was shown [7] that a large proportion of the missing heritability can be accounted for if one estimates the variance explained by all available marker loci together. This suggests that most of the genetic variation underlying complex trait variation is due to marginal effects of many loci that are too small to pass stringent significance tests. Strong support for this interpretation comes from several metaanalyses of genomewide association data [8]–[10]. While this insight appears to resolve much of the missing heritability issue as such, it also implies that standard GWAS approaches will not be very helpful for disclosing which genetic variants do actually contribute to additive variance.
Part of the problem underlying the missing heritability is that while the genotypephenotype map in reality arises from complex biological systems best described by nonlinear dynamic models, the statistical machinery of quantitative genetics, including GWAS methods, is built upon linear models of gene action. The aim of this study is not to improve the statistical methods per se, but rather to explore how more of the missing heritability can be explained and understood by combining nonlinear dynamic models with existing GWAS methods. The research program of linking system dynamics and genetics was suggested more than 40 years ago [11] and has been an active research area for more than 10 years [12]–[24]. Emergent properties of nonlinear systems, such as systemic silencing [25], might lead to a situation where genetic variation that penetrates to lowlevel phenotypes underlying a higherlevel phenotype does not necessarily manifest in the higherlevel phenotype itself. Doing GWAS on these lowphenotypes may thus reveal more of the genetic variation influencing the higherlevel trait. This line of reasoning is reflected in recent GWA studies on metabolite profiles [26], [27], in pathway and networkbased analysis of genomewide association studies [28] and in GWAS analyses on global gene expression data [29]–[30]. While all these studies represent important contributions, they do not combine a genetic mapping framework with mathematical models describing how highlevel trait variation emerges from lowlevel trait variation, i.e. they do not provide a quantitative framework for elucidating how genetic variation affecting a lowlevel phenotype do actually influence a focal highlevel phenotype.
If a dynamic model can describe the phenotypic variation of a given trait, it follows that irrespective of the biological resolution of the model, the genetic variation underlying the phenotypic variation will have to be reflected as variation in the parameters of the model. We therefore hypothesized that performing GWAS on parameters in computational physiology models might reveal much more of the underlying genetic variation, as well as shedding light on how this variation actually causes phenotypic variation.
To test the plausibility of this reasoning, we combined GWAS methodology with a causallycohesive genotypephenotype (cGP) model linking genetic variation to phenotypic variation. More specifically, a cGP model [19] is a mathematical model of a biological system where lowlevel parameters have an articulated relationship to an individual's genotype, and higherlevel phenotypes emerge from the mathematical model describing the causal dynamic relationships between these lowerlevel processes. Our approach bears some resemblance to that of functional GWAS (fGWAS) [31], where the genetic control of traits is analyzed by integrating biological principles of trait formation into the GWAS framework through mathematical and statistical bridges. Fu et al.[23] recently extended the functional mapping framework [15] to handle cyclic phenotypes such as circadian rhythms by combining differential equations with functional mapping of QTLs. However, there are clear differences between functional mapping and the cGP approach. In functional mapping the phenotypic measurements are currently done at the systems level, while lowerlevel parameters are estimated by combining curvefitting with more classical QTL methods. In contrast, the cGP approach advocated here focuses on measuring lowerlevel parameters based on the idea that they are highly relevant phenotypes of the system.
We studied a cGP model of a mouse heart cell [24], where genetic variation is mapped to parametric variation, which propagates through the physiological model to generate multivariate phenotypes for the action potential (an electrical signal) and calcium transient (linked to muscle contraction) under regular pacing. The rationale for using a heart cell model was that multiscale and multiphysics modelling of the mammalian heart has a solid empirical basis, and arguably comprises the most complex mathematical conceptualization of any organ or physiological trait available. For clarity of exposition, and because heart cell models lie at the core of this class of multiscale whole organ models [32]–[37], we deemed it sufficient to illustrate our points using a single cell model only. We used HapMap data [38], [39] as a guide to generate genetic variation with realistic allele frequencies and linkage disequilibrium to underlie variation in the model parameters. Based on HapMap [39] individuals we simulated complex pedigree populations and performed GWAS on both lowlevel parameters and highlevel phenotypes arising from the cGP model. The layout of the computational pipeline used for this study is depicted in Figure 1.
10.1371/journal.pcbi.1002459.g001Flowchart of computational pipeline.
A heart cell model, a genetic map and a virtual population are tied together by selecting heart model parameters assumed to be under influence of genetic variation and associating the parameter variation to a population of virtual genomes based upon HapMap 3 data. Individual genotypes are mapped into heart model parameters (steps 1–3) and by running the heart cell model parameters are mapped into celllevel phenotypes (step 4). Finally, GWAS analysis is then performed on the virtual population (step 5).
We show that genomewide association studies on parameters reveal many more of the underlying SNPs than when using higherlevel cellular phenotypes. Furthermore, the SNPs identified by GWAS on parameters can be used to build multivariate prediction models of higherlevel phenotypes giving much higher explained variance than from GWAS on higherlevel phenotypes alone. Our results suggest that combining statistical genetics with computational biology will facilitate both identification of genetic variation underlying complex traits and a much deeper understanding of how this genetic variation becomes causative.
Methods
The general layout of the study is outlined in Figure 1.
Heart cell model
The cell model [40] extends that of Bondarenko et al.[41] with more realistic calcium handling, conservation of charge, and detailed reparameterization to consistent experimental data for the C57BL/6 “black 6” mouse. State variables include ion concentrations of sodium, potassium and calcium in the cytosol, calcium concentration in the sarcoplasmic reticulum, and the state distribution of ion channels, whose transition rates between open, closed, and inactivated conformations may depend on transmembrane voltage. Formulated as a system of coupled ordinary differential equations, this model provides a comprehensive representation of membranebound channels and transporter functions as well as fluxes between the cytosol and intracellular organelles. As the action potential and calcium transient features following an electrical stimulation are the only state descriptors fed into higher level features of current multiscale heart models [33], we used these and associated aggregated measures as highlevel phenotypes, see “Parameter to phenotype mapping” below. See Vik et al.[24] for a detailed description of this model including model diagram, differential equations and a CellML implementation.
Polymorphic parameters
Out of the 86 model parameters we chose 34 to mediate the effects of genetic variation (Table 1 and Table S1). Because the genotype to parameter map for parameters describing ion channel properties may in general be much more straightforward than what is the case for many others, we picked mainly parameters describing affinities, conductivities and ion permeabilities for the ion channels and pumps underlying four potassium outward currents, one calcium current, one chloride current, one sodium current, the sodiumcalcium exchangers, the sarcoplasmic reticular calcium ATPase (SERCA), the sodium potassium pump, cytosolic calmodulin, the ryanodine receptors on sarcoplasmic reticulum and the calcium handling processes within sarcoplasmic reticulum.
10.1371/journal.pcbi.1002459.t001Parameters with genetic variation.
Parameter name
Description
Unit
Baseline value
Min
Max
Ka+
The PC1 – PO1 rate constant of the Ryanodine receptor
µM^{−4}/ms
6.08e3
4.09e3
8.06e3
Ka−
The PO1 – PC1 rate constant of the Ryanodine receptor
ms^{−1}
7.1332
4.70e2
9.58e2
Kb+
The PO1 – PO2 rate constant of the Ryanodine receptor
µM^{−3}/ms
4.05e3
2.64e3
5.47e3
Kb−
The PO2 – PO1 rate constant of the Ryanodine receptor
ms^{−1}
9.65e1
6.32e1
1.31
Kc+
The PO1 – PC2 rate constant of the Ryanodine receptor
ms^{−1}
9.00e3
6.09e3
1.20e2
Kc−
The PC2 – PO1 rate constant of the Ryanodine receptor
ms^{−1}
8.00e4
5.24e4
1.07e3
m
The Ca^{2+} cooperativity parameter of PO1 – PO2 of the Ryanodine receptor

3.0
1.99
3.97
n
The Ca^{2+} cooperativity parameter of PC1 – PO1 of the Ryanodine receptor

4.0
2.75
5.33
P_CaL
The permeability of the Ltype Ca^{2+} channel
ms^{−1}
2.5
1.62
3.30
t_L
The time constant of the switch between open and close states of the Ltype Ca^{2+} channel
ms^{−1}
1.5
1.01
1.98
tau_L
The Inactivation time constant of the Ltype Ca^{2+} channel
ms^{−1}
1.15e3
7.82e2
1.52e3
phi_L
The proportion of closed states in open mode of the Ltype Ca^{2+} channel

1.80
1.23
2.43
Kup
The SERCA affinity to Ca^{2+}
µM
4.12e1
2.93e1
5.68e1
V1
The leak constant of the Network Sarcoplasmic Reticulum
ms^{−1}
4.5
3.05
5.90
KCSQN
The Calsequestrin affinity to Ca^{2+}
µM
6.30e2
4.35e2
8.57e2
K_Co
The affinities of the Na^{+}/Ca^{2+} exchanger to extracellular Ca^{2+}
µM
1.4e3
9.38e2
1.85e3
K_Ci
The affinities of the Na^{+}/Ca^{2+} exchanger to intracellular Ca^{2+}
µM
3.6
2.45
4.93
K_No
The affinities of the Na^{+}/Ca^{2+} exchanger to extracellular Na^{+}
µM
8.80e4
6.06e4
1.20e5
K_Ni
The affinities of the Na^{+}/Ca^{2+} exchanger to intracellular Na^{+}
µM
1.2e4
8.38e3
1.58e4
KNai
The affinity of the Na^{+}/K^{+} pump to intracellular Na^{+}
µM
1.66e4
1.13e4
2.17e4
KKo
The affinity of the Na^{+}/K^{+} pump to extracellular K^{+}
µM
1.5e3
1.04e3
2.08e3
KpCa
The affinity of the Ca^{2+} pump to intracellular Ca^{2+}
µM
2.89e1
1.95e1
3.93e1
Vmax
The maximal exchange rate of Na^{+}/Ca^{2+} exchanger
pA/pF
3.94
2.71
5.19
Imax
The maximal current of the Na^{+}/K^{+} pump
pA/pF
2.49
1.71
3.58
GK1
The maximal conductance of the timedependent K^{+} channel
ms/µF
3.5e1
2.39e1
4.52e1
GKr
The maximal conductance of the rapid delayed rectifier K^{+} channel
ms/µF
1.65e2
1.11e2
2.17e2
GKur
The maximal conductance of the ultrarapidly activating delayed rectifier K^{+} channel
ms/µF
2.50e1
1.76e1
3.27e1
KCl
The half saturation constant of the Ca^{2+} activated Cl^{−} channel
µM
1.00e1
6.65
1.36e1
GNa
The maximal conductance of the Na^{+} channel
ms/µF
1.60e1
1.07e1
2.10e1
GKtof
The maximal conductance of the rapidly recovering transient outward K^{+} channel
ms/µF
5.35e1
3.97e1
7.11e1
GClCa
The maximum conductance of the Ca^{2+} activated Cl^{−} channel
ms/µF
1.00e1
6.56
1.33e1
on_rate
The autophosphorylation rate of Calmodulin
ms^{−1}
5.0e2
3.25e2
6.56e2
off_rate
The dephosphorylation rate of the Calmodulin
ms^{−1}
2.0e4
1.34e4
2.67e4
IpCm
The maximal current of the Ca^{2+} pump
pA/pF
9.55e2
6.35e2
1.26e1
Virtual genome and virtual population
To ensure some realism in the construction of the genetic structure of our in silico populations in terms of allele frequencies and LD patterns, we extracted HapMap3 data [39] for 2,000 evenly spaced SNPs (∼5000 nucleotides apart) for each of the first 20 autosomal chromosomes. We extracted genotypes for the 40000 SNPs for the 1301 individuals in the 11 HapMap 3 populations. We then expanded this into a population of 5000 individuals by use of the Python package simuPOP [42]. The population expansion by simuPOP maintained allele frequencies and LD patterns in accordance with the HapMap 3 data. Mutations were introduced based on a symmetric diallelic mutation model, all recombinations were based on genetic maps estimated from the HapMap data and migrations between the 11 subpopulations were allowed for. The mutation rate, migration rate and number of generations used as input to the simuPOP population expansion were 1e8, 0.001 and 500, respectively.
Genotype to parameter mapping
Assuming a purely additive genetic model, 400 causal SNPs were randomly sampled from the virtual genome for each of the 34 parameters selected to mediate genetic variation. The genotype to parameter mapping for each parameter was set up by defining the 5,000×40,000 genotype matrix G, where each element g_{ij} denoted the genotype of individual i at SNP j (−1 for the homozygous with the least frequent allele, 0 for the heterozygous and 1 for the homozygous with the most frequent allele). We then constructed for each parameter the 40,000x1 relative effect vector E, where element e_{j} was sampled from a Laplace (0, 0.0035) distribution if the jth SNP was among the 400 parameterspecific causative SNPs, and set to 0 otherwise (the relative effect being defined as the percentage increase or decrease of the baseline parameter value (Table 1 and Table S1)). The 5000element vector of parameter values for all individuals was then computed as p(GE+1), where p is the baseline value. With this procedure, each of the focal 34 parameters was varied within ∼±35% of its baseline value, and for each causal SNP, the heterozygous individuals were assigned the baseline parameter value (Table 1 and Table S1). The ±35% parameter variation range was chosen as a compromise between getting ample genetic signals and avoiding too many physiologically unrealistic phenotypes. We also tested a genetic model with 200 causative SNPs for each parameter, the only difference being that the standard deviation of the Laplace distribution was set to 0.0049.
Parameter to phenotype mapping
Cellular phenotypes for individual parameter sets were generated by a virtual experiment of constant pacing as described in Bondarenko et al.[41]. The potassium current was stimulated by −15 V/s for 3 ms at the start of each stimulus interval. Convergence was checked by comparing successive intervals with respect to the initial value of each state variable as well as the integral of its trajectory over that interval. A running history of 10 intervals was kept, and after each interval we checked for a match (within a relative tolerance of 5% for all state variables) against the previous one. This was done for three different pacing rates with stimulus intervals of intervals 100, 200 and 300 ms, respectively. The cell dynamics was categorized as “failure” if it did not converge to nonalternating dynamics within 10 minutes of simulation time. The Python code of the heart cell model was autogenerated from CellML [43], using the code generating service available at the CellML repository (www.cellml.org). The equations were integrated using the CVODE solver [44] with a Python wrapper.
Eight scalar phenotypes (see Table 2 and Table S2) were extracted from each computed action potential and calcium transient curve: the initial value (apbase and ctbase), the amplitude (apamp and ctamp), the peak value (appeak and ctpeak), the time to peak value (apttp and ctttp), the time to 25%, 50%, 75%, and 90% of the initial base value (apd25, apd50, apd75, apd90 and ctd25, ctd50, ctd75, ctd90).
We removed individuals with physiologically unrealistic phenotypes within each of the 100 datasets analyzed. The exclusion criterion was based on the interquartile range (IQR); points that were more than twice the IQR above the third quartile or below the first quartile were excluded. Each filtered data set, containing 4000–5000 individuals, was divided into a training set of 2500 individuals and a test set consisting of the remaining individuals. The training data set was used to detect causal SNPs, compute the false positive rate and sensitivity characteristics. The test set was used to estimate the phenotypic variation accounted for by the detected SNPs.
Statistical analysis
The same GWAS procedure was used for each parameter and each phenotype. The quantitative trait association analysis was performed with the program PLINK [45] on the training data. We used a threshold of 1e5 on the Bonferronicorrected pvalue from PLINK to determine the set of significant SNPs.
The detected SNP set and associated discovery rates were defined as follows. Let S_{i} denote the set of significant SNPs from GWAS on the ith parameter and let C_{i} denote the causal SNPs set of the ith parameter. The set of detected SNPs of the ith parameter was then computed as D_{i} = S_{i}∩C_{i}, and the discovery rate of ith parameter was computed as d_{i} = D_{i}/C_{i}. The union of causal SNP sets for parameters defined the causal SNP set underlying all cellular phenotypes, and the detected SNP set and the discovery rate for each cellular phenotype was computed in the same way as for each parameter. The set of false positive SNPs of the ith parameter or phenotype, F_{i}, consists of SNPs in the set of significant SNPs S_{i} that are not in the causal SNPs set C_{i}_{.}. The false positive rate of the ith parameter or phenotype was defined as the number of false positive SNPs in F_{i} divided by the number of signals in S_{i}, F_{i}/S_{i}.
To quantify explained genetic variance a multiple regression model was constructed by regressing the phenotype or parameter value of the training set on the causal SNPs detected by GWAS (similar to the weighted genomic profile approach in [46]). Then the phenotypes of test set individuals were predicted using the corresponding fitted models. We measured the explained variation by the R^{2} values from regressing observed values on predicted phenotypic values for the individuals in the test set.
Global sensitivity analysis
We quantified the linear sensitivity [47] of each phenotype to each parameter using linear regression in the training set. For each highlevel phenotype and Monte Carlo simulation we used the 2500 simulated phenotypes as response and performed a series of univariate regressions each time with a single parameter as predictor. We measured global sensitivity by the coefficient of determination (R^{2}).
Results/Discussion
The proportion of true causative SNPs detected by GWAS was as expected substantially higher for the parameters than for the cellular phenotypes (Figure 2 and Figure S2 for the 200 SNPs case). Median detection rates of causal SNPs were in the range 3.5%–4% after GWAS directly on parameter values (Figure 2A), and this number dropped to ∼0.05% for GWAS studies on action potential phenotypes and ∼0.02% for calcium transient phenotypes (Figure 2B), and the corresponding figures in the 200 SNPs case were 8–8.5%, ∼0.16% and ∼0.08%. The low overall detection rates were to be expected since we sampled SNP effects from an Lshaped distribution resulting in datasets where a small proportion of the SNPs underlying a given parameter will explain a substantial part of the variation. The main explanation for the decrease in detection rates is that the number of causal SNPs increases 34 times and the relative effects of all causal SNPs decrease, making them harder to pick up. Another, probably less important, phenomenon contributing to lower detection rates at the higherlevel phenotypes is that going from parameter level to the systemlevel phenotype introduces nonlinearities in the SNP effects, and standard GWAS methods pick up only the additive part.
10.1371/journal.pcbi.1002459.g002Percentage of causative SNPs detected by GWAS.
(A) The percentage of 400 causative SNPs (y axis) detected as significant SNPs by GWAS on genetically controlled model parameters (x axis). (B) The percentage of all 13600 causative SNPs (y axis) detected as significant SNPs by GWAS on cellular phenotypes (x axis). Each boxplot summarizes 100 Monte Carlo runs. See Methods for further descriptions of model parameters and phenotypes.
The difference between parameter and cellular phenotypes is also evident when looking at the amount of phenotypic variance explained by SNPs detected in the GWAS (Figure 3 and Figure S3 for the 200 SNPs case). The median explained variance is typically in the range 30–40% for parameter phenotypes (Figure 3A), 10–20% for action potential phenotypes and ∼5% for calcium transient phenotypes (Figure 3B). The proportion of phenotypic variance explained by detected SNPs was on average 2.6 (2.0 in the 200 SNP case) and 5.6 (3.9 for the 200 SNPs case) times higher for a parameter phenotype than for an action potential and calcium transient phenotype, respectively. However, when we made use of the SNPs detected for parameters we were able to explain 1.8 and 3.9 times (1.6 and 2.9 times for the 200 SNPs case) more of the phenotypic variance of the action potential and calcium transient phenotypes, respectively, approaching the levels obtained for the parameters (Figure 3C). We also calculated the explained variances with all significant SNPs and obtained similar results. This suggests that our approach can be tested empirically in a straightforward way.
10.1371/journal.pcbi.1002459.g003Phenotypic variance explained by genotypic variation.
(A) Total explained variance for genetically controlled parameters (x axis) using detected causal SNPs as predictors. (B) Total explained variance for cellular phenotypes (x axis) using detected causal SNPs obtained from GWAS targeting these phenotypes. (C) Total explained variance for cellular phenotypes (x axis) using detected causal SNPs obtained from GWAS targeting all genetically controlled parameters. Each boxplot summarizes total explained variance by GWAS for 100 Monte Carlo runs. Explained variance was measured as R^{2} from test set prediction with a multiple regression model, see Methods for further descriptions.
The gain in explained variance by using parameterassociated SNPs was not as dramatic for the action potential phenotypes as for the calcium transient phenotypes (Figure 3C), but even in this case the gain in number of identified SNPs was on average 13.9× (12.3 for the 200 SNPs case). The corresponding figure for the calcium transient phenotypes was 39.4× (26.5 for the 200 SNPs case). Because these additional SNPs are attached to one or more parameters describing specific biological processes or features that are causally related according to the functional structure of the mathematical model, the gain in our causal understanding of the genotype to phenotype map may be substantial.
Both the detection rate of causal SNP and the variances explained for the calcium transient phenotypes were overall significantly lower than those for the action potential phenotypes (Figure 2B and 3B). We investigated this further by a linear global sensitivity analysis of how variation in the cellular phenotypes depended on variation in the parameters, and compared this with the number of causative SNPs for each parameter detected by performing GWAS on highlevel cellular phenotypes. We found that the GWAS results for the two cellular phenotype groups are predominantly a consequence of the sensitivity structure of the dynamic model (Figure 4 and Figure S4 for the 200 SNPs case), and that the action potential phenotypes are overall more sensitive to fewer parameters than the calcium transient phenotypes. The only exception to this latter pattern is the parameter Kup, quantifying the affinity of SERCA to calcium ions (Figure 4A). It has a substantial impact on the calcium transient base value phenotype (ctbase), and the amount of variance explained by the SNPs detected for this phenotype is on par with the action potential phenotypes (Figure 3B). This suggests that SNPs associated with traits that are sensitive to few parameters will have a higher penetrance than SNPs associated with traits that are sensitive to many parameters for a given model resolution. Moreover, the results imply that the more polyparametric the sensitivity profile of a model phenotype is, the more will be gained in terms of added explained variance by performing GWAS on parameters. On the other hand, the results also imply that a sensitivity analysis can be used to systematically reveal hotspots for genetic variation underlying a complex trait and thus guide a parameter phenotyping program. Within this framework a SNP affecting a parameter to which the focal higherlevel phenotypes are not very sensitive will have little impact on these phenotypes unless it is highly penetrant at the parameter level.
10.1371/journal.pcbi.1002459.g004The close resemblance between GWAS results and linear sensitivity analysis.
(A) The number of causative SNPs for each parameter(y axis) detected by performing GWAS on highlevel cellular phenotypes(x axis). The color intensity of each square describes the mean value of 100 Monte Carlo runs. (B) Sensitivities of the highlevel phenotypes (x axis) of the 2500 individuals in the training set to variation in each parameter (y axis) quantified by univariate linear regression (see Methods). The color intensity of each square describes the mean R^{2} (coefficient of determination) value of 100 Monte Carlo runs.
GWAS methods are well known for producing false positives due to multiple testing and high LD between SNPs. A typical GWAS block of SNPs in high LD is often reduced to a subset of tagSNPs in low LD (typically with a pairwise correlation <0.2). The GWAS methods are aimed at identifying significant tagSNPs, and the task of distinguishing the causal SNPs from false positives in high LD has to be done with other methods such as functional studies of candidate SNPs. Our approach is not intended to solve this problem (but see e.g. [48], [49] for reviews of methods for identifying causal variants after GWAS) and in our study the increases detection rate for parameters is accompanied by an expected increased false positive rate (Figure S1 and Figure S5 for the 200 SNPs case). However, as parameters as a rule are closer to mechanism than higherlevel phenotypes, it should be noted that to do GWAS on parameters could become very instrumental for identifying candidate mechanisms and genes for follow up studies. We envision that ongoing efforts such as the RICORDO project [50] aimed at developing semantic interoperability for biomedical data and models will facilitate bioinformatic identification of candidate mechanisms and genes from cGP model sensitivities and GWAS results on parameter phenotypes.
We made deliberate use of the simplest possible genotype to parameter map in this study. A more complex map incorporating genetic dominance and various types of epistasis [51] would have diminished the SNP discovery rates and the explained variances of the parameters. However, this reduction in penetrance would apply equally well at higher phenotypic levels, and so would not affect our conclusions. We did not put any environmental variation on the parameters as we deemed this unnecessary in a context where the main focus was to compare the genetic signal strength at the parameter and cellular phenotype levels. However, in future studies this aspect needs to be taken into account in order to make quantitative assessments of how well we will be able to pick up genetic signals as function of environmental variation.
Our approach will remain useful in conjunction with future advances in statistical GWAS methodology, as it is applicable to any phenotypic variation that can be described by computational physiological modeling, irrespective of its position in the phenotypic hierarchy. Even in those cases where the parameters of a computational model are quite highlevel phenotypes, our results suggest that one will be able to gain insights about the genotype to phenotype map that would otherwise be challenging to achieve.
There has been an enormous expansion in efforts to model complex biological systems the last decade, and steadily expanding model repositories such as http://www.cellml.org and http://biomodels.net facilitate exchange and reuse of such models. Illustratively, our study benefited from the reuse of a model available in CellML format. Future development of the cGP approach and systems genetics in general will benefit greatly from these standards and online resources as well as modeling efforts like the Virtual Physiological Human (http://www.vphnoe.eu).
Reflecting upon how to improve the current performance of largescale GWA studies aiming to find the genetic determinants underlying complex diseases, Dermitzakis and Clark stated recently that “A major breakthrough will be to predict and interpret the effect of mutational and biochemical changes in human cells and understand how this signal is transmitted spatially (among tissues) and temporally (spanning development)” [52]. Our results suggest that combining GWAS methodology with a mature phenomics technology guided to fit the needs of computational physiology [53], may contribute substantially to making this vision come true.
Supporting Information
False positive rates of GWAS on parameters and cellular phenotypes (400 SNPs case). Boxplots summarizing false positive rates (y axis) for 100 Monte Carlo simulations for (A) parameters and (B) cellular phenotypes. The false positive rate is defined as the proportion of the noncausative SNPs among those identified as significant by the GWAS.
(EPS)
Percentage of causative SNPs detected by GWAS (200 SNPs case). (A) The percentage of 200 causative SNPs (y axis) detected as significant SNPs by GWAS on genetically controlled model parameters (x axis). (B) The percentage of all 6800 causative SNPs (y axis) detected as significant SNPs by GWAS on cellular phenotypes (x axis). Each boxplot summarizes 100 Monte Carlo runs. See Methods for further descriptions of model parameters and phenotypes.
(EPS)
Phenotypic variance explained by genotypic variation (200 SNPs case). (A) Total explained variance for genetically controlled parameters (x axis) using detected causal SNPs as predictors. (B) Total explained variance for cellular phenotypes (x axis) using detected causal SNPs obtained from GWAS targeting these phenotypes. (C) Total explained variance for cellular phenotypes (x axis) using detected causal SNPs obtained from GWAS targeting all genetically controlled parameters. Each boxplot summarizes total explained variance by GWAS for 100 Monte Carlo runs. Explained variance was measured as R^{2} from test set prediction with a multiple regression model, see Methods for further descriptions.
(EPS)
The close resemblance between GWAS results and linear sensitivity analysis (200 SNPs case). (A) The number of causative SNPs for each parameter (y axis) detected by performing GWAS on highlevel cellular phenotypes (x axis). The color intensity of each square describes the mean value of 100 Monte Carlo runs. (B) Sensitivities of the highlevel phenotypes (x axis) of the 2500 individuals in the training set to variation in each parameter (y axis) quantified by univariate linear regression (see Methods). The color intensity of each square describes the mean R^{2} (coefficient of determination) value of 100 Monte Carlo runs.
(EPS)
False positive rates of GWAS on parameters and cellular phenotypes (200 SNPs case). Boxplots summarizing false positive rates (y axis) for 100 Monte Carlo simulations for (A) parameters and (B) cellular phenotypes. The false positive rate is defined as the proportion of the noncausative SNPs among those identified as significant by the GWAS.
(EPS)
Parameters with genetic variation. This supplementary table contains data similar to that shown in Table 1, the only difference being that it is based on 200 causal SNPs per parameter instead of 400.
(PDF)
Attained cellular phenotype values. This supplementary table contains data similar to that shown in Table 2, the only difference being that it is based on 200 causal SNPs per parameter instead of 400.
(PDF)
ReferencesMaherB2008Personal genomes: The case of the missing heritability.ManolioTACollinsFSCoxNJGoldsteinDBHindorffLA2009Finding the missing heritability of complex diseases.MakowskyRPajewskiNMKlimentidisYCVazquezAIDuarteCW2011Beyond missing heritability: prediction of complex traits.Lango AllenHEstradaKLettreGBerndtSIWeedonMN2010Hundreds of variants clustered in genomic loci and biological pathways affect human height.ParkJHWacholderSGailMHPetersUJacobsKB2010Estimation of effect size distribution from genomewide association studies and implications for future discoveries.EichlerEEFlintJGibsonGKongALealSM2010Missing heritability and strategies for finding the underlying causes of complex disease.YangJManolioTAPasqualeLRBoerwinkleECaporasoN2011Genome partitioning of genetic variation for complex traits using common SNPs.ZegginiEScottLJSaxenaRVoightBFMarchiniJL2008Metaanalysis of genomewide association data and largescale replication identifies additional susceptibility loci for type 2 diabetes.ChoYSChenCHHuCLongJHee OngRT2011Metaanalysis of genomewide association studies identifies eight new loci for type 2 diabetes in east Asians.KatoNTakeuchiFTabaraYKellyTNGoMJ2011Metaanalysis of genomewide association studies identifies common variants associated with blood pressure variation in east Asians.BurnsJ1970The synthetic problem and the genotypephenotype relation in cellular metabolism.WaddingtonCHFrankSA1999Population and Quantitative Genetics of Regulatory Networks.OmholtSWPlahteEOyehaugLXiangK2000Gene regulatory networks generating the phenomena of additivity, dominance and epistasis.GilchristMANijhoutHF2001Nonlinear developmental processes as sources of dominance.MaCXCasellaGWuR2002Functional mapping of quantitative trait loci underlying the character process: a theoretical framework.PeccoudJVeldenKVPodlichDWinklerCArthurL2004The selective values of alleles in a molecular network model are context dependent.WelchSMDongZRoeJLDasS2005Flowering time control: gene network modelling and the link to quantitative genetics: Modelling complex traits for plant improvement.GjuvslandABHayesBJOmholtSWCarlborgO2007Statistical epistasis is a generic feature of gene regulatory networks.RajasinghHGjuvslandABVågeDIOmholtSW2008When parameters in dynamic models become phenotypes: a case study on flesh pigmentation in the chinook salmon (Oncorhynchus tshawytscha).GertzJGerkeJPCohenBA2010Epistasis in a quantitative trait captured by a molecular model of transcription factor interactions.SalazarCiudadIJernvallJ2010A computational model of teeth and the developmental origins of morphological variation.PumirAShraimanB2011Epistasis in a Model of Molecular Signal Transduction.FuGWangZLiJWuR2011A mathematical framework for functional mapping of complex phenotypes using delay differential equations.VikJOGjuvslandABLiLTøndelKNiedererS2011Genotypephenotype map characteristics of an in silico heart cell.GjuvslandABPlahteEOmholtSW2007Thresholddominated regulation hides genetic variation in gene expression networks.GiegerCGeistlingerLAltmaierEHrabé de AngelisMKronenbergF2008Genetics meets metabolomics: a genomewide association study of metabolite profiles in human serum.IlligTGiegerCZhaiGRömischMarglWWangSattlerR2009A genomewide perspective of genetic variation in human metabolism.BaranziniSEGalweyNWWangJKhankhanianPLindbergR2009Pathway and networkbased analysis of genomewide association studies in multiple sclerosis.CooksonWLiangLAbecasisGMoffattMLathropM2009Mapping complex disease traits with global gene expression.AlaKorpelaMKangasAJInouyeM2011Genomewide association studies and systems biology: together at last.DasKLiJWangZTongCFuG2011A dynamic model for genomewide association studies.NobleD2002Modeling the Heart–from Genes to Cells to the Whole Organ.SmithNPMulquineyPJNashMPBradleyCPNickersonDP2002Mathematical modelling of the heart: cell to organ.SmithNPNickersonDPCrampinEJHunterPJ2004Multiscale computational modelling of the heart.HunterPJBorgTK2003Innovation: Integration from proteins to organs: the Physiome Project.NickersonDNashMNielsenPSmithNHunterP2006Computational multiscale modeling in the IUPS Physiome Project: Modeling cardiac electromechanics.SmithNde VecchiAMcCormickMNordslettenDCamaraO2011euHeart: personalized and integrated cardiac care using patientspecific cardiovascular modelling.GibbsRABelmontJWHardenbolPWillisTDYuF2003The International HapMap Project.AltshulerDMGibbsRAPeltonenLAltshulerDMGibbsRA2010Integrating common and rare genetic variation in diverse human populations.LiLNiedererSAIdigoWZhangYHSwietachP2010A mathematical model of the murine ventricular myocyte: a datadriven biophysically based approach applied to mice overexpressing the canine NCX isoform.BondarenkoVESzigetiGPBettGCLKimSJRasmussonRL2004Computer model of action potential of mouse ventricular myocytes.PengBAmosCI2010Forwardtime simulation of realistic samples for genomewide association studies.LloydCMHalsteadMDBNielsenPF2004CellML: its future, present and past.CohenSHindmarshC1996CVODE, a stiff/nonstiff ODE solver in C.PurcellSNealeBToddBrownKThomasLFerreiraMAR2007PLINK: a tool set for wholegenome association and populationbased linkage analyses.AulchenkoYSStruchalinMVBelonogovaNMAxenovichTIWeedonMN2009Predicting human height by Victorian and genomic methods.SaltelliARattoMAndresTCampolongoFCariboniJ2008CantorRMLangeKSinsheimerJS2010Prioritizing GWAS results: A review of statistical methods and recommendations for their application.IoannidisJPAThomasGDalyMJ2009Validating, augmenting and refining genomewide association signals.de BonoBHoehndorfRWimalaratneSGkoutosGGrenonP2011The RICORDO approach to semantic interoperability for biomedical data and models: strategy, standards and solutions.PhillipsPC2008Epistasis — the essential role of gene interactions in the structure and evolution of genetic systems.DermitzakisETClarkAG2009Genetics. Life after GWA studies.HouleDGovindarajuDROmholtS2010Phenomics: the next challenge.