Survey of the Heritability and Sparse Architecture of Gene Expression Traits across Human Tissues

doi:10.1371/journal.pgen.1006423

Table 1.

Estimates of unconstrained local h² across genes within tissues.

Fig 1.

Genes with heritable expression in DGN whole blood are more tolerant to loss-of-function mutations.

Distribution of the probability of being loss-of-function intolerant (pLI), taken from the Exome Aggregation Consortium [32], for genes dichotomized by their local heritability estimates. The Kruskal-Wallis rank sum test revealed a significant difference in pLI between the two heritability groups (χ² = 234, P < 10⁻⁵²). More heritable genes (h² > 0.1, in blue) have lower pLI and are thus more tolerant to loss-of-function mutations than genes with lower h².
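
The Kruskal-Wallis test compares the pLI distributions of the heritability groups using ranks rather than raw values. The stdlib Python sketch below computes the tie-corrected H statistic. It assumes two groups, as in the dichotomization here, so the P value comes from a chi-square distribution with 1 degree of freedom, whose upper tail is erfc(sqrt(H/2)). The pLI values are toy numbers, not data from the study; in practice a library routine such as `scipy.stats.kruskal` would normally be used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups: Dict[str, Sequence[float]]) -> Tuple[float, int]:
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    labels: List[str] = []
    pooled: List[float] = []
    for name, vals in groups.items():
        labels.extend([name] * len(vals))
        pooled.extend(vals)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sums: Dict[str, float] = {name: 0.0 for name in groups}
    for name, r in zip(labels, ranks):
        rank_sums[name] += r
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(vals) for name, vals in groups.items()
    ) - 3 * (n + 1)
    # Correct for ties: divide by 1 - sum(t^3 - t) / (N^3 - N) over tied runs.
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_sum = sum(t ** 3 - t for t in counts.values())
    if tie_sum:
        h /= 1 - tie_sum / (n ** 3 - n)
    return h, len(groups) - 1


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


# Toy pLI values for two heritability groups (illustrative only).
pli = {
    "h2 > 0.1": [0.01, 0.02, 0.05, 0.10, 0.20],
    "h2 <= 0.1": [0.30, 0.60, 0.80, 0.90, 0.95],
}
h_stat, df = kruskal_wallis(pli)
p_value = chi2_sf_df1(h_stat)
```

With more than two groups, the chi-square tail for `len(groups) - 1` degrees of freedom would be needed instead of the 1-df shortcut.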

Fig 2.

Sparsity estimates using Bayesian Sparse Linear Mixed Models in DGN whole blood.

(A) This panel shows a measure of sparsity of the gene expression traits, represented by the PGE parameter from the BSLMM approach. PGE is the proportion of PVE (the total variance explained by genetic variants, the BSLMM equivalent of h²) that is explained by the sparse component. The median of the posterior samples from BSLMM is used as the point estimate of each parameter. Genes whose lower credible set (LCS) bound exceeds 0.01 are shown in blue and the rest in red; the 95% credible set of each estimate is shown in gray. For highly heritable genes, PGE is close to 1, indicating that their local architecture is sparse. For genes with lower heritability, there is not enough evidence to distinguish a sparse from a polygenic architecture. (B) This panel shows the heritability estimates from BSLMM (PVE) vs. the estimates from GCTA, which are similar (R = 0.96). Here, the estimates are constrained to lie between 0 and 1 in both models. Each point is colored by that gene's elastic net (α = 1) cross-validated prediction correlation squared (EN R²). As expected, genes with high heritability have high prediction R².
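
To make the two BSLMM quantities concrete: writing the genetic value of each sample as a sparse part Xβ̃ plus a polygenic part u, PVE is V(Xβ̃ + u) / (V(Xβ̃ + u) + τ⁻¹) and PGE is V(Xβ̃) / V(Xβ̃ + u), with V the sample variance. The sketch below, a hedged illustration rather than the BSLMM fitting code, evaluates these from given per-sample components and then summarizes a parameter's posterior draws the way the caption describes: posterior median, 95% credible set, and whether the lower bound exceeds 0.01. The function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def pve_pge(sparse: Sequence[float], poly: Sequence[float],
            resid_var: float) -> Tuple[float, float]:
    """PVE and PGE from per-sample genetic values (BSLMM-style definitions).

    sparse: values of X @ beta_tilde (few large-effect variants)
    poly: values of u (polygenic random effect)
    resid_var: residual variance (1 / tau)
    """
    genetic = [s + p for s, p in zip(sparse, poly)]
    v_genetic = statistics.pvariance(genetic)
    pve = v_genetic / (v_genetic + resid_var)        # genetic share of total variance
    pge = statistics.pvariance(sparse) / v_genetic   # sparse share of genetic variance
    return pve, pge


def quantile(draws: Sequence[float], q: float) -> float:
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(draws)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize(draws: Sequence[float],
              threshold: float = 0.01) -> Tuple[float, float, float, bool]:
    """Posterior median, 95% credible set bounds, and whether lower bound > threshold."""
    lower = quantile(draws, 0.025)
    upper = quantile(draws, 0.975)
    return statistics.median(draws), lower, upper, lower > threshold
```

For example, `pve_pge([1.0, -1.0, 1.0, -1.0], [0.0] * 4, 1.0)` describes a purely sparse signal explaining half the variance, so it returns PVE = 0.5 and PGE = 1.0.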

Fig 3.

DGN cross-validated predictive performance across elastic net mixing parameters.

Elastic net prediction models were built in DGN whole blood, and performance was quantified by the cross-validated R² between observed and predicted expression levels. (A) This panel shows the 10-fold cross-validated R² for 51 genes on chromosome 22 with R² > 0.3 as a function of the elastic net mixing parameter α. Smaller mixing parameters correspond to more polygenic models, while larger ones correspond to sparser models. Each line represents a gene. Performance is generally flat across most values of the mixing parameter but dips sharply very close to zero; thus polygenic models perform worse than sparse models. (B) This panel shows, for autosomal protein-coding genes, the cross-validated R² of the LASSO model (α = 1) minus that of elastic net models with mixing parameters α = 0.05 and α = 0.5. Differences for α = 0.5 hover around zero, meaning that this model has predictive performance similar to LASSO. Differences for the more polygenic model (α = 0.05) are mostly above zero, indicating that this model performs worse than LASSO.
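
The mixing parameter α interpolates between a ridge-like penalty (α near 0, dense or polygenic fits) and the LASSO (α = 1, sparse fits). As a sketch, assuming the glmnet-style objective (1/2n)‖y − Xβ‖² + λ[(1 − α)/2 ‖β‖² + α‖β‖₁] with centered data and no intercept, cyclic coordinate descent with soft-thresholding minimizes it in pure Python. This illustrates the penalty only; it is not the fitting code behind the figure, and λ would normally be chosen by cross-validation.

```python
from typing import List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z toward zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
                alpha: float, max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Minimize (1/2n)||y - Xb||^2 + lam*((1-alpha)/2*||b||^2 + alpha*||b||_1).

    Assumes centered columns, centered y, and no all-zero columns.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Inner product of x_j with the partial residual that excludes x_j's own fit.
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy centered design with orthogonal columns; y = 3 * x0 + 0.5 * x1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.5, -2.5, 2.5, -3.5]
lasso = elastic_net(X, y, lam=1.0, alpha=1.0)   # -> [2.0, 0.0]
dense = elastic_net(X, y, lam=1.0, alpha=0.05)  # -> approx [1.513, 0.231]
```

On this toy design the LASSO sets the weaker predictor exactly to zero, while α = 0.05 shrinks both coefficients but keeps them nonzero, which is the sparse-versus-polygenic contrast the mixing parameter controls.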

Fig 4.

BSLMM vs LMM estimates of heritability in GTEx.

This figure compares heritability estimates from BSLMM and from LMM (GCTA) in GTEx data. In both models, the estimates are constrained to lie between 0 and 1. For most genes, BSLMM estimates are larger than LMM estimates, reflecting BSLMM's ability to account for the sparse component and thus yield better estimates of heritability. Each point is colored by that gene's prediction R² (the squared correlation between cross-validated elastic net predictions and observed expression, denoted EN R²). At the bottom right of each panel, we show the correlation of EN R² with the BSLMM estimate (EN_v_BSLMM) and with the LMM estimate (EN_v_LMM). BSLMM estimates are consistently more correlated with EN R² than LMM estimates are, which provides further indication that the local architecture is predominantly sparse.

Fig 5.

Orthogonal Tissue Decomposition of gene expression traits.

For a given gene, the expression level is decomposed into a component that is specific to the individual and another component that is specific to the individual and tissue. The left side of the equation in the figure corresponds to the original “whole tissue” expression levels. The right side is the sum of the individual-specific component, which does not depend on tissue, and the tissue-specific component. Because there are no replicate measurements for a given tissue and individual, we use a mixed effects model with a random effect specific to the individual. The cross-tissue component is estimated as the posterior mean of this subject-specific random effect. The tissue-specific component is estimated as the residual of the model fit, i.e. the difference between the “whole tissue” expression and the cross-tissue component. The rationale is that once the component common across tissues is removed, the remainder will be specific to the tissue. Models are fit one gene at a time. Covariates are not shown, to simplify the presentation.
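
For a random-intercept model y[i, t] = μ + u[i] + e[i, t] with variance components σ_u² and σ_e², the posterior mean of u[i] is the individual's mean deviation shrunk by σ_u² / (σ_u² + σ_e²/n_i), where n_i is the number of tissues sampled for individual i. The sketch below applies that formula to one gene. It treats the variance components as known inputs (in practice they would be estimated, for example by REML), uses the simple grand mean as the intercept, and omits covariates; the function name is hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (individual, tissue)


def orthogonal_tissue_decomposition(
    expr: Dict[Key, float], sigma_u2: float, sigma_e2: float
) -> Tuple[Dict[str, float], Dict[Key, float]]:
    """Split one gene's expression into cross-tissue and tissue-specific parts.

    Model (covariates omitted): y[i, t] = mu + u[i] + e[i, t],
    u[i] ~ N(0, sigma_u2), e[i, t] ~ N(0, sigma_e2); variances taken as known.
    """
    mu = statistics.fmean(expr.values())  # simple grand mean as the intercept estimate
    by_individual: Dict[str, List[float]] = defaultdict(list)
    for (ind, _tissue), value in expr.items():
        by_individual[ind].append(value)

    # Cross-tissue component: posterior mean of u[i], a shrunken individual mean.
    cross: Dict[str, float] = {}
    for ind, values in by_individual.items():
        shrink = sigma_u2 / (sigma_u2 + sigma_e2 / len(values))
        cross[ind] = shrink * (statistics.fmean(values) - mu)

    # Tissue-specific component: residual after removing intercept and cross-tissue part.
    specific = {(ind, t): v - mu - cross[ind] for (ind, t), v in expr.items()}
    return cross, specific


expr = {
    ("A", "blood"): 2.0, ("A", "liver"): 4.0,
    ("B", "blood"): -2.0, ("B", "liver"): 0.0,
}
cross, specific = orthogonal_tissue_decomposition(expr, sigma_u2=1.0, sigma_e2=1.0)
```

Here each individual has two tissues, so the shrinkage factor is 1 / (1 + 1/2) = 2/3 and individual A's cross-tissue component is 2/3 × (3 − 1) = 4/3.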

Fig 6.

Measure of uniformity of the posterior probability of active regulation vs. cross-tissue heritability.

Uniformity was computed from the posterior probability of a gene being actively regulated in a tissue (PPA), taken from the Flutre et al. [33] multi-tissue eQTL analysis. (A) Representative examples showing that genes with PPA concentrated in one tissue were assigned small values of the uniformity measure, whereas genes with PPA uniformly distributed across tissues were assigned high values. See Methods for the entropy-based definition of uniformity. (B) This panel shows the distribution of the heritability of the cross-tissue component vs. a measure of uniformity of genetic regulation across tissues. The Kruskal-Wallis rank sum test revealed a significant difference in cross-tissue h² between uniformity groups (χ² = 31.4, P < 10⁻⁶).
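
The exact entropy-based definition of uniformity is given in the Methods and is not reproduced here. As a hypothetical stand-in with the same qualitative behavior, the sketch below normalizes a gene's PPA values across tissues into a probability distribution and reports its Shannon entropy divided by the log of the number of tissues, so PPA concentrated in one tissue scores 0 and a perfectly uniform profile scores 1.

```python
import math
from typing import Sequence


def uniformity(ppa: Sequence[float]) -> float:
    """Normalized Shannon entropy of PPA across tissues (illustrative definition).

    0 means all PPA mass is in one tissue; 1 means PPA is equal in every tissue.
    """
    total = sum(ppa)
    if total <= 0 or len(ppa) < 2:
        return 0.0
    probs = [x / total for x in ppa if x > 0]  # zero-PPA tissues contribute 0 * log 0 = 0
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(ppa))
```

For instance, PPA split equally between two of four tissues gives log 2 / log 4 = 0.5.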

Fig 7.

Comparison of heritability of whole tissue or tissue-specific components vs. PPA.

(A) This panel shows the Pearson correlation (R) between the BSLMM PVE of the original (here called “whole”) tissue expression levels and the probability of the gene being actively regulated in a given tissue (PPA). Matching tissues generally show the largest correlations, but most off-diagonal correlations are also relatively high, consistent with regulation being shared across tissues. (B) This panel shows the Pearson correlation between PPA and the PVE of the tissue-specific component of expression obtained via orthogonal tissue decomposition (OTD). Correlations are generally lower, but matching tissues still show the largest correlations. Off-diagonal correlations are substantially reduced, consistent with the tissue-specific component capturing properties specific to each tissue. The area of each circle is proportional to the absolute value of R.
