Fig 1.
The general scheme of our method.
Gene expression imputation models were built from the expression matrix Y of a specific gene in a tissue, the genotype matrix X, and the epigenetic annotation matrix A of the cis-SNPs of the gene. The annotation matrix was used to select SNPs with regulatory effects on gene expression, since we assumed that only a subset of cis-SNPs affect the expression of the nearby gene. After estimating the SNP coefficient vector β for the gene in each tissue, we combined the β’s with GWAS summary statistics to obtain gene-level association statistics for each disease in each tissue.
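The caption does not give the formula used to combine β with GWAS summary statistics. As a minimal sketch, the code below assumes the widely used summary-statistic form in which the gene-level z-score is a β-weighted sum of SNP z-scores, scaled by the SNP standard deviations and the variance of the predicted expression. The function name `gene_level_z` and the inputs `gwas_z` and `snp_cov` (a reference-panel covariance matrix of the cis-SNPs) are hypothetical and are not taken from the source.

```python
import numpy as np

def gene_level_z(beta, gwas_z, snp_cov):
    """Gene-level z-score from SNP weights and GWAS z-scores.

    Assumed form (not confirmed by the source):
        Z_gene = sum_l beta_l * sd_l * z_l / sqrt(beta' Sigma beta)
    where Sigma is the SNP covariance matrix and sd_l = sqrt(Sigma_ll).
    """
    beta = np.asarray(beta, dtype=float)
    gwas_z = np.asarray(gwas_z, dtype=float)
    snp_cov = np.asarray(snp_cov, dtype=float)

    snp_sd = np.sqrt(np.diag(snp_cov))   # per-SNP standard deviation
    pred_var = beta @ snp_cov @ beta     # variance of predicted expression
    if pred_var <= 0:
        # No SNP carries weight, so the gene cannot be tested.
        return float("nan")
    return float(beta @ (snp_sd * gwas_z) / np.sqrt(pred_var))
```

With a single SNP in the model, the gene-level z-score reduces to the SNP z-score, with its sign set by the sign of β. This is a quick sanity check on the scaling.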
Fig 2.
More SNPs with functional potential were identified and more imputation models were built by T-GEN.
A) compares the percentages of SNPs in gene expression imputation models that have active ChromHMM15 annotated states across three methods (elnt, vb and vb.logit), using gene expression and genotype data from GTEx in 26 tissues. The “elnt” model was built via elastic net, the “vb” model was built via a variational Bayesian method, and the “T-GEN” model (vb.logit) was built using our method (a variational Bayesian method with a logit link). Across all 26 tissues, imputation models built by our method have a higher percentage of SNPs with active ChromHMM annotated states (indicated by blue bars). The X axis denotes the mean percentage of SNPs in imputation models having ChromHMM15 annotated states in each tissue for each model. The dotted lines are the mean values of R2 across 26 tissues for each method. B) shows the relative increase in the mean CADD score of T-GEN models compared with elnt and vb models in 26 tissues. C) shows the ratio of the number of gene models (FDR < 0.05) from each method to that from elastic net. D) shows the difference between the number of gene models from each method and that from elastic net. In C and D, the color of each tissue indicates its sample size, from upper to lower: [401, 501), [301, 401), [201, 301), [101, 201).
Fig 3.
More genes were identified as trait-associated by T-GEN across 207 traits from the LD Hub.
Applied to 207 traits from LD Hub, significant trait-associated genes were identified in 26 tissues (p-value threshold: 0.05 divided by the number of gene-tissue pairs). Each boxplot shows the distribution of the difference between the number of genes identified by our tissue-specific analysis and the number identified by one of the four other methods.
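The significance threshold described above is a Bonferroni correction over gene-tissue pairs. The sketch below illustrates the arithmetic only. The function names and the dictionary layout, which maps a `(gene, tissue)` tuple to a p-value, are assumptions made for illustration.

```python
def bonferroni_threshold(n_gene_tissue_pairs, alpha=0.05):
    """Return alpha divided by the number of gene-tissue pairs tested."""
    if n_gene_tissue_pairs <= 0:
        raise ValueError("need at least one gene-tissue pair")
    return alpha / n_gene_tissue_pairs

def significant_pairs(p_values, alpha=0.05):
    """Return the (gene, tissue) pairs whose p-value passes the threshold.

    p_values: dict mapping (gene, tissue) -> p-value (hypothetical layout).
    """
    threshold = bonferroni_threshold(len(p_values), alpha)
    return {pair for pair, p in p_values.items() if p < threshold}
```

For example, testing 100,000 gene-tissue pairs at a family-wise level of 0.05 gives a per-test threshold of 5e-7.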
Fig 4.
Functional constraint and tissue-specificity of identified trait-associated genes.
A) Among genes identified in fewer than 5 traits, a higher percentage of the significant genes from T-GEN have pLI scores larger than 0.99. All significant trait-associated genes were grouped into three categories according to the number of traits each gene is associated with. The bar plot shows the percentage of genes identified by each method that have pLI scores larger than 0.99 in each category. Error bars indicate the standard error calculated using bootstrapping (120 traits each time, for 20 times). B) More genes were identified by T-GEN as trait-associated in the tissues most enriched for genetic signals. In the tissues with the highest heritability enrichment, and also in the other tissues, the numbers of identified trait-associated genes were compared across all five methods. Each bar plot shows the mean number of identified trait-associated genes across 207 traits in the LD Hub.
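The bootstrap standard error in panel A can be sketched as follows. The caption states only the resampling size (120 traits) and the number of rounds (20). Sampling with replacement, pooling genes across the sampled traits, and all names below (`trait_genes`, `pli`, `bootstrap_se`) are assumptions made for illustration. The grouping of genes by number of associated traits is omitted for brevity.

```python
import numpy as np

def high_pli_fraction(genes, pli, cutoff=0.99):
    """Fraction of genes with a known pLI score that exceeds the cutoff."""
    scored = [g for g in genes if g in pli]
    if not scored:
        return float("nan")
    return sum(pli[g] > cutoff for g in scored) / len(scored)

def bootstrap_se(trait_genes, pli, n_traits=120, n_rounds=20, seed=0):
    """Standard error of the high-pLI fraction, resampling traits.

    trait_genes: dict mapping trait -> set of significant genes (hypothetical).
    pli: dict mapping gene -> pLI score (hypothetical).
    """
    rng = np.random.default_rng(seed)
    traits = sorted(trait_genes)
    estimates = []
    for _ in range(n_rounds):
        # Assumption: traits are drawn with replacement in each round.
        idx = rng.integers(0, len(traits), size=n_traits)
        genes = set().union(*(trait_genes[traits[i]] for i in idx))
        estimates.append(high_pli_fraction(genes, pli))
    # Standard deviation of the replicate estimates, ignoring empty rounds.
    return float(np.nanstd(estimates, ddof=1))
```

If every trait yields the same gene set, all replicate estimates coincide and the standard error is zero. This makes a convenient check that the resampling is wired correctly.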
Fig 5.
Gene-level Manhattan plot of T-GEN results in IGAP data.
The plot shows the gene-level associations with LOAD obtained from T-GEN. Several significant genes are labeled in the figure.
Fig 6.
Regional Manhattan plot around TMEM135.
The listed SNP (rs541458) is one of the eQTLs identified by T-GEN in the imputation model of TMEM135. Among all eQTLs of TMEM135 identified by T-GEN, this SNP also has the strongest GWAS signal in the published AD GWAS.