Fig 1.
a) Schematic of the MegaNet. Circles and squares represent enhancer and gene nodes, respectively, and colored directed arrows are enhancer->gene regulation edges. Different colors represent interactions active in different tissues. Dashed lines represent gene-gene interactions. b) Comparison of network features for LoF-tolerant vs. genome-wide, LoF-tolerant vs. low-LoF-tolerance, and low-LoF-tolerance vs. genome-wide enhancers. Significant comparisons (Wilcoxon rank-sum test, P-value < 0.05) are shown in color, while non-significant ones are marked by dashed lines. The effect size of each comparison is shown as Cohen’s d on the color scale. Positive values of Cohen’s d indicate a higher average and negative values a lower average. LoF-tol, Low-LoF-tol and GW denote LoF-tolerant, low-LoF-tolerance and genome-wide, respectively. The suffixes ‘a’ and ‘v’ denote the average and variance of the corresponding features in a); detailed feature descriptions are given in Table 1.
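As an illustration of the effect size reported in panel b), the following is a minimal sketch of Cohen's d using the pooled standard deviation. The caption does not state which variant of Cohen's d was used, so the pooled form is an assumption, and the function name and toy values are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d for two samples, using the pooled standard deviation.

    Assumption: pooled-SD variant; the figure may use a different one.
    A positive value means group_a has the higher mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


# Toy feature values (hypothetical, not taken from the paper).
lof_tolerant = [2.0, 4.0, 6.0]
genome_wide = [1.0, 3.0, 5.0]
d = cohens_d(lof_tolerant, genome_wide)
```

Swapping the two arguments flips the sign, which matches the caption's reading of positive and negative values.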
Table 1.
Summary of network features.
Fig 2.
Three horizontal bars separately show the percentage of low-LoF-tolerance enhancers, all enhancers, and LoF-tolerant enhancers in each tissue type. The matching vertical bar plots show the odds ratios for enrichment of low-LoF-tolerance and LoF-tolerant enhancers in each tissue relative to all tissues. Asterisks mark statistical significance (Fisher’s exact test).
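The odds ratio and Fisher's exact test for a single tissue can be sketched as follows. The layout of the 2x2 table (class of interest vs. other enhancers, in the tissue vs. outside it) is an assumption, since the caption does not specify it. The function names and counts are hypothetical.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio of the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = class of interest in the tissue, b = other
    enhancers in the tissue, c = class of interest elsewhere,
    d = other enhancers elsewhere. Undefined when b * c == 0.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )


# Toy counts (hypothetical, not taken from the paper).
ratio = odds_ratio(3, 1, 1, 3)
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```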
Fig 3.
a) Mean ROC curve from stratified 10-fold cross-validation of the final random forest classification model. Results are shown with conservation included in and excluded from the feature set. b) Collective feature importance for the classification model. The x-axis shows the feature collections. Gene features include gene indispensability scores (GISa/v), gene centrality metrics in the MegaNet (GPRC, GDC, GID, GCC, GEC), and gene in-degrees from each tissue subnetwork. Enhancer features include ETU and EOD in the MegaNet as well as EOD in each tissue subnetwork. Edge features refer to EGTUa/v. See Table 1 for further details of the network features. c) Enhancer tissue ubiquity (ETU) and average gene indispensability scores (GISa) for LoF-tolerant and low-LoF-tolerance enhancer candidates.
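To make "stratified" in panel a) concrete, the sketch below assigns sample indices to folds so that each fold keeps roughly the same class proportions as the full data set. It is a simplified, deterministic illustration rather than the pipeline used for the figure. The function name is hypothetical, and a real analysis would typically shuffle within each class first.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=10):
    """Split sample indices into n_folds folds, preserving class balance.

    Indices of each class are dealt out round-robin across the folds,
    so every fold receives a near-equal share of every class.
    No shuffling is done here (simplifying assumption).
    """
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in indices_by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# Toy labels (hypothetical): 1 = LoF-tolerant, 0 = low-LoF-tolerance.
labels = [1] * 20 + [0] * 30
folds = stratified_folds(labels, n_folds=10)
```

Each fold then serves once as the held-out test set, and the per-fold ROC curves are averaged to give a mean ROC.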
Fig 4.
Validation using PacBio SVs and disease enhancers.
a) Cumulative distribution functions of LoF-tolerant scores for disease enhancers (red), all tested enhancers (grey), and PacBio-deleted enhancers (blue). KS-test P-values are for disease enhancers vs. all tested enhancers and for PacBio-deleted enhancers vs. all tested enhancers. b) Number of observed LoF-tolerant enhancers with increasing sample size. On the x-axis, 2503 includes the LoF-tolerant enhancers observed in 3 additional individuals sequenced using PacBio. c) Genomic region of SLC26A4 and a subset of the enhancers regulating it. Predicted LoF-tolerant enhancers are shown in blue and predicted low-LoF-tolerance enhancers in red.
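The comparison in panel a) rests on empirical cumulative distribution functions. A minimal sketch of the two-sample KS statistic, the largest vertical gap between two empirical CDFs, is given below. It computes only the statistic and not the P-value. The function names and scores are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of a sample as a function of x."""
    ordered = sorted(sample)
    size = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / size

    return cdf


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions, so the maximum gap is
    reached at one of the observed values.
    """
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Toy scores (hypothetical, not taken from the paper).
disease_scores = [0.1, 0.2, 0.3]
all_tested_scores = [0.4, 0.5, 0.6]
gap = ks_statistic(disease_scores, all_tested_scores)
```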