Fig 1.

Overview of the UNEECON model.

UNEECON estimates negative selection on missense mutation i in gene j from η_ij, the probability that the missense mutation occurs relative to its occurrence probability under neutrality, μ_ij. η_ij depends on the sum of a variant-level fixed effect, z_ij, and a gene-level random effect, u_j. We assume that z_ij captures the contribution of variant-level features, X_ij, to negative selection, and we model the relationship between X_ij and z_ij with a feedforward neural network. We assume that u_j is a Gaussian random variable that models gene-level variation in selective constraint that cannot be predicted from variant features. The sum of z_ij and u_j is then passed through a logistic function to obtain η_ij. The neutral occurrence probability, μ_ij, is obtained from a context-dependent mutation model trained on putatively neutral mutations. Free parameters of the UNEECON model are estimated by minimizing the discrepancy between the predicted occurrence probability, η_ij μ_ij, and the observed occurrence of each potential missense mutation in the gnomAD exome sequencing data [29].
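
The structure described above (a network for z_ij, a gene effect u_j, a logistic link, and a predicted probability η_ij μ_ij) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. The caption does not state the exact objective, so the sketch assumes a Bernoulli likelihood for whether each potential missense mutation is observed, a hypothetical one-hidden-layer tanh network for z_ij, and a Gaussian penalty on u_j as a MAP-style stand-in for the random effect. The function names, the parameter shapes, and sigma_u are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    # Logistic function mapping z_ij + u_j to eta_ij in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))


def fixed_effect(X, W1, b1, w2, b2):
    # Hypothetical one-hidden-layer feedforward network: X_ij -> z_ij
    hidden = np.tanh(X @ W1 + b1)
    return hidden @ w2 + b2


def penalized_loss(X, gene_idx, mu, y, params, u, sigma_u):
    """Negative Bernoulli log-likelihood of the observed mutations plus a
    Gaussian penalty on the gene-level effects u_j (an assumed, MAP-style
    objective, not necessarily the one used by UNEECON)."""
    z = fixed_effect(X, *params)       # variant-level fixed effects z_ij
    eta = sigmoid(z + u[gene_idx])     # eta_ij = logistic(z_ij + u_j)
    p = eta * mu                       # predicted occurrence probability
    eps = 1e-12                        # guards against log(0)
    nll = -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    penalty = np.sum(u ** 2) / (2.0 * sigma_u ** 2)
    return nll + penalty, eta


# Toy data: 6 potential missense mutations, 3 features, 2 genes
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
gene_idx = np.array([0, 0, 0, 1, 1, 1])
mu = rng.uniform(0.01, 0.2, size=6)    # neutral occurrence probabilities
y = np.array([0, 1, 0, 0, 1, 0])       # 1 = mutation observed in the sample
params = (rng.normal(size=(3, 4)), np.zeros(4), rng.normal(size=4), 0.0)
u = np.zeros(2)

loss, eta = penalized_loss(X, gene_idx, mu, y, params, u, sigma_u=1.0)
print(f"loss = {loss:.3f}")
```

In a full fit, this loss would be minimized jointly over the network weights and the u_j values with a gradient-based optimizer. Because η_ij is bounded by 1, the predicted probability η_ij μ_ij can never exceed the neutral probability μ_ij. This matches the interpretation of η_ij as a depletion caused by negative selection.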

Fig 2.

Distributions of UNEECON scores across potential missense mutations.

(a) Distributions of UNEECON scores estimated for potential missense mutations in haploinsufficient (HI) genes [37], autosomal dominant disease genes [35, 36], autosomal recessive disease genes [35, 36], and olfactory receptor genes [45]. (b) Distributions of UNEECON scores estimated for potential missense mutations in various protein regions. The functional sites and protein secondary structures are based on UniProt annotations [47]. The predicted disordered protein regions are from MobiDB [48]. (c) Average UNEECON scores estimated for all codon positions in the CDKL5 protein. Each grey dot represents the UNEECON score averaged over all missense mutations at a codon position. The blue curve represents the locally estimated scatterplot smoothing (LOESS) fit. Blue and red dots represent pathogenic and benign missense variants from ClinVar [30], respectively. The horizontal line represents a constrained region reported in a previous study [25].

Fig 3.

Predictive power of various methods for distinguishing pathogenic missense variants from benign missense variants.

(a) Performance in predicting autosomal dominant pathogenic variants from ClinVar [30]. The true positive rate is the fraction of pathogenic variants scoring above a given threshold, and the true negative rate is the fraction of benign variants scoring at or below it; the curves are traced by varying the threshold. AUC denotes the area under the receiver operating characteristic curve. (b) Enrichment of predicted deleterious de novo variants in individuals affected by developmental disorders [31]. The y-axis shows the log2 odds ratio for the enrichment of predicted deleterious variants in affected individuals at a given percentile threshold, and the x-axis shows the percentile thresholds used in the enrichment analysis. Error bars represent the standard error of the log2 odds ratio.
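
The log2 odds ratio and its standard error in panel (b) can be computed from a 2×2 table of counts. The sketch below is one way to do this. It assumes a case/control comparison in which a variant counts as "predicted deleterious" when its score exceeds a cutoff corresponding to the chosen percentile. It also assumes Woolf's formula for the standard error of the log odds ratio, rescaled from natural log to log2. The caption does not say which estimator the authors used, and the helper names are hypothetical.

```python
import math


def count_at_cutoff(case_scores, ctrl_scores, cutoff):
    # Split each group's variants into "predicted deleterious" (score > cutoff) and the rest
    case_del = sum(s > cutoff for s in case_scores)
    ctrl_del = sum(s > cutoff for s in ctrl_scores)
    return case_del, len(case_scores) - case_del, ctrl_del, len(ctrl_scores) - ctrl_del


def log2_odds_ratio(case_del, case_other, ctrl_del, ctrl_other):
    """Return (log2 odds ratio, its standard error) for a 2x2 table.

    The standard error uses Woolf's formula for ln(OR), divided by ln(2)
    to convert it to the log2 scale."""
    if min(case_del, case_other, ctrl_del, ctrl_other) == 0:
        raise ValueError("all four cells must be non-zero")
    odds_ratio = (case_del * ctrl_other) / (case_other * ctrl_del)
    se_ln = math.sqrt(1 / case_del + 1 / case_other + 1 / ctrl_del + 1 / ctrl_other)
    return math.log2(odds_ratio), se_ln / math.log(2)


# Toy scores for de novo variants in cases and controls (illustrative only)
counts = count_at_cutoff([0.9, 0.8, 0.1, 0.2], [0.9, 0.1, 0.2, 0.3], cutoff=0.5)
print(counts)  # (2, 2, 1, 3)
lor, se = log2_odds_ratio(*counts)
print(f"log2 OR = {lor:.3f} +/- {se:.3f}")
```

Repeating this calculation over a series of cutoffs, one per percentile threshold, yields one point and error bar per x-axis value. Zero cells make Woolf's formula undefined. A pseudocount correction is a common workaround, but whether the original analysis used one is not stated.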

Fig 4.

Predictive power of various methods for distinguishing disease and essential genes from genes not likely to have strong phenotypic effects.

(a) Performance in predicting autosomal dominant disease genes [35, 36]. (b) Performance in predicting haploinsufficient genes [37]. (c) Performance in predicting human orthologs of mouse essential genes [33, 34]. (d) Performance in predicting human essential genes in cell lines [32]. The true positive rate is the fraction of positive genes scoring above a given threshold, and the true negative rate is the fraction of negative genes scoring at or below it. AUC denotes the area under the receiver operating characteristic curve.
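
The true positive rate, the true negative rate, and the AUC used in these panels can be computed directly from the scores of the positive and negative sets. The sketch below is in plain Python. It uses the fact that the area under the empirical ROC curve equals the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half (the normalized Mann-Whitney U statistic). It is a sketch for illustration and is not the authors' evaluation code. The function names are hypothetical.

```python
def tpr_tnr(pos_scores, neg_scores, threshold):
    # TPR: fraction of positives scoring above the threshold
    # TNR: fraction of negatives scoring at or below the threshold
    tpr = sum(s > threshold for s in pos_scores) / len(pos_scores)
    tnr = sum(s <= threshold for s in neg_scores) / len(neg_scores)
    return tpr, tnr


def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy gene-level scores (illustrative only)
positives = [0.9, 0.8, 0.4]
negatives = [0.5, 0.3, 0.2]
print(tpr_tnr(positives, negatives, threshold=0.45))
print(roc_auc(positives, negatives))
```

The pairwise loop costs O(n·m) and is meant only for clarity. For genome-scale gene or variant sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value more efficiently.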

Fig 5.

Distributions of gene-level intolerance to missense and to loss-of-function mutations.

(a) Correlation between gene-level intolerance to missense mutations (UNEECON-G score) and intolerance to loss-of-function (LOF) mutations (pLI score). Blue dots represent 956 genes intolerant to both missense and LOF mutations. Red dots represent 956 genes tolerant to missense but not to loss-of-function mutations. (b) Distribution of protein disorder content in the gene sets intolerant to loss-of-function mutations. (c) Enrichment of Reactome pathways in the gene set intolerant to both missense and loss-of-function mutations, using the gene set tolerant to missense but not to loss-of-function mutations as the background. Only the highest-level Reactome terms from the PANTHER hierarchy view are shown. The term “unclassified” indicates that the corresponding genes have no known or inferred function. A fold enrichment below 1 indicates depletion in the gene set intolerant to both missense and loss-of-function mutations, or equivalently, enrichment in the gene set tolerant to missense but not to loss-of-function mutations. (d) Enrichment of autism genes in the gene sets intolerant to loss-of-function mutations. Error bars represent the standard error of the log2 odds ratio.
