Fig 1.

The general structure of the study.

See details in the main text.

Fig 2.

3D features perform comparably to state-of-the-art features and encode new information.

(A) Boxplots of Pearson correlations (absolute values) between predictive features and CRISPR efficiency in the Leenay, TTISS and GUIDE-Seq datasets. Blue: 420 sequence features; orange: 5 thermodynamic features; green: 9 3D features. (B-C) Bar charts of Pearson correlations (blue) and partial correlations (orange) with CRISPR efficiency, in the Leenay, TTISS and GUIDE-Seq datasets. The partial correlation of each vector controls for all other vectors in the dataset’s subplot. (B) Epigenetic features and 3D feature (correlations appear in absolute value); (C) State-of-the-art models and 3D feature.
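
The partial correlations in panels B-C can be illustrated with a minimal sketch. It assumes the standard recursive definition of partial correlation and is not necessarily how the authors computed their values; the function names `pearson` and `partial_corr` and the toy vectors are hypothetical.

```python
import math

def pearson(a, b):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, controls):
    """Correlation of x and y after controlling for every vector in `controls`.

    Uses the textbook recursion that removes one control variable at a time.
    The cost grows exponentially with the number of controls, so this is only
    meant for a handful of vectors (an assumption of this sketch).
    """
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Toy data: x and y are related only through the shared vector z.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [zi + e for zi, e in zip(z, [1, -1, -1, 1, 1, -1, -1, 1])]
y = [zi + e for zi, e in zip(z, [1, -1, 1, -1, -1, 1, -1, 1])]

raw = pearson(x, y)                 # high, because both vectors track z
controlled = partial_corr(x, y, [z])  # close to zero once z is controlled for
print(raw, controlled)
```

In this toy case the raw correlation is high while the partial correlation vanishes, which is the pattern a feature carrying no new information would show. A feature that keeps a sizeable partial correlation, as the caption reports for the 3D feature, encodes information the other vectors do not.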

Fig 3.

3D features significantly improve CRISPR prediction.

Histograms of the correlations between measured and predicted efficiency in the Leenay and TTISS datasets, when testing LASSO and xgboost models. The blue/orange histogram indicates the model with/without the 3D feature and the average correlation is marked with a solid/dashed line, respectively. p-values were calculated using Wilcoxon’s signed rank test. (A) Models using all 425 classic features; (B) Models using the top 30 features, based on Pearson correlation with CRISPR efficiency.

Fig 4.

3D features rank highly compared to classic features.

Ranks of features used in the LASSO and xgboost models, averaged over the 1000 iterations; a higher rank indicates a more important feature. The 3D feature and classic features are marked with an orange/blue dot, respectively. (A) Number of times each feature was selected in the LASSO models. (B) Feature rank based on LASSO permutation importance. (C) Feature rank based on SHAP values of the xgboost models. (D) Feature rank based on xgboost permutation importance. (E) Feature rank based on xgboost “gain” importance.
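
As a rough illustration of averaging feature ranks over repeated model fits, the sketch below assigns rank 1 to the least important feature and rank n to the most important, so that a higher rank means a more important feature, as in the caption. The function `average_ranks`, the feature names, and the importance values are hypothetical, and the tie handling (order of appearance) is an assumption rather than the authors' procedure.

```python
def average_ranks(importances_per_iteration):
    """Average each feature's rank across iterations.

    `importances_per_iteration` is a list of dicts mapping a feature name to
    its importance score in one fitted model.
    """
    totals = {}
    for scores in importances_per_iteration:
        # Ascending sort: least important gets rank 1, most important gets rank n.
        ordered = sorted(scores, key=scores.get)
        for rank, name in enumerate(ordered, start=1):
            totals[name] = totals.get(name, 0) + rank
    n_iter = len(importances_per_iteration)
    return {name: total / n_iter for name, total in totals.items()}

# Two toy iterations with three hypothetical features.
iterations = [
    {"3d_density": 0.5, "gc_content": 0.3, "melting_temp": 0.1},
    {"3d_density": 0.2, "gc_content": 0.4, "melting_temp": 0.1},
]
avg = average_ranks(iterations)
print(avg)
```

Under this convention, a feature ranked first in every iteration would have an average rank equal to the number of features.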

Fig 5.

3D features improve prediction in non-matching cell types.

(A) Correlations between 3D features, generated from Hi-C in T/HEK293/GM12878 cells, and CRISPR efficiency. (B-C) Blue/orange/green bars represent 3D features generated from Hi-C in T/HEK293/GM12878 cells, respectively. (B) Relative change in xgboost r² before and after adding a 3D feature to the model. (C) Rank of the 3D features relative to the other features, based on their average SHAP rank in the 1000 xgboost iterations; a higher rank is better. The dashed line represents the theoretical rank of a feature ranked first in all 1000 models.

Fig 6.

Hypotheses explaining the inverse relation between genomic 3D density and CRISPR efficiency.

Unsuitable target sites are colored in orange, and suitable, fully complementary target sites are colored in green. p_H/p_L is the probability of binding to the suitable target site in the high-density/low-density region, respectively. (A) A Cas9-sgRNA complex searching for a target site in a dense region (left) encounters many unsuitable target sites; thus, the probability of finding the suitable target site in the dense region is lower than in a low-density region (right), where significantly fewer potential target sites exist. (B) Target sites in high-density regions (left) are physically less accessible to the Cas9-sgRNA complex than those in low-density regions (right), because of DNA packing in the region.
