Combinatorial Modeling of Chromatin Features Quantitatively Predicts DNA Replication Timing in Drosophila
Figure 2
Feature importance analysis and simplified models.
(A) Values of the model coefficients along the λ-path, i.e., the sequence of values of the regularization parameter λ used to fit the model. The λ-path is truncated at the value of λ used for model predictions. Line thickness is proportional to the total number of models in which a non-zero coefficient is assigned to the corresponding feature. The vertical dashed line denotes the value of λ yielding the selected simplified model, which is based solely on the four indicated terms. (B) Scatter plot of model features according to their z-scores and bootstrap-Lasso selection probabilities (p). Features with are colored in red (positive coefficient values) or blue (negative coefficient values), and their coefficient distributions are shown on the right as violin plots. Features are ranked by decreasing selection probability. (C) Box plot of prediction accuracies (PCC on test sets) of 100 Lasso models in which the indicated feature was excluded from the model fit. Rrp6 was used as a control, as stability analysis indicated no significant role for this feature in predicting replication timing. p-values were obtained using a two-sided Wilcoxon rank sum test. (D) Frequency of appearance of chromatin features in four-feature simplified models as a function of model accuracy relative to the full model. Only simplified models reaching at least 60% of the full model accuracy are shown.
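The bootstrap-Lasso selection probability in panel (B) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`soft_threshold`, `lasso_cd`, `selection_probabilities`, `make_toy_data`), the objective scaling (1/2n)·||y − Xβ||² + λ·||β||₁, the absence of an intercept (features and response are assumed to be roughly centered), the synthetic data, and the value of λ are all assumptions made for illustration.

```python
import random


def soft_threshold(z, g):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Fit a Lasso by cyclic coordinate descent (no intercept, assumed objective:
    (1/2n) * ||y - X b||^2 + lam * ||b||_1)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b initialised to zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the partial residual (feature j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def selection_probabilities(X, y, lam, n_boot=20, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a non-zero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(n=100, seed=1):
    """Synthetic data (hypothetical): y depends on features 0 and 1; feature 2 is noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0, 0.5) for r in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(selection_probabilities(X, y, lam=0.3))
```

In this toy setting, the two informative features are expected to be selected in nearly every resample and the noise feature rarely, which mirrors how a selection probability separates stable from unstable predictors. In practice, an established Lasso implementation would normally replace the hand-written solver.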