Infer global, predict local: Quantity-relevance trade-off in protein fitness predictions from sequence data
Fig 5. The bias factor J0 depends on the model expressivity.
A: scaling correlation between the predictive performance ρ and J0 D + σ² for the RNA-bind protein, modeled with the Sparse Potts Model with different numbers K of couplings. N is the length of the protein (82 sites). B: values of the bias factor J0 as a function of the number of modeled couplings in the Sparse Potts Model for the RNA-bind protein. C: same as B for the seven protein families combined; the black line and the blue area represent the mean and the standard deviation over the seven protein families. D: relation between the bias factor J0(K) and the improvement at the best cutoff, Δρ(d_opt), for the RNA-bind protein. E: same as D for the seven families combined. Values of K range from K = 0 to K = N. Each color corresponds to a different protein family, as reported in the legend.