Size and structure of the sequence space of repeat proteins
Fig 5
Interactions within and between repeats sculpt a rugged energy landscape with many local minima.
Local minima were obtained by performing a zero-temperature Monte-Carlo simulation with the energy function in Eq (2), starting from initial conditions corresponding to naturally occurring sequences of pairs of consecutive ANK repeats. A, bottom) Rank-frequency plot of basin sizes, where a basin is defined as the set of sequences that fall into a given minimum. A, top) Energy of local minima versus the size rank of their basin, showing that larger basins tend to have lower energy. The gray line indicates the energy of the consensus sequence, for comparison. B) Pairwise distances between the minima with the largest basins (together comprising 90% of natural sequences), ordered by hierarchical clustering. The panel directly above the matrix shows the basin size of each minimum, in the same order as the entries of the distance matrix. A clear block structure emerges, separating groups of basins with distinct sequences. C, D) Same as A) and B), but for single repeats. Since single repeats are shorter than pairs (length L instead of 2L), they have fewer local energy minima, yet still show a rich multi-basin structure. Equivalent analyses for LRR and TPR are shown in S3 and S4 Figs.
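The basin construction described in the caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: Eq (2) is not reproduced here, so the sketch assumes a generic Potts-like energy with hypothetical site fields `h` and pairwise couplings `J`, and the function names `energy`, `descend` and `basin_sizes` are introduced only for this example. Single-site mutations are proposed in random order and accepted only if they strictly lower the energy, and the descent stops when no single mutation does.

```python
import numpy as np
from collections import Counter


def energy(seq, h, J):
    """Assumed Potts-like energy: E = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]."""
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e


def descend(seq, h, J, rng):
    """Zero-temperature descent: accept a single-site mutation only if it lowers E."""
    seq = np.array(seq, copy=True)
    L, q = h.shape
    moves = [(i, a) for i in range(L) for a in range(q)]
    e = energy(seq, h, J)
    improved = True
    while improved:
        improved = False
        # Visit every possible single-site mutation once, in random order.
        for k in rng.permutation(len(moves)):
            i, a = moves[k]
            if a == seq[i]:
                continue
            old = seq[i]
            seq[i] = a
            e_new = energy(seq, h, J)
            if e_new < e - 1e-12:
                e = e_new
                improved = True
            else:
                seq[i] = old  # reject: restore the previous state
    # A full pass with no accepted move means seq is a local minimum.
    return tuple(int(x) for x in seq), e


def basin_sizes(starts, h, J, rng):
    """Count how many starting sequences fall into each local minimum."""
    return Counter(descend(s, h, J, rng)[0] for s in starts)
```

Sorting the values of the returned counter in decreasing order gives the kind of rank-frequency data plotted in panel A, bottom. Because the order of proposed mutations is random, a starting sequence near a basin boundary may reach different minima on different runs.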