Computational Design of a PDZ Domain Peptide Inhibitor that Rescues CFTR Activity

The cystic fibrosis transmembrane conductance regulator (CFTR) is an epithelial chloride channel mutated in patients with cystic fibrosis (CF). The most prevalent CFTR mutation, ΔF508, blocks folding in the endoplasmic reticulum. Recent work has shown that some ΔF508-CFTR channel activity can be recovered by pharmaceutical modulators (“potentiators” and “correctors”), but ΔF508-CFTR can still be rapidly degraded via a lysosomal pathway involving the CFTR-associated ligand (CAL), which binds CFTR via a PDZ interaction domain. We present a study that goes from theory, to new structure-based computational design algorithms, to computational predictions, to biochemical testing and ultimately to epithelial-cell validation of novel, effective CAL PDZ inhibitors (called “stabilizers”) that rescue ΔF508-CFTR activity. To design the “stabilizers”, we extended our structural ensemble-based computational protein redesign algorithm to encompass protein-protein and protein-peptide interactions. The computational predictions achieved high accuracy: all of the top-predicted peptide inhibitors bound well to CAL. Furthermore, when compared to state-of-the-art CAL inhibitors, our design methodology achieved higher affinity and increased binding efficiency. The designed inhibitor with the highest affinity for CAL (kCAL01) binds six-fold more tightly than the previous best hexamer (iCAL35), and 170-fold more tightly than the CFTR C-terminus. We show that kCAL01 has physiological activity and can rescue chloride efflux in CF patient-derived airway epithelial cells. Since stabilizers address a different cellular CF defect from potentiators and correctors, our inhibitors provide an additional therapeutic pathway that can be used in conjunction with current methods.


S1 Extension of Provable Guarantees to Multiple Strands
The original proof of the K* algorithm showed that it could find an ε-approximation to the K* score (where ε is a user-defined parameter) [2]. In this context, an ε-approximation means $(1-\varepsilon)K^* \le \tilde{K}^* \le \frac{1}{1-\varepsilon}K^*$, where $\tilde{K}^*$ is the computed K* score and $K^*$ is the true K* score. The proof in [2] relied on the assumption that the ligand was small enough that a complete partition function could be computed, which is generally true for enzyme active site designs but not for protein-peptide or protein-protein interaction (PPI) designs. This assumption was used not only to prove the correctness of the intermutation pruning criterion, but also to compute an ε-approximation to the K* score.
Provable guarantees on the algorithm output ensure that any incorrect predictions are due to the input model and not the search algorithm; this cannot be shown for heuristic methods. Provable guarantees can also make it easier to incorporate experimental data accurately into the computational model.
The goal of the current paper is to extend K* to protein-peptide and protein-protein interactions. In these cases it is not guaranteed that either member of the bound complex will be small enough to compute the complete partition function. Thus, we extend the previous proofs to handle complexes where both partners have approximate partition functions.
Intermutation Pruning. The idea behind intermutation pruning is that, in some cases, it can be proven that the K* score of the candidate sequence currently being computed will never be better than the K* score of a sequence that has already been found. This pruning step significantly reduces the number of K* scores that must be fully computed and increases the speed of the algorithm. The original proof can be found in [2]. Given that $K^*_i \ge \gamma K^*_0$, where $K^*_i$ is the K* score of the current sequence, $K^*_0$ is the best score observed so far, and γ is a user-specified parameter defining the number of top-scoring sequences for which we want an ε-approximation, there exists an intermutation pruning criterion for PPI designs. In the following lemma, $n$ is the number of conformations in the search yet to be enumerated, $k$ is the number of conformations that have been pruned from the search with DEE, $E_0$ is the lower energy bound on all pruned conformations, $R$ is the universal gas constant, and $T$ is the temperature. The full partition functions for the protein-protein complex, protein A, and protein B are $q_{AB}$, $q_A$, and $q_B$ respectively, while $q^*_{AB}$, $q^*_A$, and $q^*_B$ denote the current calculated values of the partition functions during the computational search.
Lemma 1. If the lower bound $E_t$ on the minimized energy of the $(m+1)$th conformation returned by , then the partition function computation can be halted, with $q^*_{AB}$ guaranteed to be an ε-approximation to the true partition function, $q_{AB}$, for a candidate sequence whose score $K^*_i$ satisfies $K^*_i \ge \gamma K^*_0$.
Proof. Using the previous intramutation pruning methods [2], we can compute ε-approximations for the partition functions of each protein. We have: (where $i$ denotes that the partition function is for the $i$th sequence) we have that: Next, by definition $q = q^* + q' + p^*$, where $q'$ is the partition function of the remaining conformations and $p^*$ is the partition function of the pruned conformations, note that: If the following condition holds then the search can be stopped and we have an ε-approximation to the partition function $q_{AB}(i)$: To show this use eqs. (S3) and (S4) to show that and by eq.
which by the definition of $q$ implies This shows that when designing multiple strands there exists an intermutation pruning criterion that uses the stopping condition obtained from eq. (S5): Thus, if the stopping criterion is met then $q^*_{AB}$ is an ε-approximation to $q_{AB}$.
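One way to arrive at a stopping condition of this kind from the definitions above is sketched below. It is a reconstruction under stated assumptions and does not necessarily match the form or numbering of eqs. (S3)-(S5). It assumes that conformations are enumerated in order of increasing lower bound, so each of the $n$ remaining conformations has Boltzmann weight at most $e^{-E_t/RT}$, that each of the $k$ pruned conformations has weight at most $e^{-E_0/RT}$, and that every computed partition function is a partial sum, so $q^* \le q$.

```latex
\begin{align*}
q_{AB} &= q^*_{AB} + q'_{AB} + p^*_{AB},
  \qquad q'_{AB} \le n\, e^{-E_t/RT}, \qquad p^*_{AB} \le k\, e^{-E_0/RT}, \\
K^*_i \ge \gamma K^*_0
  &\;\Longrightarrow\; q_{AB} \ge \gamma K^*_0\, q_A\, q_B \ge \gamma K^*_0\, q^*_A\, q^*_B, \\
n\, e^{-E_t/RT} + k\, e^{-E_0/RT} \le \varepsilon\, \gamma K^*_0\, q^*_A\, q^*_B
  &\;\Longrightarrow\; q'_{AB} + p^*_{AB} \le \varepsilon\, q_{AB}
  \;\Longrightarrow\; q^*_{AB} \ge (1-\varepsilon)\, q_{AB}, \\
\text{equivalently}\quad
E_t &\ge -RT\left[\ln\!\left(\varepsilon\, \gamma K^*_0\, q^*_A\, q^*_B - k\, e^{-E_0/RT}\right) - \ln n\right].
\end{align*}
```

The last line only makes sense when the argument of the logarithm is positive, that is, when the pruned conformations alone cannot exceed the allowed error.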
K* Score Approximation. The proof that an ε-approximation can be found for the K* score also requires that the ligand partition function be fully computed [2]. In PPI designs the ligand partition function will not be fully computed, so the computed K* score is no longer an ε-approximation but rather a σ-approximation, where σ = ε(2 − ε).
Lemma 2. When amino acid substitutions (or flexible residues) are allowed on both strands in a computational design, the computed K* score is a σ-approximation to the actual K* score, where σ = ε(2 − ε).
Proof. The full K* score is denoted as $K^* = \frac{q_{AB}}{q_A q_B}$ and the computed K* score as $\tilde{K}^* = \frac{q^*_{AB}}{q^*_A q^*_B}$. We can then bound $\tilde{K}^*$ as follows: which shows that, given ε-approximations for all of the partition functions, we have a σ = ε(2 − ε) approximation for the computed K* value.
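The bound rests on two facts: each computed partition function is a partial sum, so $q^* \le q$, and each is an ε-approximation, so $q^* \ge (1-\varepsilon)\,q$. A sketch of the resulting chain of inequalities, reconstructed from these definitions (the notation $\tilde{K}^*$ for the computed score is an assumption):

```latex
\begin{align*}
\tilde{K}^* = \frac{q^*_{AB}}{q^*_A\, q^*_B}
  &\ge \frac{(1-\varepsilon)\, q_{AB}}{q_A\, q_B}
   = (1-\varepsilon)\, K^* \ge (1-\sigma)\, K^*, \\
\tilde{K}^* = \frac{q^*_{AB}}{q^*_A\, q^*_B}
  &\le \frac{q_{AB}}{(1-\varepsilon)^2\, q_A\, q_B}
   = \frac{K^*}{1-\sigma},
  \qquad 1-\sigma = (1-\varepsilon)^2 = 1 - \varepsilon(2-\varepsilon).
\end{align*}
```

The lower bound is in fact the tighter factor $(1-\varepsilon)$; the factor $(1-\sigma)$ is needed only on the upper side, where both denominators may be underestimated.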
The two lemmas above provide provable guarantees for the K* algorithm when allowing amino acid substitutions on multiple protein chains.
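The σ-approximation guarantee can be checked numerically on toy data. The following is a minimal sketch, not the authors' implementation: the energies are random numbers, sorting stands in for an ordered conformation enumeration, no conformations are pruned (k = 0), and the stopping rule is a simplified single-sequence rule. The function names and the value of `RT` are assumptions made for illustration.

```python
import math
import random

RT = 0.592  # kcal/mol at roughly 298 K (assumed units)


def full_q(energies):
    """Complete partition function: Boltzmann sum over every conformation."""
    return sum(math.exp(-e / RT) for e in sorted(energies))


def partial_q(energies, epsilon):
    """Partial partition function q* satisfying q* >= (1 - epsilon) * q.

    Conformations are visited in order of increasing energy. The loop halts
    once the n conformations still to be visited, each weighing at most
    exp(-E_t / RT), cannot add more than epsilon / (1 - epsilon) * q*.
    """
    ordered = sorted(energies)
    q_star = 0.0
    for m, e_t in enumerate(ordered):
        n_remaining = len(ordered) - m
        if n_remaining * math.exp(-e_t / RT) <= epsilon / (1.0 - epsilon) * q_star:
            break
        q_star += math.exp(-e_t / RT)
    return q_star


def k_star(q_ab, q_a, q_b):
    """K* score as the ratio of bound to unbound partition functions."""
    return q_ab / (q_a * q_b)


def sigma_bound_holds(e_ab, e_a, e_b, epsilon):
    """Check (1 - sigma) K* <= computed K* <= K* / (1 - sigma)."""
    sigma = epsilon * (2.0 - epsilon)
    exact = k_star(full_q(e_ab), full_q(e_a), full_q(e_b))
    approx = k_star(*(partial_q(e, epsilon) for e in (e_ab, e_a, e_b)))
    return (1.0 - sigma) * exact <= approx <= exact / (1.0 - sigma)


# Toy energies for the complex AB and the unbound partners A and B.
rng = random.Random(0)
toy = [[rng.uniform(-12.0, 0.0) for _ in range(500)] for _ in range(3)]
print(sigma_bound_holds(*toy, epsilon=0.03))
```

All three partition functions are truncated here, which is the situation Lemma 2 addresses; with a fully computed ligand partition function the tighter ε bound would apply instead.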

S2 Training of Energy Function Weights
To obtain accurate energetic predictions for the CAL-CFTR system, scaling parameters for the van der Waals, electrostatics, and solvation energy terms were determined. The best weights for each of these terms were found by training with 16 previously determined experimental $K_i$ values for the CAL-CFTR system [1]. A gradient descent method was used to determine the optimal energy weights. The initial weights used for the search were vdW: 0.7, dielectric: 20, solvation: 0.7. For each iteration of the search one energy weight parameter was varied, and a K* score was computed for each sequence that had a known experimentally measured $K_i$. The Pearson correlation of K* score vs. $1/K_i$ was calculated and the weights with the best correlation were used for the next iteration. Each parameter was varied 8 times and the amount it varied was reduced on each iteration.
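The search described above, varying one weight at a time and keeping the setting with the best Pearson correlation, can be sketched as follows. This is an illustrative sketch only: `toy_score` is a hypothetical stand-in for a full K* run, the energy terms and target values are synthetic rather than experimental, and the step sizes and halving schedule are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def coordinate_search(score_fn, inv_ki, start, steps, rounds=8, shrink=0.5):
    """Vary one weight at a time, keeping changes that raise the correlation.

    Each weight is tried `rounds` times in both directions, and its step
    size is multiplied by `shrink` after every round.
    """
    best = dict(start)
    best_r = pearson(score_fn(best), inv_ki)
    steps = dict(steps)
    for name in list(start):
        for _ in range(rounds):
            for direction in (-1.0, 1.0):
                trial = dict(best)
                trial[name] += direction * steps[name]
                r = pearson(score_fn(trial), inv_ki)
                if r > best_r:
                    best, best_r = trial, r
            steps[name] *= shrink
    return best, best_r


# Synthetic per-sequence energy terms (vdW, electrostatics, solvation).
TERMS = [(-5.0, -30.0, 1.0), (-7.5, -12.0, 2.5), (-6.0, -45.0, 0.5),
         (-9.0, -8.0, 3.0), (-4.0, -52.0, 1.5), (-8.0, -25.0, 2.0)]


def toy_score(w):
    """Hypothetical stand-in for a K* score: negated weighted energy."""
    return [-(w["vdw"] * v + e / w["dielectric"] + w["solvation"] * s)
            for v, e, s in TERMS]


# Synthetic targets generated from known weights, so the optimum is known.
synthetic_inv_ki = toy_score({"vdw": 0.9, "dielectric": 20.0, "solvation": 0.76})
start = {"vdw": 0.7, "dielectric": 20.0, "solvation": 0.7}
steps = {"vdw": 0.2, "dielectric": 8.0, "solvation": 0.2}
best, best_r = coordinate_search(toy_score, synthetic_inv_ki, start, steps)
print(best, best_r)
```

Because the Pearson correlation is invariant to an overall rescaling of the scores, a search of this kind constrains only the relative weights; the text's requirement that the K* scores remain "reasonable" is what fixes the absolute scale.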
The best correlation found through the parameter search that maintains reasonable K* scores is shown in Fig. S1. The correlations over the entire parameter search space range from 0.0 to 0.75, which highlights the importance of choosing the correct weighting factors. The parameters chosen for the design runs are as follows: a van der Waals scaling of 0.9, a dielectric constant of 20, and a solvation scaling of 0.76. These parameters are reasonable and similar to parameters used in previous designs. The somewhat high dielectric constant is appropriate because the peptide design occurs at the surface of the protein.

S3 Recapitulation of CAL Motif at Positions 0 and -2 of the Design Peptide
Comparison of K* scores against the HumLib array data already suggests that K* is able to enrich for sequences that bind CAL. However, since the energy function was trained on peptide sequences that all matched the CAL binding motif, it is important to determine whether K* allows false positives at peptide positions 0 and -2. Except for the HumLib array, no additional CAL binding data exist for non-motif residues at positions 0 and -2. Therefore, to perform additional computational tests we must assume that a peptide sequence binds CAL if it matches the CAL motif and does not bind CAL otherwise. The HumLib data generally support this assumption, although some peptide sequences bind CAL without matching the motif (10 out of 5867 sequences). A K* design search was conducted in which positions 0 and -2 were mutated to all amino acids (except Pro) while positions -1 and -3 were kept fixed to Arg and Val, respectively. The resulting ROC curve (Fig. S2) has an AUC of 0.94, which shows that K* is able to recover the known CAL binding motif.
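For readers unfamiliar with the metric, the AUC of such a ROC curve equals the probability that a randomly chosen motif-matching sequence receives a higher score than a randomly chosen non-matching one. A minimal sketch of that computation follows; the scores and labels are made-up toy values, not data from this study, and `roc_auc` is a name introduced here for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    `labels` marks which sequences are treated as binders (here: those
    assumed to match the motif). Ties between a positive and a negative
    score count as half a win.
    """
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: six sequences, three labeled as motif matches.
toy_scores = [9.1, 7.4, 6.8, 3.2, 2.5, 1.1]
toy_labels = [True, True, False, True, False, False]
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of motif-matching from non-matching sequences.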
2. Georgiev I, Lilien RH, Donald BR (2008) The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. J Comput Chem 29: 1527-1542.