pyProGA—A PyMOL plugin for protein residue network analysis

The field of protein residue network (PRN) research has produced several useful methods and techniques for the structural analysis of proteins and protein complexes. Many of these are mature enough to be used by the proteomics community beyond PRN specialists. In this paper we present software that collects an ensemble of (network) methods tailored towards the analysis of protein-protein interactions (PPI) and/or interactions of proteins with other types of ligands, e.g. nucleic acids, oligosaccharides, etc. In parallel, we propose network differential analysis as a method to identify residues mediating key interactions between proteins. Using a model system, we show that, in combination with other already published methods also included in pyProGA, it can be used to make such predictions. Such an extended repertoire of methods also allows predictions to be cross-checked against other methods, as we show here. In addition, the ability to construct PRN models from various kinds of input is, so far, a unique asset of our code. One can use structural data from PDB files and/or residue pair interaction energies obtained either from force-field parameters or from fragment molecular orbital (FMO) calculations. pyProGA is free open-source software available from https://gitlab.com/Vlado_S/pyproga.


Figure S 1:
The TIM barrel protein as used in our study. In the examples we construct a PIE-PRN model based on FMO calculations at the DFTB level with the 3ob-3-1 parameter set, as defined in the GAMESS program package. One vertex (node) in the PIE-PRN corresponds to one fragment in the FMO calculation. The FMO calculations yielded pair interaction energies (PIE) between fragments, which are used for edge creation. We apply the criterion $E_{\mathrm{tot}} \le -1$ kcal mol$^{-1}$ to all types of bonds (incl. peptide bonds), with the remaining settings kept at their defaults. Residue 17 is missing from the sequence, which allows us to formally treat the system as a dimer (complex) formed from the first barrel (orange) and the remainder of the sequence (green). This fact, as well as the fact that the chain has a beginning and an end, does not allow us to call the structure symmetrical in a strict sense. However, for practical purposes we consider it to have four-fold symmetry, as do the authors who synthesized the molecule. The PyMOL object of the .pdb structure is called structure in pyProGA.
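The edge criterion above can be illustrated with a minimal sketch. It assumes the pair interaction energies have already been parsed into a Python dictionary keyed by fragment pairs; the function name `build_pie_prn` and the label format are hypothetical and are not pyProGA's API. Only the Lys2-Asp29, Asp3-Ala5 and Asp3-Trp6 energies are taken from this document; the Lys2-Ala5 value is made up to show a pair that fails the cutoff, and peptide-bond pairs (which the real model also contains) are omitted because their energies are not given here.

```python
def build_pie_prn(pie, cutoff=-1.0):
    """Build an undirected PIE-PRN as {fragment: {neighbour: E_tot}}.

    pie: {(fragment_i, fragment_j): E_tot in kcal/mol}. Every fragment becomes a
    node; an edge is kept only if E_tot <= cutoff (here -1 kcal/mol).
    """
    graph = {}
    for (frag_i, frag_j), e_tot in pie.items():
        # Register both fragments as nodes even if the pair yields no edge.
        graph.setdefault(frag_i, {})
        graph.setdefault(frag_j, {})
        if frag_i != frag_j and e_tot <= cutoff:
            graph[frag_i][frag_j] = e_tot
            graph[frag_j][frag_i] = e_tot
    return graph


# Energies in kcal/mol; the Lys2-Ala5 value is a made-up, weak interaction.
pie = {
    ("Lys2", "Asp29"): -28.5,
    ("Asp3", "Ala5"): -2.3,
    ("Asp3", "Trp6"): -9.1,
    ("Lys2", "Ala5"): -0.4,
}
prn = build_pie_prn(pie)
print(prn["Lys2"])  # {'Asp29': -28.5}
```

The weak Lys2-Ala5 pair still contributes both nodes to the graph but no edge, since −0.4 kcal mol$^{-1}$ is above the cutoff.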

Figure S 2:
A graphical representation of the PIE-PRN in PyMOL created with pyProGA. The lines are a separate object, 3d_PEPRN, displayed here together with the object structure. In this study we call the single helix (orange) monomer A and the rest (green) monomer B. The graph of the PRN model of the dimer is labelled $G$, and the monomer graphs $G_A$ and $G_B$.
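Splitting $G$ into the monomer graphs can be sketched as follows, assuming fragment labels of the form residue name plus residue number (e.g. Lys2) and that monomer A comprises residues 1 to 16, as stated for this system. The helper names are hypothetical; the function returns the subgraphs induced by each monomer's nodes together with the inter-monomer (interface) edges.

```python
import re


def residue_number(label):
    """Return the trailing residue number of a label such as 'Lys2'."""
    match = re.search(r"(\d+)$", label)
    if match is None:
        raise ValueError(f"no residue number in {label!r}")
    return int(match.group(1))


def split_dimer(graph, last_residue_a=16):
    """Split a dimer graph {node: {neighbour: E_tot}} into G_A, G_B and the interface.

    Nodes with residue number <= last_residue_a form monomer A, the rest monomer B.
    """
    nodes_a = {n for n in graph if residue_number(n) <= last_residue_a}
    nodes_b = set(graph) - nodes_a

    def induced(nodes):
        # Keep only edges whose both end points lie inside `nodes`.
        return {n: {m: e for m, e in graph[n].items() if m in nodes} for n in nodes}

    interface = sorted(
        (a, b, graph[a][b]) for a in nodes_a for b in graph[a] if b in nodes_b
    )
    return induced(nodes_a), induced(nodes_b), interface


# Toy dimer built from energies (kcal/mol) quoted in this document.
dimer = {
    "Lys2": {"Asp29": -28.5},
    "Asp3": {"Ala5": -2.3, "Trp6": -9.1},
    "Ala5": {"Asp3": -2.3},
    "Trp6": {"Asp3": -9.1},
    "Asp29": {"Lys2": -28.5},
}
g_a, g_b, interface = split_dimer(dimer)
print(interface)  # [('Lys2', 'Asp29', -28.5)]
```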

Figure S 3:
A 2D map of the $E_{\mathrm{tot}}$ values for the fragment pairs that are connected by edges in $G$, created in pyProGA. The map is clickable, enabling easy retrieval of node coordinates, names, and the value of the plotted quantity (here $E_{\mathrm{tot}}$). The vertical and horizontal lines indicate monomer boundaries (their display is optional).
Customization and manipulation of the plot can be done in pyProGA via a standard Matplotlib toolbar.

[…] shows that certain residues contribute more to the binding of the two monomers. Compare to the results in Fig. S 5b.

(a) Efficiency centrality $C_{\mathrm{eff}}$

Figure S 5:
Efficiency centrality in the dimer graph $G$ (a) indicates that Asp29 is important for the small-world nature of this PRN model. The four-fold symmetry of the TIM barrel protein is clearly visible in the way the ranking repeats cyclically along the sequence. Along with Asp29, Asp75, Asp121 and Asp167 also score highly (amongst other residues). The score of Asp167 is somewhat lower than that of the others, because it is closer to the end of the sequence and to the (artificial) break of the sequence at residue 17. In contrast, Asp29 has a somewhat higher score, as it also mediates the contact of monomer A to monomer B via a strong interaction with Lys2. However, such a centrality ranking reveals little about the interaction of monomers A and B (e.g. Lys2 does not score highly here). On the other hand, the efficiency centrality difference (b) gives a much clearer picture of which residues at the protein-protein interface contribute most to the global efficiency of the dimer. Here we can clearly see that Lys2 ranks high.

[…] Fig. S 6c is obvious for some fragments. A difference is seen, e.g., for Asp3, which scores relatively highly in $\Delta C^{k}_{\mathrm{eff}}$, although the actual change in $E_{\mathrm{tot}}$ is not as prominent. In principle, there are two reasons for this. First, Asp3 forms rather few interactions with fragments in monomer B, which explains its low $C^{k}_{E_{\mathrm{tot}}}(G)$ score. Second, the reason why Asp3 is picked up by $\Delta C^{k}_{\mathrm{eff}}$ is a bit more convoluted. Asp3 is in monomer A, which is not connected to monomer B via covalent bonds. Monomer A has only 16 fragments (residues). Hence, the efficiency centrality of these fragments in the dimer graph $G$ will not be very high, see Fig. S 5a. However, Asp3 is adjacent to Lys2 via the strong peptide bond. The importance of Lys2 for binding between A and B is confirmed by its high $C^{k}_{E_{\mathrm{tot}}}(G)$ score (its interaction with Asp29 is −28.5 kcal mol$^{-1}$) and its high $\Delta C^{k}_{\mathrm{eff}}$ score.
Asp3 is relatively central within monomer A, as it interacts strongly with Ala5 (−2.3 kcal mol$^{-1}$) and Trp6 (−9.1 kcal mol$^{-1}$)†. Through the combination of these facts (adjacency to Lys2 and centrality in monomer A), Asp3 becomes more central for the "small-worldness" of the whole system. Hence it scores relatively highly in $\Delta C^{k}_{\mathrm{eff}}$.
† If we calculate $C^{k}_{\mathrm{eff}}$ in monomer A, we find that Asp3 is more central than Lys2.

The plot is clickable and can be manipulated (rotate, zoom, save, etc.) via a standard Matplotlib toolbar.
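The efficiency-based quantities discussed above can be illustrated with a small sketch. It assumes a common definition of efficiency centrality (after Latora and Marchiori): the relative drop in global efficiency $E = \frac{1}{n(n-1)}\sum_{i \ne j} 1/d_{ij}$ when all edges of node $k$ are removed. It further assumes that $\Delta C^{k}_{\mathrm{eff}}$ is the value in $G$ minus the value in $G_A \cup G_B$, i.e. $G$ without its inter-monomer edges. Both definitions, the unweighted hop distances $d_{ij}$ and all function names are assumptions for illustration; pyProGA may use edge costs and a different normalization.

```python
from collections import deque


def global_efficiency(graph):
    """Mean of 1/d_ij over ordered node pairs i != j (1/d = 0 if unreachable).

    graph: {node: set(neighbours)}, undirected and unweighted (hop distances).
    """
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for source in graph:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(1.0 / d for node, d in dist.items() if node != source)
    return total / (n * (n - 1))


def without_edges_of(graph, k):
    """Copy of `graph` in which node k keeps its place but loses all edges."""
    return {n: (set() if n == k else nbrs - {k}) for n, nbrs in graph.items()}


def efficiency_centrality(graph, k):
    """Relative drop of global efficiency when the edges of k are removed."""
    e_full = global_efficiency(graph)
    if e_full == 0.0:
        return 0.0
    return (e_full - global_efficiency(without_edges_of(graph, k))) / e_full


def without_interface(graph, nodes_a):
    """G_A united with G_B: drop every edge joining monomer A to monomer B."""
    return {n: {m for m in nbrs if (m in nodes_a) == (n in nodes_a)}
            for n, nbrs in graph.items()}


def delta_efficiency_centrality(graph, nodes_a, k):
    """Assumed Delta C_eff^k: value in the dimer minus value without interface edges."""
    return (efficiency_centrality(graph, k)
            - efficiency_centrality(without_interface(graph, nodes_a), k))


# Toy dimer: monomer A = {a1, a2}, monomer B = {b1, b2}, one interface edge a1-b1.
toy = {"a1": {"a2", "b1"}, "a2": {"a1"}, "b1": {"a1", "b2"}, "b2": {"b1"}}
for node in ("a1", "a2"):
    print(node, round(delta_efficiency_centrality(toy, {"a1", "a2"}, node), 3))
# prints "a1 0.269" and then "a2 -0.077"
```

In this toy example the interface node a1 obtains a positive difference, while a2, which only has an intra-monomer edge, obtains a slightly negative one. Comparing against $G$ without interface edges keeps the node set, and hence the normalization $n(n-1)$, identical in both terms.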

Figure S 11:
Louvain communities in the PIE-PRN model. The top part depicts the colour-coded membership of nodes to communities using the '2D Edge Map' feature of pyProGA. The colour palette is the same as that used in the bottom-right structure. This representation, in which all communities are coloured at once, is the default view in the PyMOL window after the calculation is finished. Since it may be difficult to see the partition borders clearly, pyProGA has a feature that highlights each partition (a community, as here, or a cluster) individually. The fourteen structures show the individual communities, each highlighted in orange.
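The Louvain method searches for a partition that maximizes the modularity $Q$. As a minimal sketch, the function below only evaluates $Q = \sum_c \left[ L_c/m - \left( d_c/(2m) \right)^2 \right]$ for a given partition of an unweighted graph, where $m$ is the number of edges, $L_c$ the number of edges inside community $c$ and $d_c$ the sum of degrees in $c$. It does not perform the Louvain optimization itself, and pyProGA may use edge weights instead; the function name and the toy graph are hypothetical.

```python
def modularity(graph, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    graph: {node: set(neighbours)}; communities: list of node sets covering all nodes.
    """
    m = sum(len(nbrs) for nbrs in graph.values()) / 2  # number of edges
    if m == 0:
        return 0.0
    q = 0.0
    for comm in communities:
        # Each internal edge is seen from both end points, hence the division by 2.
        internal = sum(1 for n in comm for v in graph[n] if v in comm) / 2
        degree_sum = sum(len(graph[n]) for n in comm)
        q += internal / m - (degree_sum / (2 * m)) ** 2
    return q


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(round(modularity(two_triangles, [{0, 1, 2}, {3, 4, 5}]), 4))  # 0.3571
```

Putting all nodes into one community gives $Q = 0$, so the split along the bridging edge is the better partition, which is the kind of improvement Louvain moves are accepted on.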
(To achieve a similar layout, the edge attribute importance must be copied to the attribute weight, as Gephi interprets weight as the strength of an edge rather than as its cost, which is the other standard interpretation and the one we follow in pyProGA. The settings can be checked in the .gephi file.)
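A minimal sketch of the attribute copy described above, writing an edge table as CSV text in which the importance value is duplicated into a Weight column. The function name, the column layout and the importance value are hypothetical; we assume that Gephi's spreadsheet import treats a column named Weight as edge strength, which should be checked against the Gephi version in use.

```python
import csv
import io


def gephi_edge_table(edges):
    """Return CSV text for an edge table with 'importance' copied into 'Weight'.

    edges: iterable of (source, target, attrs), where attrs["importance"] is the
    cost-like attribute; the value is copied unchanged, not inverted.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight", "importance"])
    for source, target, attrs in edges:
        writer.writerow([source, target, attrs["importance"], attrs["importance"]])
    return buf.getvalue()


# Hypothetical importance value for one edge.
print(gephi_edge_table([("Lys2", "Asp29", {"importance": 0.5})]), end="")
```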
(a) Binding energies assigned to fragments in monomer A.
(b) Binding energies assigned to fragments in both monomers A and B.
(c) Binding energies assigned to fragments in monomer B.
(d) Bar plot of binding energies assigned to fragments in both monomers A and B.
(e) Histogram of binding energies assigned to fragments in both monomers A and B.

Figure S 13:
The binding energy between monomers A and B and different schemes for assigning it to fragments. For the method and practical aspects of the technique, see D. Fedorov's manual for subsystem analysis.
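The three assignment schemes in the panels can be sketched for inter-monomer pair energies $E_{ij}$ ($i$ in A, $j$ in B): each $E_{ij}$ is credited entirely to the fragment in A, entirely to the fragment in B, or, as an assumption for the "both" case, split equally between the two. This is an illustration only; the actual FMO subsystem analysis may include further energy terms and a different splitting rule, and the function name is hypothetical. The single pair used is the Lys2-Asp29 interaction quoted in this document.

```python
from collections import defaultdict


def assign_binding_energy(interface_pairs, scheme="both"):
    """Distribute inter-monomer pair energies onto fragments.

    interface_pairs: iterable of (fragment_in_A, fragment_in_B, E_ij in kcal/mol).
    scheme: "A" credits E_ij to the A fragment, "B" to the B fragment,
    "both" gives E_ij / 2 to each (assumed splitting rule).
    """
    if scheme not in ("A", "B", "both"):
        raise ValueError(f"unknown scheme {scheme!r}")
    per_fragment = defaultdict(float)
    for frag_a, frag_b, energy in interface_pairs:
        if scheme == "A":
            per_fragment[frag_a] += energy
        elif scheme == "B":
            per_fragment[frag_b] += energy
        else:
            per_fragment[frag_a] += energy / 2
            per_fragment[frag_b] += energy / 2
    return dict(per_fragment)


pairs = [("Lys2", "Asp29", -28.5)]
for scheme in ("A", "both", "B"):
    print(scheme, assign_binding_energy(pairs, scheme))
```

Whatever the scheme, the per-fragment values sum to the same total inter-monomer energy; the schemes differ only in where that energy is displayed on the structure.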