
CC and IB conceived and designed the experiments, performed the experiments, analyzed the data, and wrote the paper.

The authors have declared that no competing interests exist.

Elastic network (EN) models have been widely used in recent years for describing protein dynamics, based on the premise that the motions naturally accessible to native structures are relevant to biological function. We posit that equilibrium motions also determine communication mechanisms inherent to the network architecture. To this end, we explore the stochastics of a discrete-time, discrete-state Markov process of information transfer across the network of residues. We measure the communication abilities of residue pairs in terms of hit and commute times, i.e., the number of steps it takes on average to send and receive signals. Functionally active residues are found to possess enhanced communication propensities, evidenced by their short hit times. Furthermore, secondary structural elements emerge as efficient mediators of communication. The present findings provide insights into the topological basis of communication in proteins and into design principles for efficient signal transduction. While hit/commute times are information-theoretic concepts, a central contribution of this work is to rigorously show that they have physical origins directly relevant to the equilibrium fluctuations of residues predicted by EN models.

Under physiological conditions, proteins function neither as static entities nor in isolation. They are instead subject to constant motions and interactions, both within and between molecules. These motions can be either random fluctuations or concerted functional changes in conformation, and they range in size from localized motions (e.g., single amino acid side chain reorientations) to large-scale global motions (e.g., domain–domain or intersubunit movements). While motions in the nanosecond regime can be explored by full atomic simulations, understanding those involving large-scale structural rearrangements remains a challenge. In recent years, elastic network (EN) models in conjunction with modal analysis, and in particular the Gaussian Network Model (GNM) [

We posit that these collective motions also determine communication patterns that are inherent to the native architecture. To explore the validity and implications of this concept, we assume a discrete-time, discrete-state Markov process [_{i}_{j}

A major goal in this study is to relate the hitting (and commute) times derived from the Markovian stochastics model to the equilibrium fluctuations (mean-square fluctuations and cross-correlations) of residues predicted by EN models, thus bridging the gap between two disciplines, information theory and statistical mechanics. To this end, using the theory of generalized matrix inverses [

The paper is organized as follows. The Results are divided into three parts. First, we present the Markovian stochastic model of information diffusion developed for exploring the inter-residue communication in proteins. The process is controlled by transition probabilities for the passage/flow of information across the nodes, which in turn are based on the internode affinities derived from atom–atom contacts in the folded structures. Second, we describe the evaluation of hit and commute times, and illustrate these concepts by applying the methodology to five different enzymes. Strikingly, active residues are distinguished by their effective communication stochastics. Third, we present a rigorous derivation of the mathematical relation (

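The protein structure is modeled as a network of $n$ nodes, each node $v_i$ representing a residue. The interaction between residues $v_i$ and $v_j$ is described by the affinity

$$a_{ij} = \frac{N_{ij}}{\sqrt{N_i N_j}},$$

where $N_{ij}$ is the number of atom–atom contacts between residues $v_i$ and $v_j$ based on a cutoff distance of $r_c = 4$ Å, and ($N_i$, $N_j$) are the numbers of heavy atoms in residues $v_i$ and $v_j$, respectively.

The affinities provide a measure of the local interaction density $d_j$ at each residue $v_j$, $d_j = \sum_{i=1}^{n} a_{ij}$. The affinities are collected in the symmetric matrix $\mathbf{A} = \{a_{ij}\}$ and the densities in the diagonal matrix $\mathbf{D} = \mathrm{diag}\{d_j\}$.

A discrete-time, discrete-state Markov process of network communication is defined by the conditional probability of transmitting a signal from residue $v_j$ to residue $v_i$ in one time step, $m_{ij} = a_{ij}/d_j$, i.e., in matrix form, $\mathbf{M} = \{m_{ij}\} = \mathbf{A}\mathbf{D}^{-1}$.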
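As an illustration of this construction, the following minimal sketch builds a column-stochastic transition matrix from a symmetric affinity matrix, assuming the normalization $m_{ij} = a_{ij}/d_j$ with $d_j = \sum_i a_{ij}$. The function name `transition_matrix` and the four-node affinity values are hypothetical and chosen only for the example; they are not taken from any protein structure.

```python
import numpy as np

def transition_matrix(A):
    """Return (M, d) with m_ij = a_ij / d_j and d_j = sum_i a_ij (assumed normalization)."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=0)                 # local interaction density of each node
    if np.any(d <= 0):
        raise ValueError("every node needs at least one contact")
    M = A / d                         # broadcasting divides column j by d_j
    return M, d

# Hypothetical symmetric affinities for a four-residue toy network.
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)

M, d = transition_matrix(A)
print(M.sum(axis=0))                  # each column of M sums to one
```

Each column of `M` is a probability distribution over the nodes that can be reached in one step from the corresponding node, which is why the columns, rather than the rows, sum to one under this convention.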

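Suppose the probability of initiating the Markov propagation process at node $v_j$ is $p_j(0)$. Then the probability of reaching residue $v_i$ in one time step is $m_{ij}p_j(0)$. In matrix notation, the probability of ending up at any of the nodes $v_1, v_2, \ldots, v_n$ after one step is given by the distribution $\mathbf{p}(1) = \mathbf{M}\mathbf{p}(0)$, where $\mathbf{p}(k) = [p_1(k), \ldots, p_n(k)]^T$, and after $k$ steps by $\mathbf{p}(k) = \mathbf{M}^k\mathbf{p}(0)$.

Assume there is a path connecting every pair of residues in the network. Then, as the number of steps $k \to \infty$, the Markov chain approaches a stationary distribution $\boldsymbol{\pi} = [\pi_1, \pi_2, \ldots, \pi_n]^T$ with elements $\pi_i = d_i / \sum_{k=1}^{n} d_k$.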
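The convergence to a stationary distribution can be checked numerically. The sketch below repeatedly applies a transition matrix to an initial distribution concentrated on one node and compares the result with the normalized interaction densities. It assumes the column-stochastic convention $m_{ij} = a_{ij}/d_j$; the toy affinities and the helper name `propagate` are hypothetical, and the toy network contains a triangle so that the chain is aperiodic.

```python
import numpy as np

# Hypothetical symmetric affinities for a connected four-node network.
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)
d = A.sum(axis=0)        # interaction densities
M = A / d                # column-stochastic transition matrix (assumed convention)

def propagate(M, p0, steps):
    """Apply p(k) = M p(k-1) for the requested number of steps."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = M @ p
    return p

p0 = np.array([1.0, 0.0, 0.0, 0.0])   # signal initiated at the first node
p_limit = propagate(M, p0, steps=500)
pi = d / d.sum()                      # densities normalized to unit sum

print(np.allclose(p_limit, pi))
```

The limiting distribution does not depend on the starting node, so a residue with a higher interaction density is visited more often in the long run regardless of where the signal originates.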

In the continuous time limit [ $m_{ij}\pi_{j} = m_{ji}\pi_{i}$

The hitting time $H(j,i)$ is the expected number of steps it takes for a signal sent from residue $v_i$ to reach residue $v_j$.

The calculation of _{i}_{j}_{i}_{j}_{i}_{k}_{k}_{j}_{k}_{j}_{i}_{j}

The commute time $C(i,j)$ is defined by the sum of the hitting times in both directions, i.e., $C(i,j) = H(i,j) + H(j,i)$.

Note that the commute time is symmetric by definition while the hitting time, in general, is not, i.e., $H(i,j) \neq H(j,i)$.
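Hitting and commute times of this kind can be sketched numerically for a small network. The example below uses a standard first-step argument: for a fixed destination, the expected number of steps from every other node satisfies a linear system in which the destination is removed from the chain. The indexing `H[j, i]` (destination first, source second), the function name `hitting_times`, and the toy affinities are assumptions made for the illustration.

```python
import numpy as np

def hitting_times(M):
    """H[j, i] = expected number of steps for a walk started at i to first reach j."""
    n = M.shape[0]
    P = M.T                                   # P[i, k] = probability of stepping from i to k
    H = np.zeros((n, n))
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        Q = P[np.ix_(keep, keep)]             # chain restricted to nodes other than j
        # First-step relation h = 1 + Q h, solved for all sources at once.
        h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        H[j, keep] = h
    return H

# Hypothetical symmetric affinities; the last node contacts only the third node.
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)
M = A / A.sum(axis=0)                         # m_ij = a_ij / d_j (assumed convention)

H = hitting_times(M)
C = H + H.T                                   # commute times, symmetric by construction
print(np.round(H, 3))
print(np.round(C, 3))
```

In this toy network the last node has a single neighbor, so a signal leaving it reaches that neighbor in exactly one step, whereas the reverse trip takes longer on average; the two hitting times differ while their sum, the commute time, is the same in both directions.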

In the calculations below, it proves convenient to define the average hitting times in both directions, as well as the average commute time, for each individual residue as

<_{r}_{b}

_{i,} v_{j}^{th} row indicates the number of steps required for a signal to hit residue _{j}_{i}_{b}_{r}

(A) Hitting time

(B) Commute time

(C) Average hitting times evaluated from (A). All three catalytic residues (blue dots) exhibit short hitting times.

The higher ability of particular residues to transduce signals is also reflected in the commute times displayed in

_{r}

It is of interest to examine the signal transduction properties of catalytic residues. Phospholipase A2 has three catalytic residues: His48, Tyr52, and Asp99. Notably, all three residues (indicated by blue dots) are found to be located in minima (

To additionally highlight the enhanced communication properties of the catalytic residues, we plot in _{r}_{r}

Catalytic residues (red crosses) are fast and precise, being located at the lower left end of the plot. Ligand-binding residues are indicated by black +.

(A) HIV-1 protease (1a30, [

Consider the hitting time to the ^{th} residue _{n}_{i}^{th} row and column are deleted to obtain

Here, ^{th} row of the hitting time matrix

As derived in Methods, _{i}_{j}

Substitution of ^{−1},
_{i}_{j}

The hitting time expression _{k}

Decomposing the hitting time

The two-body term may be positive or negative, depending on the type of cross-correlations between residues _{i}_{j}.^{−1}]_{ji}

The qualitative features observed here were verified to be valid for all examined proteins: namely, the mean-square fluctuations of the destination node play a dominant role in determining the hitting (or commute) time, and the cross-correlations between the two end points may increase or decrease the hit/commute time, depending on the type of correlation. Anticorrelations have a retarding effect, while positive correlations reduce the hitting time. In the extreme case of the two nodes moving in phase, with the same amplitude, the effective hit/commute time approaches zero.
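The link between hitting times and the inverse Kirchhoff matrix can be checked numerically on a toy network. The sketch below assumes the Kirchhoff matrix is $\Gamma = \mathbf{D} - \mathbf{A}$ and uses a standard identity for random walks on weighted graphs, $H(j,i) = \sum_k d_k\left([\Gamma^{-1}]_{ik} - [\Gamma^{-1}]_{ij} - [\Gamma^{-1}]_{jk} + [\Gamma^{-1}]_{jj}\right)$, with $\Gamma^{-1}$ taken as the pseudo-inverse. This form is not necessarily identical to the expression derived in the text; the function names and affinities are hypothetical.

```python
import numpy as np

def hitting_times_markov(A):
    """H[j, i] from the Markov chain m_ij = a_ij / d_j, by linear solves."""
    n = A.shape[0]
    P = (A / A.sum(axis=0)).T                 # P[i, k] = probability of stepping i -> k
    H = np.zeros((n, n))
    for j in range(n):
        keep = [i for i in range(n) if i != j]
        Q = P[np.ix_(keep, keep)]
        H[j, keep] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return H

def hitting_times_kirchhoff(A):
    """H[j, i] from the pseudo-inverse of Gamma = D - A (assumed Kirchhoff matrix)."""
    d = A.sum(axis=0)
    G = np.linalg.pinv(np.diag(d) - A)
    n = len(d)
    H = np.zeros((n, n))
    for j in range(n):                        # destination
        for i in range(n):                    # source
            H[j, i] = d @ (G[i, :] - G[i, j] - G[j, :] + G[j, j])
    return H

# Hypothetical symmetric affinities for a connected four-node network.
A = np.array([[0, 2, 1, 0],
              [2, 0, 1, 0],
              [1, 1, 0, 3],
              [0, 0, 3, 0]], dtype=float)

H_markov = hitting_times_markov(A)
H_kirchhoff = hitting_times_kirchhoff(A)
G = np.linalg.pinv(np.diag(A.sum(axis=0)) - A)
# Commute time as total density times (G_ii + G_jj - 2 G_ij).
C_kirchhoff = A.sum() * (np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G)

print(np.allclose(H_markov, H_kirchhoff))
print(np.allclose(H_markov + H_markov.T, C_kirchhoff))
```

In this form the term $\sum_k d_k\,[\Gamma^{-1}]_{jj}$ depends only on the destination node, and the term in $[\Gamma^{-1}]_{ij}$ enters with a negative sign, so a positive cross-correlation between source and destination shortens the hitting time while an anticorrelation lengthens it.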

The commute times provide us with a means of estimating effective communication distances _{i}_{j}_{i}_{j.}

_{i}_{j}

(A) Comparison of effective communication distances (ordinate) and physical distances (abscissa) for all residue pairs in phospholipase A2. The points colored red refer to pairs involving the catalytic residue His48. (B) and (C) illustrate residue pairs that are separated by similar distances but differ in their communication times; (D) and (E) illustrate the opposite situation, with comparable communication times despite significant differences in inter-residue distances. See text for more details.

The comparison of the effective and actual (physical) communication distances in

Probability distribution of hitting times

The ribbon diagrams are colored by the secondary structure, namely helices (red), strands (blue), and coils/disordered regions (white). For each enzyme, the probability distribution of hitting times

As noted above, the mean-square fluctuations of the destination node play a dominant role in determining the hitting (or commute) time. The higher communication propensity of

Effective (ordinate) and physical (abscissa) distances between residues in the CORE, LID, and AMPbd domains (see inset), grouped as intradomain and interdomain distances and shown in different colors for each group. Note that communication between residues in the same domain is more efficient than that between residues in two different domains. This is evidenced by the longer commute distance corresponding to interdomain pairing for a given physical distance, compared with that of intradomain pairs. The inset gives a schematic overview of the distance distributions for intradomain and interdomain pairings.

Methods based on network models have helped significantly in recent years to provide a comprehensible description of the dynamics of biomolecular systems. On the one hand, methods based on fundamental statistical mechanical principles have been proposed for delineating the collective motions of biomolecules [

The present study offers a rigorous way of connecting the two approaches, by demonstrating that the commute times between residues _{i}_{j}

Notably, the application to example enzymes points to the more efficient communication propensity and precision of catalytic sites (

The major advantage of the present stochastic model over the GNM is the fact that the new methodology lends itself to a comprehensive assessment of the communication paths and their efficiency in biomolecular structures. As such, it holds promise for identifying allosteric communication pathways as well as the sites distinguished by high allosteric potentials, thus providing insights into the design principles of biomolecular machines. The presently observed enhancement in the information transfer properties of catalytic residues and secondary structural elements suggests possible design requirements for efficient enzymatic activity. In this context, it is worth noting the relevant studies by Choe and Sun [

We note that finding a suitable experimental setup for probing hit times is a challenge. In general, the residues/interactions involved in information flow, or the changes in inter-residue distances (which directly define the commute times), may be assessed by site-directed mutagenesis and cross-linking experiments as well as spectroscopic methods such as site-directed fluorescence labeling [

Finally, establishing the bridge between these two disciplines will permit us to apply the wealth of concepts and methods developed in information-theoretic approaches to the exploration of signal transduction mechanisms in complex biomolecular systems, thus complementing physically inspired models and methods.

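Let $\Delta\mathbf{R}_i$ and $\Delta\mathbf{R}_j$ denote the fluctuations in the positions of residues $v_i$ and $v_j$. In the GNM, their cross-correlations are given by

$$\langle \Delta\mathbf{R}_i \cdot \Delta\mathbf{R}_j \rangle = \frac{3k_BT}{\gamma}\,[\Gamma^{-1}]_{ij},$$

where $k_B$ is the Boltzmann constant, $T$ is the absolute temperature, $\gamma$ is the spring constant, and $[\Gamma^{-1}]_{ij}$ is the $ij^{th}$ element of the inverse Kirchhoff matrix $\Gamma^{-1}$. The individual residue mean-square (ms) fluctuations are obtained by substituting $i = j$.

The mean-square fluctuations $\langle(\Delta\mathbf{R}_{ij})^2\rangle$ in the distance vector $\Delta\mathbf{R}_{ij} = \Delta\mathbf{R}_j - \Delta\mathbf{R}_i$ can be expressed in terms of the elements of $\Gamma^{-1}$ as

$$\langle(\Delta\mathbf{R}_{ij})^2\rangle = \frac{3k_BT}{\gamma}\left([\Gamma^{-1}]_{ii} + [\Gamma^{-1}]_{jj} - 2[\Gamma^{-1}]_{ij}\right).$$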

Consider a small undirected network of three nodes connected as in
_{ij}_{jk}_{i}_{k}_{j}_{ji}_{jk}_{ij}_{kj}

Simultaneous solution of

Enumerating

Given the symmetry in the network here, the hitting time

Enumerating
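The two routes to a hitting time mentioned here, simultaneous solution of first-step equations and enumeration of paths, can be compared on a three-node example. Because the figure with the network is not available, the sketch below assumes a linear chain $v_i$ – $v_j$ – $v_k$ with affinities $a_{ij}$ and $a_{jk}$ and no direct contact between $v_i$ and $v_k$; the function names and numerical values are hypothetical.

```python
def hit_time_closed_form(a_ij, a_jk):
    """Expected steps from v_i to v_k on the assumed chain v_i - v_j - v_k."""
    q = a_ij / (a_ij + a_jk)          # probability that v_j sends the signal back to v_i
    # Simultaneous solution of H_ki = 1 + H_kj and H_kj = 1 + q * H_ki.
    return 2.0 / (1.0 - q)

def hit_time_by_enumeration(a_ij, a_jk, max_returns=10_000):
    """Same quantity as a sum over paths, truncated after max_returns round trips."""
    q = a_ij / (a_ij + a_jk)
    total = 0.0
    for r in range(max_returns):
        # r round trips v_i -> v_j -> v_i, then v_i -> v_j -> v_k: 2r + 2 steps,
        # occurring with probability q**r * (1 - q).
        total += (2 * r + 2) * q**r * (1.0 - q)
    return total

print(hit_time_closed_form(1.0, 1.0))       # equal affinities
print(hit_time_by_enumeration(1.0, 1.0))
print(hit_time_closed_form(1.0, 3.0))       # stronger contact toward v_k
```

With equal affinities the signal needs four steps on average to travel from one end of the chain to the other; strengthening the contact between $v_j$ and $v_k$ lowers the probability of backtracking and shortens the hitting time toward its minimum of two steps.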

The discussion below borrows from results in [^{−1} is a three-step process: (i) put together a matrix ^{T}^{−1}.

The vectors ^{T}^{T}^{T}^{th} residue _{n}_{i}_{j}_{i}

We gratefully acknowledge the contribution of Sinem Ozel to the results on adenylate kinase.

EN, elastic network

GNM, Gaussian Network Model