Investigating Information Dynamics in Living Systems through the Structure and Function of Enzymes

Enzymes are proteins that accelerate intracellular chemical reactions, often by factors of 10^5−10^12 s^−1. We propose that the structure and function of enzymes represent the thermodynamic expression of heritable information encoded in DNA, with post-translational modifications that reflect intra- and extra-cellular environmental inputs. The three-dimensional shape of the protein, determined by the genetically specified amino acid sequence and post-translational modifications, permits geometric interactions with substrate molecules traditionally described by the key-lock best-fit model. Here we apply the Kullback-Leibler (K-L) divergence as a metric of this geometric “fit” and of the information content of the interactions. When the K-L ‘distance’ between interspersed substrate p_n and enzyme r_n positions is minimized, the information state, reaction probability, and reaction rate are maximized. The latter obeys the Arrhenius equation, which we show can be derived from the geometrical principle of minimum K-L distance. The derivation is first limited to optimum substrate positions for fixed sets of enzyme positions. However, maximally improving the key/lock fit, called ‘induced fit,’ requires both sets of positions to be varied optimally. We demonstrate that this is permitted, and is maximally efficient, if the key and lock particles p_n, r_n are quantum entangled, because the level of entanglement obeys the same minimized value of the Kullback-Leibler distance that occurs when all p_n ≈ r_n. This implies that interchanges p_n ⇄ r_n randomly taking place during a reaction successively improve key/lock fits, reducing the activation energy E_a and increasing the reaction rate k. Our results demonstrate that the summation of heritable and environmental information that determines the enzyme spatial configuration is, by decreasing the K-L divergence, converted to thermodynamic work by reducing E_a and increasing k of intracellular reactions.
Macroscopically, enzyme information increases the order in living systems, similar to the Maxwell's demon gedanken, by selectively accelerating specific reactions, thus generating both spatial and temporal concentration gradients.


Introduction
Living organisms, uniquely in nature, encode, propagate, and use information [1] to produce stable, highly ordered structures that are also complex, dynamical, semi-open systems far from thermodynamic equilibrium. But what is biological information, and how is information used to maintain the ordered structure and function of a living system [2][3][4][5]? While it is apparent that information storage and processing are fundamental characteristics of living systems, the principles governing information dynamics in biology remain unclear.
Enzymes are central to the function of living systems and facilitate the work necessary to maintain order [6]. Once synthesized as a string of amino acids specified by the nucleotide triplets in the gene, a protein is typically subjected to post-translational modification such as phosphorylation. Importantly, post-translational modifications reflect temporal variations in the status of the cell (e.g. ATP concentrations [7]). Thus the three-dimensional shape of the enzyme represents a summation of both heritable and current information within the cell. This composite information produces a three-dimensional structure that is the low free-energy state for the amino acid sequence plus post-translational modifications. It will be seen that this minimum state represents, as well, one of minimum Kullback-Leibler divergence, i.e. maximal order, between substrate and enzyme codons. These effects result from a doubly optimized lock and key interaction between substrate and enzyme codons.
By this effect, the enzymes are catalysts that do not alter the fundamental thermodynamics of the reaction, in the sense that the initial thermodynamic state of the substrate and the final thermodynamic state of the products are not changed [8]. Because it acts as a catalyst, the enzyme is not consumed in the reaction, so its information content is applied repeatedly provided substrate is available and no additional post-translational modifications occur. Typically enzymes accelerate reactions, often by many orders of magnitude (Fig 1). Without them, many reactions (e.g., reactions to extract energy from substrate or synthesize cell components) would be too slow to permit orderly function of living systems. We propose that this characteristic of enzymes permits investigation of the relation of information to thermodynamics and order through the concept of "activation energy." Finally, we note that recent studies [8,9] have emphasized the dynamic nature of enzyme structure and the critical role of structural motion of the protein during catalysis. By integrating these dynamics into our model, we note that quantum effects may be observed.

Key-Lock dynamics
Enzymes are typically highly specific, decreasing the activation energy (E_a) (Figs 1 and 2) and increasing the reaction rate (k) only for a small number of substrate molecules [9]. This link between E_a and k is typically described by the empirically derived Arrhenius equation (see below). The specific activity of the enzyme is often described as a "lock and key" [10] process in which some region of the folded protein provides a complementary geometric shape to that of the substrate [11,12], thus reducing the entropy of the interactions. We note that enthalpic interactions such as Coulomb interactions are also maximized as the distance between substrate and enzyme is decreased where, as noted in [12], "interactive enthalpy is estimated from the sum of electrostatic and van-der-Waals interactions." This permits binding that facilitates the reaction, often through complex intermediate transitory steps.
Here we will focus on the spatial interactions between enzyme and substrate. We will view the catalyzed reaction as a single step of substrate → products (Figs 1 and 2), omitting for simplicity the transient intermediate steps. We initially assume an enzyme density law r_n = r(x_n), n = 1,...,N with the proteins fixed at molecular positions x_n = nΔx, n = 1,...,N. For simplicity, a one-dimensional case with constant position spacing Δx is temporarily assumed. These constraints will be relaxed in subsequent sections.
Let the substrate pathway positions obey an unknown density law p(x_n), n = 1,...,N on the pairs of substrate particles that ordinarily constitute reactant molecules. Let these reactant molecules interact, or 'bind,' with the enzyme molecules. This defines an enzyme-substrate complex.
It is shown (see Appendix) that this complex lowers the activation energy of the reaction. One of the most important ways that an enzyme catalyzes any given reaction is through entropy reduction: by bringing order to a disordered system. Thus, since entropy is a component of Gibbs free energy, this free energy is lowered as well. This in turn is a component of the activation energy E_a which, as mentioned above, is likewise lowered. These factors work to increase the reaction rate. Enzymes also accelerate the reaction by providing a spatially specific charge distribution that forms bonds with the substrate. They also promote chemical reactions by bringing substrates together in an optimal orientation, lining up the atoms and bonds of one molecule with the atoms and bonds of the other molecule. This constitutes a lowering of local entropy, in particular the Kullback-Leibler or 'cross' entropy (as will be seen).
Fig 1. A simplified model of a reaction with and without an enzyme. Substrate B yields products C and D with a release of free energy E_R. Although the overall reaction is thermodynamically favorable, there is an energy barrier (the activation energy [E_a]) that decreases the rate of the reaction (k). The enzyme, through a key-lock geometric binding with the substrate, has a net effect of reducing the E_a and accelerating the reaction. As described in the text, the information content of the enzyme is expressed geometrically by the formation of a shape within the protein that is precisely complementary to the shape of the substrate. The information is, thus, converted to energy by reducing E_a (ΔE_a).
The initial interaction between enzyme and substrate is relatively weak, but these weak interactions rapidly induce conformational changes in the enzyme that strengthen binding [13]. These conformational changes are augmented by a 'key and lock' effect whereby the substrate 'key' molecule fits optimally close to the complementary three-dimensional structure within the enzyme 'lock' particle. This 'key/lock' effect tends to maximize the reaction rate.
Initially assuming a well-mixed distribution of enzymes and substrate of equal concentration, we view the "lock" as constantly spaced enzyme molecules of density profile r_n = r(x_n + Δx/2), x_n = nΔx, Δx small. These molecules are located at positions (n + 1/2)Δx with density values r_n. By comparison, the substrate (or "key") molecules are particle pairs having a local density profile p_n = p(x_n) at positions x_n = nΔx. These are thereby located halfway between corresponding lock molecules r_n. Each enzyme-substrate 'complex' locally lowers the activation energy of the reaction, so that overall activation energy is maximally lowered when all key particles are 'closest' geometrically to the corresponding lock particles. This is exemplified in Figs 1 and 2. Then, given a fixed enzyme path r_n, the problem of minimizing activation energy becomes one of geometry. What substrate reactant path p_n obeys minimal distance from the fixed enzyme path r(x_n)?
Fig 2. Information in living systems manifests through "temporal gradients". Here the system contains initially two substrates and one enzyme. In the absence of the enzyme, reaction C → G + H will proceed more rapidly because it has both lower final free energy and lower activation energy. However, the enzyme lowers the E_a for reaction B → E + F. The information in the enzyme produces an observable gradient over time as the concentrations of E and F are increased and B is decreased when compared to an uncatalyzed system. In contrast, because of its specificity, the enzyme has no effect on the temporal evolution of the substrate and product concentrations of reaction C → G + H.
Kullback-Leibler measure. We now need to choose a measure of the distance between the two density paths. From the preceding, this distance is to be a minimum. One useful measure is their Kullback-Leibler [14,15] 'divergence,' defined as
$$H_{KL}(p\|r) \equiv \sum_{n=1}^{N} p_n \ln \frac{p_n}{r_n}. \tag{1a}$$
Although H_KL is not formally a 'distance' (since it is not symmetric in p and r), it has many properties of one and, for our purposes, is conveniently regarded as such. It also obviously has the form of an 'entropy,' and so can be termed 'KL entropy'.
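As a minimal numerical sketch of the Kullback-Leibler measure described above, the following Python snippet evaluates the sum of p_n ln(p_n/r_n) for two small discrete densities. The density values and the function name kl_divergence are illustrative assumptions, not taken from this study. The sketch is meant only to show that the measure is not symmetric in p and r and that it vanishes when the two densities coincide.

```python
import math

def kl_divergence(p, r):
    """Return H_KL(p||r) = sum_n p_n * ln(p_n / r_n) for discrete densities."""
    if len(p) != len(r):
        raise ValueError("p and r must have the same length")
    total = 0.0
    for p_n, r_n in zip(p, r):
        if p_n == 0.0:
            continue  # the term 0 * ln(0 / r_n) is taken as 0 by convention
        if r_n == 0.0:
            return math.inf  # p has weight where r has none
        total += p_n * math.log(p_n / r_n)
    return total

# Hypothetical four-point densities (each sums to 1).
r = [0.25, 0.25, 0.25, 0.25]   # assumed enzyme ("lock") density
p = [0.40, 0.30, 0.20, 0.10]   # assumed substrate ("key") density

h_pr = kl_divergence(p, r)     # 'distance' of p from r
h_rp = kl_divergence(r, p)     # generally differs from h_pr
h_rr = kl_divergence(r, r)     # identical densities

print("H_KL(p||r) =", h_pr)
print("H_KL(r||p) =", h_rp)
print("H_KL(r||r) =", h_rr)
```

The two orderings give different positive values, which is why the measure is only loosely called a 'distance' in the text.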
The KL distance between all enzymes of density r_n = r(x_n + Δx/2) and their corresponding substrate molecules of density p_n = p(x_n) is to be minimized, obeying
$$H_{KL}(p\|r) = \sum_{n=1}^{N} p_n \ln \frac{p_n}{r_n} = \min. \tag{1b}$$
We are here analyzing a one-dimensional problem, i.e. one where each x_n and Δx is a scalar value. But this ignores the vital question of the relative orientation of key and lock molecules. That is taken up at the end, and is an easy generalization of the one-dimensional approach.
This geometrical interleaving of the two types of molecule does represent a one-dimensional form of a key-lock geometry. However, specifically what density function p(x_n) should govern the reactant pathway?
Derivation of optimum reactant pathway p_n. Regarding all enzyme and reactant molecules, this is assumed to obey principle (Eq 1a and Eq 1b). The reactant is also the substrate, so we are seeking the substrate density function p_n that has minimum KL distance from the given enzyme pathway r_n, n = 1,...,N. This is assumed to occur in the presence of the interlacing (x_n, x_n + Δx/2) of coordinate positions defined above, and also the known physical constraints of the problem. The main one is that of known mean energy.
We seek the pathway position law p_n that obeys H_KL(p||r) = min., in the presence of the arbitrary, but fixed, enzyme pathway r_n. (Note: This temporarily ignores the more recently observed effect of "induced fit" [16,17], whereby the enzyme pathway changes as well to further improve the fit. This is addressed below.) The two laws p_n, r_n of course obey normalization
$$\sum_n p_n = 1, \quad \sum_n r_n = 1, \quad r_n = \text{const.}, \quad n = 1, \dots, N. \tag{2}$$
(All sums are over the entire pathways.) Assume, as well, a fixed mean molecular bond energy
$$\sum_n P(E_n) E_n = \sum_n p_n E_n = \kappa T, \quad \text{with } P(E_n) = p_n \tag{3}$$
by definition, κ Boltzmann's constant and T a fixed temperature. Energies E_n could, e.g., be due to hydrogen bonds. Also, Eq (3) assumes ergodicity to hold. That is, the true statistical average energy (the left-hand sum) equals the average energy along any one path (the second sum). We will use this ergodic property below.
Net Optimization Problem. We therefore seek the reaction (or substrate) rate p_n satisfying KL requirement Eq (1b) subject to the four constraints Eqs (2) and (3) obeyed by p_n and r_n. By the method of undetermined multipliers, these satisfy the variational principle
$$\sum_n p_n \ln \frac{p_n}{r_n} + \lambda_1 \Big[ \sum_n p_n - 1 \Big] + \lambda_2 \Big[ \sum_n r_n - 1 \Big] + \lambda_3 \Big[ \sum_n p_n E_n - \kappa T \Big] = \min. \tag{4}$$
Differentiating this ∂/∂p_n and equating it to zero gives, as the condition for the constrained minimum,
$$1 + \ln p_n - \ln r_n + \lambda_1 + \lambda_3 E_n = 0. \tag{5}$$
Solving Eq (5) for p_n gives
$$p_n = r_n \exp(-1 - \lambda_1 - \lambda_3 E_n). \tag{6}$$
On this basis, for a given point n, the maximum probable local reaction rate p_n ≡ p(x_n) is proportional to the neighboring (at positions x_n ± Δx/2) densities r_n of the enzyme. This makes sense since each enzyme is assumed to locally enhance the reaction, e.g. by strong hydrogen bonding, and this enhancement becomes stronger the geometrically closer the reactant is to the enzyme.
The rate p_n of reaction in Eq (6) also falls off with the local molecular bonding energy E_n. This also makes sense, since the stronger the bond is, the less probable it is that the molecule breaks up and contributes to the desired reactant.
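The constrained minimization described above can be checked numerically. The sketch below assumes a hypothetical four-point enzyme density, equally spaced bond energies expressed in units of κT, and a simple bisection search for the multiplier on the mean-energy constraint; none of these choices come from this study. It builds a density proportional to r_n exp(−λ E_n), tunes λ so that the mean energy equals the target, and then confirms that a small perturbation preserving both normalization and mean energy increases the Kullback-Leibler value.

```python
import math

def kl_divergence(p, r):
    """H_KL(p||r) = sum_n p_n ln(p_n / r_n); assumes all entries are positive."""
    return sum(p_n * math.log(p_n / r_n) for p_n, r_n in zip(p, r))

def tilted_density(r, energies, lam):
    """Density proportional to r_n * exp(-lam * E_n), normalized to sum to 1."""
    weights = [r_n * math.exp(-lam * e_n) for r_n, e_n in zip(r, energies)]
    z = sum(weights)
    return [w / z for w in weights]

def mean_energy(p, energies):
    return sum(p_n * e_n for p_n, e_n in zip(p, energies))

def solve_multiplier(r, energies, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection on lam; the mean energy decreases monotonically as lam grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_energy(tilted_density(r, energies, mid), energies) > target:
            lo = mid   # mean too high: a larger multiplier is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical inputs (assumptions for illustration only).
R = [0.1, 0.2, 0.3, 0.4]          # fixed enzyme density
ENERGIES = [0.5, 1.0, 1.5, 2.0]   # bond energies in units of kT
KT = 1.0                          # target mean energy

lam = solve_multiplier(R, ENERGIES, KT)
p_opt = tilted_density(R, ENERGIES, lam)
kl_opt = kl_divergence(p_opt, R)

# This direction sums to zero and has zero energy moment for equally spaced
# energies, so both constraints are preserved along it.
direction = [1.0, -2.0, 1.0, 0.0]
eps = 0.01
p_plus = [p_n + eps * d for p_n, d in zip(p_opt, direction)]
p_minus = [p_n - eps * d for p_n, d in zip(p_opt, direction)]
kl_plus = kl_divergence(p_plus, R)
kl_minus = kl_divergence(p_minus, R)

print("multiplier:", lam)
print("optimal density:", p_opt)
print("KL at optimum and at the two perturbations:", kl_opt, kl_plus, kl_minus)
```

Because the objective is convex in p_n and the constraints are linear, the stationary point found this way is the constrained minimum. For a discrete, bounded set of energies the multiplier need not equal 1/κT; that identification relies on the continuous energy distribution used later in the text.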
Derivation for Multi-dimensional Geometry. For optimum key-lock fit, the two molecules must not only be optimally close but also each have a correct orientation. The approach to this problem requires a generalization to three-dimensional variables x_n ≡ (x,y,z)_n. Here p_n = p(x,y,z)_n, and likewise for r_n, with the scalar spacing Δx replaced by the vector Δx = (Δx,Δy,Δz)_n. Also, the Kullback-Leibler distance is of the same form Eq (1a) as before,
$$\sum_{n=1}^{N} p(\mathbf{x}_n) \ln \frac{p(\mathbf{x}_n)}{p(\mathbf{x}_n + \Delta\mathbf{x}/2)} = \min. \tag{7}$$
The identical algebra of Eqs (3)-(6) follows as before, with boldface quantities replacing scalars, but with the scalar E_n remaining in Eq (6) since energy is always a scalar quantity. However, an important new interpretation arises for the effect Δx → 0. Acknowledging this to occur in three dimensions requires the key and lock to now approach one another while in the same orientation. This describes a true key-lock bond. Also, the change of reactant path so as to reduce activation energy E_a now occurs in full three-dimensional space.
Note that principle Eq (7) is much more than simply a 3D version of principle (Eq 1a and Eq 1b). Consider the 3D tissue produced by multiply folding a long string of nucleotides. From the form of Eq (7), the more regular the folding is, i.e. the more often a given codon occurs at neighboring points x_n, the closer to 1 will be the ratios in the logarithm in Eq (7). Therefore the smaller will be their contributions to Eq (7) after the ln is taken. Hence the smaller will be the minimum value of H_KL. Tissue with such low cross-entropy has low free energy and a high state of order. This might account for the vital role played by protein folding in augmenting living systems [6]. In turn, this emphasizes that H_KL has direct biological significance as a measure of cellular growth, despite being merely a geometrical measure.
Deriving the Arrhenius equation. The Arrhenius equation describes the dependence of reaction rates upon temperature and is empirically derived. No enzymes are presumed present. Or equivalently, they are equally present at all reaction path positions [18]. Hence, we now repeat use of the minimum Kullback-Leibler principle in the special case where all enzyme densities obey
$$r_n = \text{const.} \tag{8}$$
Also, for simplicity we return to the one-dimensional case of scalar coordinates x_n. Recall that we used the 'ergodic hypothesis,' that the statistics of E at any position x_n equal those of E over any one path x_n, n = 1,...,N. On this basis, and using the last identity in Eq (3), result Eq (6) is, in the special case Eq (8),
$$p_n = P(E_n) = K \exp(-\lambda_3 E_n), \quad K = \text{const.} \tag{9}$$
We also found, at Eq (3), that the average <E> = κT. Using this in Eq (9) gives λ_3 = K = 1/κT. Then
$$P(E_n) = \frac{1}{\kappa T} \exp\left(-\frac{E_n}{\kappa T}\right), \tag{10}$$
the Boltzmann energy distribution law. At this point it is assumed that if the energy E_n ≥ E_a, a so-called 'activation' level of the energy, the reaction occurs at the position x_n. But we also assumed ergodicity to hold. Therefore, the reaction occurs as often as the event E_n ≥ E_a occurs for any one n. This shows that, for any fixed energy density function p(E_n), the smaller E_a is, the more often events E_n ≥ E_a occur or, equivalently, the higher is the reaction rate.
Also, ergodicity allows us to now drop subscript n in Eq (10). Then using Eq (10) for p(E) gives
$$P(E \ge E_a) = \int_{E_a}^{\infty} p(E)\, dE = \exp\left(-\frac{E_a}{\kappa T}\right). \tag{11}$$
Since each energy value E satisfying Eq (11) gives rise to a reaction product, this shows that the reaction rate grows as the activation energy E_a decreases.
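The step from the Boltzmann energy law to the probability of exceeding the activation energy can be illustrated numerically. The sketch below assumes a continuous energy density (1/κT) exp(−E/κT) on E ≥ 0, with energies expressed in units of κT; the midpoint-rule integrator, the truncation of the upper limit, and the example value of E_a are illustrative choices rather than part of the derivation. It compares the numerically integrated tail with the closed form exp(−E_a/κT).

```python
import math

def boltzmann_density(e, kt):
    """Boltzmann energy law (1/kT) * exp(-E/kT) for E >= 0."""
    return math.exp(-e / kt) / kt

def tail_probability(e_a, kt, steps=200_000, upper_multiple=60.0):
    """Midpoint-rule integral of the density from E_a to E_a + upper_multiple*kT.

    The upper limit is truncated; the neglected part is of relative
    order exp(-upper_multiple).
    """
    h = upper_multiple * kt / steps
    total = 0.0
    for i in range(steps):
        e = e_a + (i + 0.5) * h
        total += boltzmann_density(e, kt) * h
    return total

KT = 1.0    # energy unit (assumption: all energies in units of kT)
E_A = 5.0   # hypothetical activation energy

numeric_tail = tail_probability(E_A, KT)
closed_form = math.exp(-E_A / KT)

print("numeric tail :", numeric_tail)
print("closed form  :", closed_form)
```

Lowering E_A in this sketch raises the tail probability, which is the sense in which a smaller activation energy makes reactive energy events more frequent.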
But the analysis has ignored the fact that the molecules of the reacting medium may have a known prior probability A of being in the proper orientation to react. This probability should multiply result Eq (11).
The result is that the net probability density, or reaction rate, obeys
$$k = A \exp\left(-\frac{E_a}{\kappa T}\right), \tag{12}$$
the Arrhenius equation.
As we discussed, the optimum choice of enzyme path r_n for accomplishing the desired reaction can occur along an altered reaction path x_n requiring a lower activation energy E_a. This is shown by Eq (12) in two ways: first, the required energy values E can be smaller; and second, the resulting reaction rate k is higher. That E_a is, in fact, a minimum is shown in the Appendix to follow from the H_KL principle (Eq 1a and Eq 1b). Thus, the H_KL principle derives both the well-known rate effect Eq (12) and the fact that activation energy E_a tends to be a minimum value.
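To make the dependence of the rate on activation energy concrete, the following sketch evaluates the Arrhenius form k = A exp(−E_a/κT) for a series of barriers. The prefactor, the barrier heights, and the use of κT as the energy unit are hypothetical values chosen for illustration; they are not measurements from this study.

```python
import math

def arrhenius_rate(prefactor, e_a, kt):
    """k = A * exp(-E_a / kT), with E_a and kT in the same energy units."""
    return prefactor * math.exp(-e_a / kt)

KT = 1.0   # energy unit (assumption: barriers expressed in units of kT)
A = 1.0    # hypothetical orientation prefactor

# Hypothetical barriers, from a high uncatalyzed value down to zero.
barriers = [20.0, 15.0, 10.0, 5.0, 0.0]
rates = [arrhenius_rate(A, e_a, KT) for e_a in barriers]

# Rate enhancement from lowering the barrier by 10 kT (20 kT down to 10 kT).
enhancement = rates[2] / rates[0]

for e_a, k in zip(barriers, rates):
    print(f"E_a = {e_a:5.1f} kT  ->  k = {k:.3e}")
print("enhancement for a 10 kT reduction:", enhancement)
```

Because the dependence is exponential, each fixed reduction in E_a multiplies the rate by the same factor, exp(ΔE_a/κT). This is how modest changes in the barrier can correspond to rate changes of many orders of magnitude.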

Optimization of reactant path by quantum entanglement
In the preceding, only densities p_n were optimized for a fixed enzyme density path r_n. However, further optimization can be made whereby the r_n themselves are allowed to change so as to further improve the key/lock fit. This is called "conformer selection" or "induced fit" [18]. We propose two effects that potentially accomplish this.
As noted above, enzyme function requires a tight geometric fit in which the atoms of the amino acids in the protein and the substrate molecules are separated by distances that are minimized. Suppose, as we found, that their spacings Δx/2 are on the order of angstroms. At such molecular distances, quantum effects can enter in, e.g. in the form of quantum entanglement. This holds even for semiclassical quantum effects [19]. Other authors [20], in fact, define the degree of global entanglement between two systems p_n, r_n as the very value of H_KL(p||r) for the p_n, r_n obeying KL principle (Eq 1a and Eq 1b). That is, the level of entanglement is defined by the minimized value of the Kullback-Leibler entropy, which was our very criterion (Eq 1a and Eq 1b) for the choice of the p_n.
This also makes intuitive sense: by Eq (1a), the 'distance' measure H_KL(p||r) is mathematically at its absolute minimum value, of zero, when all p_n = r_n. This describes perfect entanglement between the two systems p_n, r_n, so that interchanges of the roles played by enzymes and reactants repeatedly take place. By the same token, finite values of H_KL(p||r) instead allow only certain pairs of the p_n, r_n to effectively interchange roles. It follows, then, that over a number of such reactions the initial molecular reactant paths p(x_n), r(x_n), n = 1,...,N can progressively wander off to totally different ones which further upgrade the key/lock fit. These are also, in fact, energetically preferred since, by Eq (12), the progressively lowered threshold energy E_a is more readily provided at each such entanglement.

Discussion
Here we investigate a mechanism by which living systems use information to maintain a low entropy state far from thermodynamic equilibrium. We propose that the information encoded in the inherited sequence of nucleotides in DNA is manifested geometrically in the three-dimensional shape of an enzyme, determined by the lowest free energy state of the amino acid sequence specified by the corresponding gene. However, we note that the three-dimensional shape of the enzyme can be extensively altered by post-translational modification. Thus, the geometry of the enzyme represents a summation of heritable information, represented by its amino acid sequence, and temporally variable information regarding the state of the cell and its environment, which governs post-translational modification. Most simply, the information within the three-dimensional geometry of the protein is manifested thermodynamically by the reduction in the activation energy (E_a) of the reaction catalyzed by the enzyme.
The mechanism by which information reduces the activation energy is geometric as, like a "lock and key", the shape of the enzyme precisely fits the shape of a substrate. We investigate these spatial interactions using the Kullback-Leibler distance, Eq (1a), which is a generalization of the Shannon mutual information. In fact, in many textbooks the latter is derived as a special case of the former. We demonstrate that the information of the enzyme "lock" vis-à-vis the shape of the substrate "key" is the equivalent of the K-L distance. Maximum information corresponds to a minimal K-L distance and, thus, the largest possible decrease in E_a.
The observable effect of the enzyme-induced decrease in E_a is an increase in the reaction rate k, often by several orders of magnitude. This is quantified by the empirically derived Arrhenius equation. Here we demonstrate that the Arrhenius equation can be derived from a first principle that requires minimum Kullback-Leibler divergence (Eq 1a and Eq 1b) between a fixed enzyme density function and an unknown reactant function.
Here we also investigate the more recently proposed "induced fit" model, in which the enzyme geometry changes in response to the substrate, thus further improving the geometric match. Interestingly, we find that the induced fit dynamic will occur over very small molecular distances Δx, which will potentially permit quantum entanglement effects. In particular, we find that for small Δx the minimized KL entropy becomes proportional to the degree of quantum entanglement of path functions p_n, r_n. This extends prior studies suggesting quantum effects in proteins, including enzymes [21][22][23].
Our investigation also provides general insights into the dynamics of biological information. Although it is clear that information must play a central role in the growth of living systems, the general principles that govern translation of information into biological order and function are not well defined [24]. We note that an enzyme can alter the living system in ways similar to the classic Maxwell's demon gedanken [25,26]. For example, a protein within a membrane can use its information (expressed as its three-dimensional shape) to select and bind a substrate on one side of the membrane and move it into the adjacent cellular compartment [27], thus creating a spatial concentration gradient similar to the classic thought experiment [28]. However, unlike the iconic demon, enzymes can also generate a gradient over time [29]. That is, because the enzyme greatly accelerates the rate of reaction, the concentrations of substrate and products over time will be smaller and larger, respectively, when an enzyme is present compared to a system in which the information content of the enzyme is absent.
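The temporal gradient described above can be sketched with a simple kinetic model. The example below assumes irreversible first-order kinetics for a single substrate, an Arrhenius rate constant, and hypothetical barrier heights in units of κT; these are simplifying assumptions made for illustration and are not the model analyzed in this study. It compares substrate and product concentrations over time with and without the enzyme-induced reduction in E_a.

```python
import math

def arrhenius_rate(prefactor, e_a, kt):
    """k = A * exp(-E_a / kT)."""
    return prefactor * math.exp(-e_a / kt)

def first_order_concentrations(b0, k, t):
    """Substrate and product after time t for B -> products at rate constant k.

    Assumes irreversible first-order kinetics: [B](t) = B0 * exp(-k t).
    """
    substrate = b0 * math.exp(-k * t)
    product = b0 - substrate
    return substrate, product

KT = 1.0                # energy unit (assumption)
PREFACTOR = 1.0         # hypothetical Arrhenius prefactor, per unit time
E_A_UNCATALYZED = 8.0   # hypothetical barrier without enzyme, in kT
E_A_CATALYZED = 3.0     # hypothetical barrier with enzyme, in kT
B0 = 1.0                # initial substrate concentration (arbitrary units)

k_slow = arrhenius_rate(PREFACTOR, E_A_UNCATALYZED, KT)
k_fast = arrhenius_rate(PREFACTOR, E_A_CATALYZED, KT)

times = [0.0, 5.0, 20.0, 80.0]
rows = []
for t in times:
    s_slow, p_slow = first_order_concentrations(B0, k_slow, t)
    s_fast, p_fast = first_order_concentrations(B0, k_fast, t)
    rows.append((t, s_slow, s_fast, p_slow, p_fast))
    print(f"t={t:5.1f}  substrate: {s_slow:.4f} (no enzyme) vs {s_fast:.4f} (enzyme)"
          f"  product: {p_slow:.4f} vs {p_fast:.4f}")
```

At every time after the start, the catalyzed system in this sketch holds less substrate and more product than the uncatalyzed one. That difference over time is the temporal gradient attributed to the enzyme's information.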
Finally, we note that biological information in our study is highly contextual. This is apparent in Figs 1 and 2, as an enzyme-dependent quantitative change in activation energy E_a depends on both the properties of the enzyme and the properties of the substrate. Thus, in Fig 2, addition of an enzyme that is specific to the AB reaction, but not the AC reaction, lowers E_a for the AB reaction relative to that for the AC. As a result, the energy E of system AB will much more often obey E ≥ E_a and, hence, reaction AB will occur much more often than reaction AC. The information in the enzyme can, thus, be viewed as "kinetic" in reaction AB and only "potential" in the absence of the substrate. Restating this quantitatively, the information of an enzyme is defined by the KL divergence between the enzyme and a potential reactant. Further, the level of this information in each biological enzyme is converted to a thermodynamic property by the change in E_a that it evokes. Thus, the information may be either 'potential' or 'kinetic,' depending on context. The kinetic information represents the increased probability of a reaction, and decreased E_a, when substrate to which the enzyme can bind is present, according to principle (Eq 1a and Eq 1b). By contrast, the same enzyme in the presence of substrate with which it cannot react (or in the absence of substrate) carries only potential information. It is interesting that such contextual dependence is lacking in, e.g., the pure Shannon entropy [30] measure H_S = −∫ p(x) ln p(x) dx. The algebraic difference is that the KL information is of p in the presence of context r, whereas the Shannon H_S is of p by itself, in the absence of any context r. In summary, it is the contextual dependence of the KL information that provides its biological significance and gives rise to its function.
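The contrast between the context-free Shannon measure and the context-dependent KL measure can be shown with a short calculation. The sketch below uses discrete sums in place of the integral and three hypothetical densities: one substrate density p, one closely matched enzyme density, and one poorly matched enzyme density. All values and names are illustrative assumptions.

```python
import math

def shannon_entropy(p):
    """H_S = -sum_n p_n ln p_n; depends on p alone."""
    return -sum(p_n * math.log(p_n) for p_n in p if p_n > 0.0)

def kl_divergence(p, r):
    """H_KL(p||r) = sum_n p_n ln(p_n / r_n); assumes r_n > 0 wherever p_n > 0."""
    return sum(p_n * math.log(p_n / r_n) for p_n, r_n in zip(p, r) if p_n > 0.0)

# Hypothetical substrate density and two hypothetical enzyme contexts.
p = [0.40, 0.30, 0.20, 0.10]
r_matched = [0.35, 0.30, 0.20, 0.15]      # enzyme shaped closely like the substrate
r_mismatched = [0.10, 0.20, 0.30, 0.40]   # enzyme with which the substrate fits poorly

h_s = shannon_entropy(p)                  # the same number in either context
kl_matched = kl_divergence(p, r_matched)
kl_mismatched = kl_divergence(p, r_mismatched)

print("Shannon entropy of p        :", h_s)
print("KL with matched enzyme      :", kl_matched)
print("KL with mismatched enzyme   :", kl_mismatched)
```

The Shannon value is fixed once p is given, whereas the KL value changes with the enzyme density it is compared against. The better matched context gives the smaller divergence.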
const.) Eq (8) to the Arrhenius equation. Then the principle Eq (1b) becomes Eq (A1), where P(E) is given, by Eq (10), as Eq (A2). Expanding the ln in principle Eq (A1) gives directly Eq (A3). Using expression Eq (A2) for P(E) in Eq (A3), and the normalization of P(E), gives Eq (A4). Why is E_a the lower integration limit? Since our aim centers on the value of rate k, we only integrate over those values of E that can contribute to k, and by Eq (12) this is the value E_a.
Dividing through Eq (A4) by κT and doing the integrations gives a condition, Eq (A5), on a quantity y proportional to H_KL. To attain the required minimum in H_KL through choice of E_a requires setting ∂y/∂E_a = 0. Differentiating Eq (A5) in this way gives a requirement that is accomplished by either E_a = 0 or E_a = ∞. From the result Eq (12) for the reaction rate k it is obvious that these activation energy values respectively maximize, or minimize, the rate k. Of course the case E_a = 0 is preferred on the basis of maximum reaction rate. However, our aim here is to show that this activation energy also follows from our overall principle (Eq 1a and Eq 1b) that H_KL = min. Since Eq (A5) gives H_KL (proportional to y), we can use it to judge if the usual requirement for attaining a minimum is satisfied, namely that the second derivative ∂²y/∂E_a² > 0. Taking this second derivative gives the anticipated result. Hence the case E_a = 0 both maximizes the reaction rate k and minimizes H_KL, as required. By Eq (A5), zero activation energy gives the minimum value of H_KL. Of course attaining activation energy E_a = 0 is not a usual case, but the analysis shows that the closer the system gets to achieving it, the higher the reaction rate is and the smaller the KL distance is between enzyme and substrate, i.e. the better the key fits into the lock.