Chromatin Computation

In living cells, DNA is packaged along with protein and RNA into chromatin. Chemical modifications to nucleotides and histone proteins are added, removed and recognized by multi-functional molecular complexes. Here I define a new computational model, in which chromatin modifications are information units that can be written onto a one-dimensional string of nucleosomes, analogous to the symbols written onto cells of a Turing machine tape, and chromatin-modifying complexes are modeled as read-write rules that operate on a finite set of adjacent nucleosomes. I illustrate the use of this “chromatin computer” to solve an instance of the Hamiltonian path problem. I prove that chromatin computers are computationally universal – and therefore more powerful than the logic circuits often used to model transcription factor control of gene expression. Features of biological chromatin provide a rich instruction set for efficient computation of nontrivial algorithms in biological time scales. Modeling chromatin as a computer shifts how we think about chromatin function, suggests new approaches to medical intervention, and lays the groundwork for the engineering of a new class of biological computing machines.

removed (overwritten with the "blank" character B), and the P in the third position should be changed to an S. In the third nucleosome, the blanks in positions 1 and 2 should remain, and the third position should be changed to a T. The left hand side of this rule can match different 3-mers (sets of 3 adjacent nucleosomes) because of the two wild card symbols. The right hand side of this rule could also be written QR- BBS BBT.

Note on nondeterminism
A chromatin computer is in general nondeterministic because a rule may match at multiple locations, and because more than one rule could match at a given location. A deterministic chromatin computer has at most one rule that matches anywhere along the chromatin at each time step. The proof of Turing completeness constructs a reversible mapping from a deterministic Turing machine to a chromatin computer, which is therefore also deterministic.
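
The two sources of nondeterminism can be made concrete with a small Python sketch (this is not the Perl simulator used for the results in this document). It assumes that `*` on the read side of a rule matches any mark and that a read side is a list of per-nucleosome patterns. The function names, the example tape and the example rules are hypothetical.

```python
def matches_at(read, tape, i):
    """True if the read side matches the nucleosomes starting at index i."""
    if i + len(read) > len(tape):
        return False
    return all(
        p == "*" or p == m
        for pattern, nucleosome in zip(read, tape[i:i + len(read)])
        for p, m in zip(pattern, nucleosome)
    )

def all_matches(rules, tape):
    """Every (rule index, position) pair at which some rule could fire."""
    return [
        (r, i)
        for r, read in enumerate(rules)
        for i in range(len(tape))
        if matches_at(read, tape, i)
    ]

def is_deterministic_step(rules, tape):
    """A step is deterministic when at most one (rule, position) pair applies."""
    return len(all_matches(rules, tape)) <= 1

# Hypothetical tape of four 3-position nucleosomes and two read patterns.
tape = ["QRB", "BBP", "BBB", "BBP"]
rules = [["**B", "BBP"], ["BBP"]]
print(all_matches(rules, tape))            # several choices: nondeterministic
print(is_deterministic_step(rules, tape))
```

A simulator for the general (nondeterministic) case would pick one of the listed pairs at random at each step; for a deterministic chromatin computer the list never has more than one entry.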

Examples of biological read/write complexes
To show that real biological chromatin-modifying complexes are capable of carrying out interesting computations, we can ask whether many of them have multiple read/write functions. While this is a field of active research, there are indeed many known examples. In the Table below, I extract 39 examples from PINdb, a database of nuclear protein complexes [1]. For each, I list the PINdb name for the complex, along with the protein components that are erasers (which can be thought of as writers of a "blank" mark), writers or readers. Many of the proteins have multiple read/write domains, which would increase the valency of the complexes in which they operate. This table also illustrates that a given protein may participate in several different complexes. Given the combinatorics of protein inclusion in these complexes, it is likely that, although I have listed 39 complexes here, there are hundreds of complexes with at least two read or write components, and possibly far more. Not listed here are the scaffolding or connector proteins that hook the effector proteins together; there are many of these as well.

A lower bound on the size of the human chromatin computer
How much chromatin memory is there in a human cell?
There are approximately 3,000,000,000 base pairs in the human genome [3]. There are 147 nucleotides wrapped around one nucleosome [4], and the linker region is around 10 additional nucleotides [5]. Davey et al. give the average length between nucleosomes as 157-240 base pairs [6]. With some of the genome presumably nucleosome-free, I'll take 300 base pairs as a reasonable upper estimate for the length of DNA covered by a nucleosome, on average. An upper estimate of this length gives a lower estimate of the number of nucleosomes.
In Figure 2 of their review article, Zamudio and colleagues list 32 histone modification sites across H2A, H2B, H3 and H4 [7]. Each nucleosome contains two copies of each histone; therefore, there are 64 modification sites. I will assume that each position can have only one modification, although we know that some positions can be marked with one of several marks (such as one methyl group, two methyl groups, three methyl groups, acetylation, etc.).
Putting these numbers together (about 10 million nucleosomes, each with 64 sites that are either modified or unmodified), I arrive at 80 megabytes as a lower estimate of the amount of writable memory in human chromatin.
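
The arithmetic behind this estimate can be checked with a few lines of Python. The sketch assumes one bit per modification site (modified or unmodified, as stated above) and takes a megabyte to be 10^6 bytes. The variable names are illustrative only.

```python
GENOME_BP = 3_000_000_000        # base pairs in the human genome
BP_PER_NUCLEOSOME = 300          # upper estimate of DNA per nucleosome
SITES_PER_NUCLEOSOME = 2 * 32    # two copies of each histone, 32 sites each

nucleosomes = GENOME_BP // BP_PER_NUCLEOSOME   # lower estimate of nucleosome count
bits = nucleosomes * SITES_PER_NUCLEOSOME      # assumption: one bit per site
megabytes = bits / 8 / 1_000_000               # assumption: 1 MB = 10^6 bytes

print(nucleosomes, bits, megabytes)
```

Allowing several alternative marks per site, as real histones do, would only raise this figure.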

Chromatin computer solution to Hamiltonian Path Problem
Just as Adleman generated many pieces of DNA, each representing a path through the graph, we start with many pieces of chromatin, each of which will represent a path through the graph. (We could also put all these lengths on a single stretch of chromatin, and separate them with insulator nucleosomes with marks that do not ever change.) Each chromatin tape has seven nucleosomes, each with six positions.

Input chromatin tape
The initial configuration of each piece of chromatin is as follows:

BBBBBB BBBBBB BBBBBB BBBBBB BBBBBB BBBBBB
Each nucleosome represents one vertex in the path. The first position at each nucleosome indicates which vertex is at that point in the path; it is either blank or a digit from 0 to 6. The remaining 5 positions will be used to check whether vertices 1 through 5 are visited exactly once, and take one of the values {B, 0, 1, F}. A "0" indicates that the corresponding vertex has not been seen in the path so far; a "1" indicates that it has been seen once; and an "F" indicates that it has been seen more than once. The 7th nucleosome of a valid path should have the configuration 611111, because the path should end at vertex 6 and each of the vertices 1 through 5 should have been seen exactly once.

Rules
Fourteen rules implement edge traversal from one vertex to another. For example, the following rule allows traversal of the edge from vertex 2 to vertex 3:
2***** B*****
------ 3-----
Applying these rules to the 7-nucleosome initial configuration will result in a random path of up to length 7 through the graph. Some paths will be shorter than 7, and these we will not consider further as they will not lead to a solution. These path-constructing rules are analogous to the ligation in Adleman's solution.
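
A minimal Python sketch of how such edge-traversal rules could be generated and applied follows (again, not the Perl simulator). It assumes that `*` on the read side matches any mark, that `-` on the write side leaves a mark unchanged, and that the start vertex 0 is already written on the first nucleosome. Only the six edges along the path 0,1,2,3,4,5,6 are used, because the remaining eight edges of the graph are not listed in this section. All function names are hypothetical.

```python
import random

WIDTH = 6  # positions per nucleosome

def edge_rule(u, v):
    """Read/write pair for the edge u -> v, in the notation of the example rule."""
    read = [str(u) + "*" * (WIDTH - 1), "B" + "*" * (WIDTH - 1)]
    write = ["-" * WIDTH, str(v) + "-" * (WIDTH - 1)]
    return read, write

def matches_at(read, tape, i):
    """True if the read side matches the nucleosomes starting at index i."""
    if i + len(read) > len(tape):
        return False
    return all(p == "*" or p == m
               for pat, nuc in zip(read, tape[i:i + len(read)])
               for p, m in zip(pat, nuc))

def apply_at(write, tape, i):
    """Return a new tape with the write side applied; '-' keeps the old mark."""
    new_tape = list(tape)
    for k, pat in enumerate(write):
        old = tape[i + k]
        new_tape[i + k] = "".join(o if w == "-" else w for w, o in zip(pat, old))
    return new_tape

def random_path(edges, tape, rng):
    """Apply randomly chosen matching rules until no rule applies."""
    rules = [edge_rule(u, v) for u, v in edges]
    while True:
        options = [(r, i) for r, (read, _) in enumerate(rules)
                   for i in range(len(tape)) if matches_at(read, tape, i)]
        if not options:
            return tape
        r, i = rng.choice(options)
        tape = apply_at(rules[r][1], tape, i)

# Assumed start tape: vertex 0 on the first of seven nucleosomes.
start = ["0BBBBB"] + ["BBBBBB"] * 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]  # subset of the 14 edges
final = random_path(edges, start, random.Random(0))
print("".join(n[0] for n in final))
```

With the full set of fourteen edges, several rules would usually match at each step, and the random choice among them is what produces a random path.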
Adleman solved the problem of checking that every node had been visited by doing affinity purification to check for the presence of each vertex. We can perform this check directly in the CC program. The following rules check that we have one and only one instance of vertex 1 in the path, and propagate the necessary information from left to right along the nucleosomes representing the path. The "F" (for "Fail") in one of the rules below indicates that vertex 1 has been visited more than once.
A nucleosome with the marks 611111 indicates the existence of a correct path. The following rule indicates success:
To read out the path after computation, read the nucleosome tape, looking at the first position of each nucleosome. As described in the paper, we can augment the definition of the chromatin computer to allow output signaling (like gene expression) to indicate that the computation can halt, either because the current path is a valid Hamiltonian path or because it has been found to be invalid. In the simulator, we assume that any rule with an "S" or an "F" in the right hand side brings computation to a halt. The simulator also halts, of course, if there is no applicable rule that can operate anywhere on the chromatin.
If a chromatin tape has an invalid path that visits a vertex more than once, then it will contain a nucleosome with the F symbol. Computation halts if no more rules apply (this happens when vertex 6 is reached because there are no further edges that can be traversed), or if a halt rule is encountered.
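
The bookkeeping that the checking rules perform can be sketched as a plain Python function. This is not a transcription of the rules themselves. It only reproduces the marks described above: position k+1 of each nucleosome holds "0", "1" or "F" according to how often vertex k (for k from 1 to 5) has been seen up to that point in the path. The function name is hypothetical.

```python
def check_marks(path, n_checked=5):
    """Return one 6-mark nucleosome string per step of the path.

    The first mark is the vertex at that step; the remaining marks record,
    for vertices 1..n_checked, whether each has been seen 0 times ("0"),
    once ("1") or more than once ("F") so far.
    """
    tape = []
    status = ["0"] * n_checked
    for vertex in path:
        if 1 <= vertex <= n_checked:
            # First sighting turns "0" into "1"; any later sighting gives "F".
            status[vertex - 1] = "1" if status[vertex - 1] == "0" else "F"
        tape.append(str(vertex) + "".join(status))
    return tape

print(check_marks([0, 1, 2, 3, 4, 5, 6]))  # valid path
print(check_marks([0, 1, 2, 1]))           # vertex 1 repeated
```

In the chromatin computer the same information is propagated from left to right by local rules acting on adjacent nucleosomes, rather than by a loop that holds global state.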
Note that an alternate way to solve this problem would be to add rules that erase the tape back to the first nucleosome if an invalid path is found, to allow reuse of the same chromatin tape. However, the solution presented here is simpler (in terms of the number of rules), and would take advantage of parallelism if it were available.

Simulation: finding a Hamiltonian path
Here I give an example of a successful sequence of rule applications that finds a Hamiltonian path. This is output from the Perl script in "verbose" mode. (The actual run of the simulator used an input tape that had the insulator nucleosome IIIIII on the right hand side of the nucleosomes displayed below, which I've removed for readability.) The first line shows the initial configuration of the chromatin tape. The second line shows the "read" specification or left hand side of the first rule to be applied, and the third line shows the "write" specification or right hand side. The rule looks for a 0 in the first position of a nucleosome, and a B in the first position of the next nucleosome. It then writes a 1 at the first position of the second nucleosome. This implements the traversal of the edge from vertex 0 to vertex 1. The next rule writes the number of times vertex 1 has been visited, when we are at the second vertex in the path. (The answer is 1.) The rule after that records the fact that vertex 5 has been visited 0 times at that point in the path. We then continue on, with a matching rule randomly selected, building the path and checking for repeated vertices. The final rule applied writes the "S" symbols that trigger a halt in the simulation.

Simulation: flagging a path with repeated vertices
In this simulation, the sequence of applied rules leads to a failure state.

Repeated simulation until success
We run the simulator many times, until we achieve success. Each row below shows the chromatin tape at the time that the computer halted, either because it reached a success state, because it reached a fail state (repeated vertices on the path), or because no more rules matched (the path got to vertex 6 in the graph). The last chromatin configuration is the one that achieves success, showing that the correct order for visiting the nodes is 0,1,2,3,4,5,6.
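
The repeat-until-success loop can be sketched at the level of paths rather than chromatin marks. In the Python sketch below, the six edges along 0,1,2,3,4,5,6 come from the solution reported above, while the extra edges are invented for illustration and are not claimed to be the graph used in the paper. The sketch also flags a repeated vertex as soon as it is chosen, which simplifies the rule-by-rule checking done on the tape. All names are hypothetical.

```python
import random

# Edges along the reported solution, plus invented extras (assumption).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (0, 6), (1, 3), (2, 1), (3, 2), (4, 1), (5, 2)]

def one_run(edges, rng, start=0, end=6, length=7):
    """Build one random path and report why the run halted."""
    path = [start]
    while len(path) < length:
        choices = [v for u, v in edges if u == path[-1]]
        if not choices:
            return path, "no rule"      # e.g. reached vertex 6 too early
        nxt = rng.choice(choices)
        path.append(nxt)
        if path.count(nxt) > 1:
            return path, "fail"         # repeated vertex, the "F" state
    return path, ("success" if path[-1] == end else "no rule")

def run_until_success(edges, seed=0, max_runs=100_000):
    """Repeat independent runs until one succeeds; return (path, run count)."""
    rng = random.Random(seed)
    for n in range(1, max_runs + 1):
        path, outcome = one_run(edges, rng)
        if outcome == "success":
            return path, n
    return None, max_runs

path, runs = run_until_success(EDGES)
print(path, "found after", runs, "runs")
```

Because each run is independent, the runs could equally be carried out in parallel on many separate pieces of chromatin, which is the setting described at the start of this solution.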

Additional rules to find the Hamiltonian path in a single run
It is possible to solve the Hamiltonian path problem with a single stretch of 8 nucleosomes by adding rules to reset the chromatin state if we explore a path that repeats a vertex or gets to the finish too early. The starting state for this rule set has an insulator sequence at the right edge.