An associative memory Hamiltonian model for DNA and nucleosomes

A model for DNA and nucleosomes is introduced with the goal of studying chromosomes from a single base level all the way to higher-order chromatin structures. This model, dubbed the Widely Editable Chromatin Model (WEChroM), reproduces the complex mechanics of the double helix including its bending persistence length and twisting persistence length, and the temperature dependence of the former. The WEChroM Hamiltonian is composed of chain connectivity, steric interactions, and associative memory terms representing all remaining interactions leading to the structure, dynamics, and mechanical characteristics of the B-DNA. Several applications of this model are discussed to demonstrate its applicability. WEChroM is used to investigate the behavior of circular DNA in the presence of positive and negative supercoiling. We show that it recapitulates the formation of plectonemes and of structural defects that relax mechanical stress. The model spontaneously manifests an asymmetric behavior with respect to positive or negative supercoiling, similar to what was previously observed in experiments. Additionally, we show that the associative memory Hamiltonian is also capable of reproducing the free energy of partial DNA unwrapping from nucleosomes. WEChroM is designed to emulate the continuously variable mechanical properties of the 10nm fiber and, by virtue of its simplicity, is ready to be scaled up to molecular systems large enough to investigate the structural ensembles of genes. WEChroM is implemented in the OpenMM simulation toolkits and is freely available for public use.


Introduction
Deoxyribonucleic acid (DNA) has been studied for decades since the fundamental discovery made by Watson, Crick, Wilkins, and Franklin [1][2][3] The repeating units of the DNA molecule, nucleotides, are composed of phosphate, sugar, and a base group; a sequence of covalently bonded nucleotides form single-stranded DNA. The double-helix structure of DNA, formed by two strands of DNA, is maintained by base pairing, planar base stack interactions, and electrostatic interactions [4]. In eukaryotes, DNA is then wrapped around histones to form nucleosomes, which serve as the basic structural unit of DNA packaging. Each nucleosome contains about 147 base pairs of DNA wrapped 1.7 times around a histone octamer. Recent evidence shows that the structure of nucleosomes is very dynamic and irregular [5]. Outer stretches of nucleosomal DNA are observed to spontaneously unwrap and rewrap to the core octamer [6], which can result in DNA strands entering and exiting the nucleosome at flexible angles, and this ultimately has a significant impact on the three-dimensional packing of chromatin. However, we still lack a sufficient understanding of DNA and nucleosome mechanics to ascertain how these features combine with other factors to determine the structural and mechanical properties of higher-order structures.
Numerous theoretical and computational approaches have already been explored to model DNA and chromatin fibers, greatly improving our understanding but still leaving many questions open. This work aims at integrating previous efforts by proposing a novel theoretical approach that focuses on the mechanics of DNA and nucleosomes, yet simple enough to scale up to larger scales.
Detailed, all-atom (AA) simulations have been extensively and successfully used to gain insights into DNA dynamics [7], flexibility [8], and other properties at a fine molecular scale. Fully atomistic models, however, cannot simulate DNA molecules at the level of nucleosome organization or chromatin fiber due to computational limitations, at least in the foreseeable future. Additionally, for very large molecular assemblies, such as even a short segment of chromatinized DNA, the accuracy of AA force fields remains an issue. This is particularly problematic in systems like DNA and histones in which ions play a determinant role. Coarse-grained (CG) models  for DNA have also proven very successful.
CG approaches reach the time and length scales that are inaccessible to AA simulations. In bottom-up CG models [9][10][11][12][13][14][15], effective interactions are determined to eliminate some degrees of freedom from reference atomistic simulations. As a result, CG force fields' ability to reproduce the system's behavior depends on the accuracy of the underlying AA force field, from which it inherits limitations in predictive capabilities, although reducing the computational cost. Top-down CG empirical energy functions [16][17][18][19][20][21][22][23][24][25][26][27][28][29] are instead tuned to match the specific properties of the system as observed through experiments, such as the polymer's flexibility or melting temperature, thus avoiding the pitfalls of atomistic simulations altogether at the expense of universality CG DNA models have been developed using use a variable number of sites to represent each nucleotide: from two [16][17][18][19], three [20][21][22][23][24], six [25,26], up to eight [27]. The oxDNA model [16,17] represents each nucleotide as a rigid body with 2 interacting sites and effectively capture basic structural properties of DNA such as pitch together with mechanical properties as the persistence length. The oxDNA has been successfully used to characterize DNA origami nanostructures [30]. The model by Drukker et al. [18,19] has similar resolution but focuses on the melting behavior and denaturation in double-stranded sequences. The 3-site-per-nucleotide (3SPN.0 [20]/1 [21]/2 [22]/2C [23]) model was developed by de Pablo's group to reproduce the persistence length for both single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA), melting temperatures, and hybridization rate constants. The BioModi Model [24] also chose a 3-site-per-nucleotide representation focusing on modeling the thermodynamics and kinetics of DNA self-assembly. The SIRAH model [25] uses 6 beads per base with explicit solvents and allow modeling interaction of DNA with lipids and proteins. The HiRE-DNA model [27] was extended from its RNA counterpart focusing on DNA self-assembly.
Coarser models are necessary to perform simulations of the chromatin architecture on the size of tens of thousands, even millions of bp [31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50]. A mesoscale model developed by Schlick and coworkers [31,32] employed a rigid nucleosome core particle, flexible histone tails, and 9-bp-per-particle worm-like-chain linker DNA. The model has been used to investigate the relationship between chromatin organization and structural factors, such as nucleosome repeat length and linker histones [33][34][35][36][37]. The 1CPN model proposed by Lequieu et al. [38] uses a rigid nucleosome particle with a twistable wormlike chain representing linker DNA at the resolution of 3 bp per particle. Other coarse-grained models with nucleosome and subnucleosome resolution have also been developed in the past few years to connect the molecular and physicochemical data of nucleosomes with the mesoscale characteristics of chromatin, including the "beads-on-a-string" models from Wedemann's group [39][40][41][42][43] and Bajpai et al. [44], the multiscale model from Farr et al. [45], the rigid base-pair model from the van Noort group [46,47], the mesoscopic model from Zhurkin and Norouzi [48,49], and the multi-modal model from Orozco group [50,51]. Most of these mesoscopic models use one interaction site to represent a variable number of nucleotides, relinquishing from explicitly describing of the double helix, which is instead a fundamental determinant of the mechanical properties of DNA, and in particular its supercoiling behavior. As a result, a gap persists in understanding how the mechanical properties of the molecular components of chromatin, DNA, nucleosomes, and DNA-binding proteins affect higher-order chromatin structures.
Traditional CG models involve two types of interactions: bonded potentials among neighbors and non-bonded potentials among any pair of particles. The bonded interactions typically entail the 2-body bonding term, 3-body angle or bending term, 4-body aligning/twisting/dihedral terms, and 4-body to many-body terms to capture base stacking effects. Non-bonded interactions include excluded-volume potentials and electrostatic potentials where the solvent is typically treated implicitly. While following a top-down philosophy, traditional CG models aim to reproduce the physicochemical nature of DNA using interaction potentials motivated by basic chemistry; however, such ambitious effort is often frustrated by a difficult parametrization process and by computationally expensive simulations, due to multi-body interactions and non-bonded electrostatic potentials.
Such detail may not be necessary if we limit our objective to recapitulating just the mechanical properties of DNA. We use an Associative Memory Hamiltonian [52][53][54] (AMH) to explore the energy landscape of chromatinized DNA, starting from known structures of DNA fragments and protein-DNA complexes. The AMH framework has been successfully employed in protein folding where the energy landscape of a given protein is reconstructed using crystallographic information of short fragments characterized by sequences overlapping that of the protein under investigation [52][53][54]. We apply this data-driven approach to build a model for DNA and nucleosomes using the 3D structures of B-DNA and nucleosomes. The AMH implicitly recapitulates all the energy terms already discussed: bending, twisting, base-stacking, and electrostatic energy. Parameters are tuned to reproduce the mechanical properties of DNA, including the twisting and the bending persistence lengths, and their coupling with supercoiling behaviors.
This model, dubbed the Widely Editable Chromatin Model (WEChroM), represents each nucleotide as a single particle and, besides polymer connectivity, uses AMH energy terms for all bonding interactions. By virtue of its simplicity, the WEChroM is computationally efficient and well-suited to simulate large genetic molecular systems up to kilobases of chromatin at base-pair resolution, reaching over to the size of mammalian genes.
We show that the WEChroM accurately reproduces the properties of naked DNA, such as bending and twisting persistence lengths, as well as supercoiling behaviors. Because of the finite energy barrier characteristic of the AMH potentials, the model can also account for the natural emergence of defects in the structure of DNA. Lastly, we introduce an associative memory template for the histone octamer and show that WEChroM recapitulates the free energy profile of nucleosome unwrapping.

An associative memory Hamiltonian for DNA
The AMH was introduced by Friedrichs and Wolynes [56] to study the problem of protein folding. Using crystallographic information, the AMH reconstructs the energy landscape of a protein and encodes the native structure of the target sequence as the global attractor of such a landscape. The energy landscape of the protein is approximated by selecting a subset of the pairs of atoms in the reference protein structure and constraining their distances using a Gaussian well potential. The resulting model of the molecule is roughly harmonic in the folded state but can accommodate configurations with partial or complete unfolding of the protein at a finite energy cost. More than one molecular configuration can be encoded in the Hamiltonian; each one of these configurations is called a "memory", in analogy with the terminology used in the field of recurrent neural networks [52,56,57].
We adapt the AMH framework to reconstruct DNA interactions and DNA-protein interactions. We start from idealized computational structures for B-DNA and from crystal structures of DNA-protein complexes as memories to construct the energy landscape of DNA and nucleosomes. In the WEChroM, each nucleotide is represented by one particle, as shown in Fig 1. The model is sequence-independent, but this can be easily generalized. The resolution of one nucleotide per particle allows for describing both the bending and twisting properties of DNA without additional rotational coordinates, and it allows the model to closely reproduce the complex mechanical behavior of the double helix, including buckling and the formation of plectonemes.
For naked DNA, the WEChroM potential energy consists of 3 major components: chain connectivity, steric interactions, and associative memory interactions, in this case labeled the double helix term (U DH ). The chain connectivity is enforced by harmonic interactions between two neighboring particles on the same strand, mimicking the covalent backbone bonds in a single-stranded DNA. Steric interactions are modeled using a short-range repulsive potential. A complete description of the energy terms modeling chain connectivity and steric interactions is provided in the Appendix B in S1 Document.
Pairwise distances (r m ij ) among nucleotides are captured from computational B-DNA structures and define the associative memory energy term (U DH ) where r ij is the distance between nucleotides i and j, and where the outer sum runs over memories and the inner sum over all pairs of nucleotides within the memory. The energy λ determines the strength of interactions and it is chosen to be a constant value, regardless of the nucleotide identity. The variance σ ij controls the scale of structural fluctuations. To allow for more flexibility when the interaction distance is larger, σ ij increases proportionally to the square root of the genomic distance between i, j, i.e., s i;iþn ¼ ffi ffi ffi n p s 0 (Fig A in S1 Document). In our model, each memory comprises 4 consecutive bp ( Fig 1C). Our analysis indicates that 4bp is the minimal memory size needed for reproducing the mechanics of DNA without creating artifacts. The resulted in interaction patterns among the nucleotides resemble those in Savelyev and Papoian [10,11]. While we were able to reproduce the bending and twisting persistence length of DNA using 3bp memories, this required unreasonably large interaction energies between nucleotide pairs, and we thus rejected the model. We introduce energy terms by sliding the 4-bp memory box along DNA in 1-bp shifts. This results in multiple memory interactions being added between pairs of nucleotides depending on their index difference (Fig A in S1 Document), with pairs of nucleotides closer in genomic distance interacting more strongly.
To create a sequence-independent structural template, we choose 10 random sequences and generate their structures using X3DNA [58]; we then calculate each nucleotide's center of mass and use it to define the distances between nucleotides r m ij . U DH enshrines the doublehelical structure of DNA as the global energy minimum of the WEChroM Hamiltonian while allowing for structural distortions as well as local defects in the DNA polymer, such as kinks and bubbles.
The current model does not include any energy term enforcing chirality. The correct chirality is set by the initial condition and preserved throughout the simulation. While in principle energy fluctuations could change the global chirality of the system, in practice we did not observe any of these events during the extensive testing of the model. We concluded that a term enforcing chirality was not necessary, at least for the systems under investigation. It is possible, however, that future studies for particular larger molecular systems and different conditions might require such an energy term.
Similar to modeling double-stranded DNA, we also use the associative memory Hamiltonian framework to describe how proteins influence the structure of chromatin. We model the effect of the histone octamer on DNA by using the crystal structure of a nucleosome core particle (PDB ID: 1kx5 [59]) as the associative memory template. In this case, associative memory terms are introduced among nucleotides and between them and the histone octamer's center of mass. Despite the simplified representation of protein-DNA interactions, our model recapitulates the free energy changes involved in the partial unwrapping of DNA from nucleosomes.
In summary, the AMH is a flexible theoretical framework that allows for introducing structural memories from a vast repertoire of DNA-binding proteins and histone variants and integrating the effect of these molecules at larger scales. This namesake characteristic of the Widely Editable Chromatin Model makes it particularly suitable for studying genetic systems above the nucleosomal scale and below the chromosomal scale, bridging the gap between the atomistic and bottom-up models typically utilized up to the nucleosomal scale, and the polymer physics approaches typical of the chromosomal scale [60]. The WEChroM is thus designed to provide the means for studying the structural ensembles of genes, which are too large for conventional molecular modeling and yet strongly affected by heterogeneous molecular factors to be effectively modeled by simple polymer models.

Bending and twisting persistence lengths
The molecular rigidity of naked DNA is quantified by two persistence lengths: the bending persistence length (L bp ) and the twisting persistence length (L tp ). Experimentally, the bending persistence length L bp of DNA has been found to be approximately 150 bp (around 50 nm) [63], while the twisting persistence length L tp is believed to be between 75 and 360 bp (or around 25-120 nm) [64][65][66]. In our model, the bending persistence length L bp is defined by the relaxation of the angle correlation as a function of the genomic distance L: where u(i) is tangent to the helical axis at base-pair i. Due to DNA's double helix structure, the angle formed by any two base pairs is oscillating, leading to the following relaxation function for the twisting angle: where v(i) is the vector pointing from one strand to the other strand within base pair (i). The twisting persistence length L tp is defined from the exponentially decaying envelope exp À L L tp � � ; while a, b are the period and phase of the double helix.
By tuning the parameters λ and σ 0 in the double helix term in the AMH, we obtain a range of values for L bp and L tp as shown in Fig 2A and 2B, respectively. The bending persistence length L bp , observed in simulations varies between 100 to 500 bp and monotonically increases with increasing λ or decreasing σ 0 . This is expected as larger λ 0 s lead to stronger associative memory interactions while smaller σ 0 0 s lead to reduced fluctuations of the distances between nucleotides, both resulting in a stiffer molecule. The twisting persistence length L tp , observed in simulations varies between 89 and 415bp, in good agreement with the experimentally determined range between 75 and 360bp ( Fig 2B and Fig 2C in S1 Document). The persistence lengths obtained simulating systems of different sizes (Fig C in S1 Document) are stable, indicating the convergence in the simulations.
Experimental studies have determined that the DNA persistence length depends on temperature, with L bp ranging from 140~165 bp at 277K to approximately 110~120bp at 330K [61,62]. We chose parameter sets characterized by a L bp~1 50bp at 300K, i.e., consistent with experiments at room temperature. For these parameter sets, we calculate the persistence length L bp as a function of the temperature, ranging from 270K to 340K. Simulations show that the persistence length L bp decreases from 150~180 bp at 275K to 120~140bp at 338K, which is in excellent agreement with the experimentally determined temperature dependence (Fig 3C).
Multiple sets of parameters are compatible with the reported bending and twisting persistence lengths from experiments (highlighted in Fig 2A and 2B and listed in Table B in S1 Document). These different sets of parameters, although having similar persistence lengths, show different features in other aspects. For example, the parameter set with (λ, σ) = (0.15, 0.05) has a similar L bp and L tp as other data points as shown in Fig 2A, yet the double helix with this parameter set is more prone to forming defects at room temperature. Nonetheless, other parameter sets can produce L bp and L tp that are consistent with the experiments without obvious defects. Such degeneration is caused by the scarcity of experimental constraints and can be removed by introducing additional information on specific genetic systems.
We performed a sensitivity analysis of both bending and twisting persistence length with respect to the energy terms in our model (Fig B in S1 Document). Such an analysis revealed that the stiffness of the double helix is much more sensitive to the strength of the inter-strand interactions than to changes in the intra-strand interactions, with minor increases in the strength of inter-strand bonds leading to a significantly stiffer DNA polymer. This is consistent with the fact that the bending persistence length of a single-stranded DNA (ssDNA) is only approximately 6 bp [67], much smaller than that of double-stranded DNA (dsDNA).

Supercoiling behavior of DNA minicircles
Genomic DNA is often subjected to torsional stress, which can both over-and under-wind the DNA double helix [68], resulting in twisting and coiling of the helix. Supercoiling can be induced by enzymes in order to reduce the volume of chromatin. Therefore, it will also be inevitably introduced when part of the DNA is uncoiled during essential cell processes, like transcription or replication [69,70]. Elucidating the structural properties of DNA molecules in different supercoiling states is therefore essential to improve our understanding of genome functions.
In order to quantitatively characterize supercoiling, we utilize the linking number (Lk), i.e., the total number of times the two single DNA strands coil about one another. The linking number is related to two geometrical properties of the molecule, the twist (Tw, the coiling of the two strands about the helical axis) and the writhe (Wr, the coiling of the helix axis's path around itself). These three properties are related by the simple relation More details on how to calculate these quantities are found in Appendix D in S1 Document. When the ends of a DNA molecule are covalently ligated to form a circle, the two strands become intertwined and will remain in such a state unless one of the strands is broken, i.e., the linking number is conserved. We performed simulations of supercoiled 350 bp DNA minicircles using the WEChroM model. We prepared 9 topoisomers with both negative and positive supercoiling, ranging from ΔLk = −4 to ΔLk = 4; the topoisomer ΔLk = 0 corresponding to the relaxed DNA minicircle.
Similarly to what is seen in experiments [71], a wide variety of structures is observed in simulations (Fig 3A). Positively or moderately negatively supercoiled minicircles, ΔLk = −1 to ΔLk = 4, tend to form plectonemes. These topoisomers transpose twist into writhe, relaxing twisting torsional strain into the bending strain due to writhe (Fig 3B). Surprisingly, strongly negatively supercoiled DNA minicircles do not show a tendency to form plectonemes. On the contrary, negative torsional stress appears to be more easily relaxed through the formation of defects in the double-helical structure, i.e., by local DNA melting (Fig 3C). Such asymmetric behavior of DNA with respect to supercoiling was previously observed experimentally [72,73], with AFM and Cyro-EM showing that negatively supercoiled minicircles often present melted regions at room temperature while positive ones do not [71,72,74].
As shown, the energy transfer between bending and twisting modes is an essential aspect of DNA mechanics. Such energy transfer is difficult to reproduce when using simpler models that lack details about the three-dimensional nature of DNA. Additionally, the formation of defects in the double-helical structure of DNA also appears to be essential to correctly describe its structural ensembles [75]. Taken together, these two facts highlight the necessity of two crucial design elements of the WEChroM: accounting explicitly for the 3D structure of the DNA double helix and using a non-harmonic potential to recapitulate its empirical energy function.

DNA-protein interactions and nucleosomes
In eukaryotes, DNA is wrapped around histones to form nucleosomes with each nucleosome containing approximately 147 base pairs of DNA wrapped 1.7 times around a histone octamer. Besides compacting chromatin, nucleosomes create a significant barrier to protein binding, in turn contributing to the regulation of essential cellular processes like the gene expression [76]. The outer stretches of nucleosomal DNA are observed to spontaneously unwrap and rewrap to the core octamer [6], significantly impacting the three-dimensional packing of chromatin. A faithful description of nucleosome dynamics is necessary to recapitulate the mechanical behavior of chromatin.
We introduce nucleosomes in the WEChroM by representing the histone octamer with a single particle at its center of mass, then apply AMH interactions between the DNA and histone particles to reproduce the effect of protein-DNA interactions. The DNA-protein interactions U Nucleosome can be formulated as follows: DNA bases in contact with the amino acids of the histone octamer are defined as "contact DNA". The first part of the nucleosome associative memory U center acts between the center of histone core and contact DNA (Fig 4A). The distances r m i are learned from a nucleosome crystal structure. The second potential, U neighbor , acts between neighboring contact DNA bases. The energy λ and the variance σ in the two parts were parameterized to match the thermodynamics of the isolated nucleosomes.
We investigate the free energy landscape of DNA unwrapping from the nucleosome core, and its dependence on the parameters in our model. The reaction coordinate is chosen as the distance between the two free ends of the DNA, which mimics the nucleosome extension reported in the optical trap experiment [76]. To explore the phase space of unwrapping more rapidly, we used Umbrella Sampling (US) and the Multiple Bennett Acceptance Ratio (MBAR) [77] to calculate the free energy as a function of end-to-end distance.
First we study a 223-bp system, with 147-bp DNA molecules bound to the histone core and 38-bp linker DNA particles on each side The same system was previously investigated using the 3-Site-per-nucleotide (3SPN) DNA model and the Atomic Interaction-based Coarse Grained (AICG) model of the nucleosome [38,78]. We trained the parameters in U Nucleosome to match the free energy computed with the more detailed 3SPN-AICG model by Lequieu et al in ref. [38].
After training, the free energy profile computed with WEChroM is in good agreement with the 3SPN-AICG (Fig 4B right), despite the simplicity of the AMH energy function. Both models place the free energy minimum at around 80 Å and this free energy monotonically increases for the simulated range, 250 Å in the case of 3SPN-AICG and up to 350 Å in the current study. The extension energy in the full range is at the order of magnitude of a few k B T, matching experimental observations that nucleosomes can spontaneously unwrap and rewrap the outer layer of DNA [6].
With this trained set of parameters for WEChroM, we study the unwrapping of DNA from the nucleosome core particle, i.e., the histones octamer wrapped by 147-bp DNA. As before, the free energy profile is computed using Umbrella Sampling and MBAR. The results are displayed in Fig 4C. Without the flexibility warranted by linker DNA, the free energy displays a two-staged non-monotonical increase as the end-to-end distance stretches to 220 Å, when the outer layer of DNA is completely dissociated from the histone core. For this 147-bp nucleosome, the free energy associated with unwrapping the outer layer is approximately 10 k B T, in consistent agreement with the rather scattered values obtained by experiments, which range from about 8 to 20 k B T [76,79,80].
The AMH potential energy of the two-stage unwrapping process is shown in Fig 4D. Consistently with the free energy, the nucleosome potential energy shows two sharp increases for the end-to-end distance between 80 and 100 Å as well as between 160 and 180 Å. An analysis of the structures associated with these end-to-end distances indicates that for each sharp increase, one end of the DNA is unbound from the histone core; snapshots of these conformations are displayed in Fig 4D. Altogether, our model suggests that the unwrapping of DNA from the nucleosome core particle is not a gradual-one base-pair at the time-process, instead, the molecular complex accumulates mechanical strain until one arm consisting of tens of base pairs suddenly breaks apart. The presence of molecular cracking in the DNA unwrapping process could be rationalized by considering the high stiffness of double-stranded DNA. On the other hand, because experiments typically require linker DNA to apply the external forces unwrapping the DNA, observing molecular cracking in the wet lab can be challenging, as we have shown that the presence of linker DNA masks this behavior. Overall, our results demonstrate that, beyond double-stranded DNA, the WEChroM model can effectively model the mechanics of single nucleosomes, opening the way to the study of the 10nm fiber and moving to complexes containing full genes.

Discussion
The Widely Editable Chromatin Model (WEChroM) proposes an alternative approach to the modeling of DNA and nucleosomes that can scale up to larger genetic systems such as the 10-nm fiber and genes. The objective of this work is to reproduce the mechanical properties of DNA and nucleosomes with a simple and computationally efficient mathematical model, with the compromise of relinquishing any attempt at reproducing the details of the chemical complexity of the system.
As shown, the model reproduces the bending and twisting persistence lengths, and the temperature dependence of the former. WEChroM is also able to reproduce the effects of positive and negative supercoiling, the formation of plectonemes and of structural defects that relax mechanical stress, and the asymmetric behavior with respect to positive or negative supercoiling, which was previously observed in experiments. Finally, we show that the proposed model is also suitable for reproducing the free energy of partial DNA unwrapping from nucleosomes. Indeed, these results demonstrate that the WEChroM model is a viable strategy to simulate the mechanics of 10-nm fiber, when willing to surrender chemical accuracy.
The structural ensembles of genes have been so far out of the reach of theoretical and computational investigations because genes are molecular complexes too large to be tackled with even the most efficient CG models and yet too strongly affected by heterogeneous molecular factors to be effectively modeled as simple polymer models. The WEChroM is an attempt to extend the reach of computational modeling with the alternative approach based on AMH. Preliminary benchmarking (see Appendix C in S1 Document) does indicate that the model is efficient enough to allow the study of genetic systems as big as tens of kilobases of chromatin, i.e., the size of mammalian genes.
Because of the extremely large number of combinations of nucleosome types, epigenetic modifications, linker lengths, and bound factors, modeling the chromatin fiber as a polymer composed of nucleosome monomers appears impractical. In fact, we are not likely to see the same exact monomer twice in a system composed of a few kilobases of DNA. WEChroM is also an attempt to solve this conundrum. The namesake characteristic of the Widely Editable Chromatin Model provides the needed flexibility for introducing a vast repertoire of structural memories of DNA binding proteins and histone variants and integrating the effect of these molecules at a larger scale. This characteristic of the AMH approach, which has not been yet exploited, opens the way to the study of structural ensembles of specific segments of chromatin, utilizing the vast amounts of omics data indicating the positioning and the identity of bound factors along DNA are already available for a variety of species, and human in particular.

Implementation of WEChroM on OpenMM
OpenMM is a toolkit developed for high-performance and extensible molecular dynamics simulations [81]. OpenMM provides a high-level application programming interface (API) for external programs like Python and makes the most efficient use of CPU and GPU hardware capabilities.
WEChroM is implemented in OpenMM with a variety of custom force templates optimized for each term in the Hamiltonian. Compatible hardware platforms include single CPUs, multiple CPUs, and GPUs (with CUDA, OpenCL, and HIP support).

Simulation setup
Atomic-resolution structures for B-DNA (PDB files) were generated using w3DNA 2.0 [58] with its B-DNA fiber model at lengths 52bp, 75bp, 200bp, 350bp, and 1000bp as discussed in the Results section and Fig B in S1 Document. The atomic-resolution structure for the nucleosome (1KX5 [59]) was downloaded from the Protein Data Bank. The atomic-resolution structures were coarse-grained to be used at the WEChroM. Users have the flexibility to provide atomic PDB files other than the above-mentioned ones to coarse grain into a WEChroM molecule. DNA particles and histone protein core particles (in the case of a nucleosome system) in the generated coarse-grained file are automatically recognized by the software. This coarsegrained file are then used as the initial structure of the simulation for straight DNA systems. On the other hand, for minicircle systems, a straight DNA will be twisted to modify the initial twisting number, and then bent into a circle to create the closed open-ring form, which will serve as the initial structure for the simulations.
For naked DNA simulations, WEChroM is ready to run simulations using a potential energy described in the Section "Associative Memory Hamiltonian of DNA" and Appendix B in S1 Document "Connectivity Interactions and Steric Interactions" after the initial preparation. Each term can be turned on and off and vary in strength and setting. A local energy minimization is performed before the formal start of the simulation to prevent unphysical forces produced by the initial positions. The sampling simulation is carried out via Langevin dynamics with a damping coefficient of γ = 1 /τ, where τ is the time unit. A time step of 0.02 τ was used for all the simulations. Only thermodynamical simulations are performed where the dynamics of systems converged in the equilibrium states and the relaxation time was chosen for the best sampling. The temperature is fixed at the unit temperature and assumed to be room temperature. A detailed description of the reduced unit system is provided in the Appendix A in S1 Document. The trajectories are saved in.dcd files. The equilibrium state was attained after 1×10 6 steps for the 75-bp straight DNA systems and after 1×10 8 steps for 350-bp minicircle systems. The substantial increase in steps required for the minicircles to reach equilibrium is to be expected because our initial structures have a high twisting number, and transitioning to the writhing number necessitates a global reconfiguration. Conversely, the straight DNA system starts near a potential energy minimum configuration and only needs to undergo thermal fluctuations to satisfy persistence length requirements. Each of the trajectories was simulated for 1×10 9 steps, storing a frame every 5×10 3 steps. Analyses are performed after the simulations by reading the trajectory files.
For nucleosome simulations, we employ umbrella sampling to sufficiently explore the configurational space. The DNA end-to-end distance serves as the reaction coordinate. The first or last base pair's center of mass is used to designate the ends of the DNA, and the Euclidean distance between them is used to determine the end-to-end distance. Forty-one umbrella windows are used for the 223 bp system (nucleosome core particle with two arms), and 23 are used for the 147bp system (nucleosome core particle system). In each window, the end-to-end distance is restrained around a center distance using a harmonic biasing potential. We utilized Langevin Dynamics with the same hyperparameters as in the naked DNA system, where the damping coefficient of 1 /τ and a time step of 0.02 τ, where τ is the unit time in our reduced unit system. Five replicas were used within each umbrella window, and each replica was simulated for 1×10 7 steps storing a frame every 1×10 3 steps. We then utilize the FastMBAR [77] to calculate the free energies and expectations based on the saved trajectories.
Supporting information S1 Document. WEChroM simulation and software description. Detailed description of the reduced unit system, force field setup, and mechanical property analysis for WEChroM systems.ik (DOCX)