Residue-level coarse-grained (CG) models have become one of the most popular tools in biomolecular simulations in the trade-off between modeling accuracy and computational efficiency. To investigate large-scale biological phenomena in molecular dynamics (MD) simulations with CG models, unified treatments of proteins and nucleic acids, as well as efficient parallel computations, are indispensable. In the GENESIS MD software, we implement several residue-level CG models, covering structure-based and context-based potentials for both well-folded biomolecules and intrinsically disordered regions. An amino acid residue in protein is represented as a single CG particle centered at the Cα atom position, while a nucleotide in RNA or DNA is modeled with three beads. Then, a single CG particle represents around ten heavy atoms in both proteins and nucleic acids. The input data in CG MD simulations are treated as GROMACS-style input files generated from a newly developed toolbox, GENESIS-CG-tool. To optimize the performance in CG MD simulations, we utilize multiple neighbor lists, each of which is attached to a different nonbonded interaction potential in the cell-linked list method. We found that random number generations for Gaussian distributions in the Langevin thermostat are one of the bottlenecks in CG MD simulations. Therefore, we parallelize the computations with message-passing-interface (MPI) to improve the performance on PC clusters or supercomputers. We simulate Herpes simplex virus (HSV) type 2 B-capsid and chromatin models containing more than 1,000 nucleosomes in GENESIS as examples of large-scale biomolecular simulations with residue-level CG models. This framework extends accessible spatial and temporal scales by multi-scale simulations to study biologically relevant phenomena, such as genome-scale chromatin folding or phase-separated membrane-less condensations.
Molecular dynamics (MD) simulations have been widely used to investigate biological phenomena that are difficult to study only with experiments. Since all-atom MD simulations of large biomolecular complexes are computationally expensive, coarse-grained (CG) models based on different approximations and interaction potentials have been developed so far. There are two practical issues in biological MD simulations with CG models. The first issue is the input file generations of highly heterogeneous systems. In contrast to well-established all-atom models, specific features are introduced in each CG model, making it difficult to generate input data for the systems containing different types of biomolecules. The second issue is how to improve the computational performance in CG MD simulations of heterogeneous biological systems. Here, we introduce a user-friendly toolbox to generate input files of residue-level CG models containing folded and disordered proteins, RNAs, and DNAs using a unified format and optimize the performance of CG MD simulations via efficient parallelization in GENESIS software. Our implementation will serve as a framework to develop novel CG models and investigate various biological phenomena in the cell.
Citation: Tan C, Jung J, Kobayashi C, Torre DUL, Takada S, Sugita Y (2022) Implementation of residue-level coarse-grained models in GENESIS for large-scale molecular dynamics simulations. PLoS Comput Biol 18(4): e1009578. https://doi.org/10.1371/journal.pcbi.1009578
Editor: Dina Schneidman-Duhovny, Hebrew University of Jerusalem, ISRAEL
Received: October 20, 2021; Accepted: March 26, 2022; Published: April 5, 2022
Copyright: © 2022 Tan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The source code and user manual of GENESIS v1.7.1 is available at https://www.r-ccs.riken.jp/labs/cbrt/ website, and GENESIS-CG-tool is both included as a part of GENESIS v1.7.1 and deployed at https://github.com/noinil/genesis_cg_tool. Tutorials of performing CG simulations with GENESIS v1.7.1 are open on https://www.r-ccs.riken.jp/labs/cbrt/tutorials2019/ website. All the MD simulation files and data to produce the results are available from https://github.com/RikenSugitaLab/cg-development-GENESIS-1.7.0.
Funding: The research was supported in part by MEXT as “Program for Promoting Researches on the Supercomputer Fugaku” (Biomolecular dynamics in a living cell (JPMXP1020200101) / MD-driven Precision Medicine (JPMXP1020200201)), Grant-in-aid for scientific research (Grant Numbers, 19H05645, 21H05249 (to YS), 21H05282 (to JJ and CT) and by RIKEN Pioneering research projects (Glycolipidologue Initiative, biology of intracellular environments) (to YS). The computer resources were provided by RIKEN (HOKUSAI BigWaterfall), RIKEN Center for Computational Science (Fugaku supercomputer), and HPCI (project IDs: hp200028, hp200129, hp200135, hp210172, hp210177). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
This is a PLOS Computational Biology Software paper.
Molecular dynamics (MD) simulation of a biomolecule in solution or membrane has been successfully applied to biophysical or biochemical problems at high spatial and temporal resolutions that are unattainable by experimental techniques. Recently, biological phenomena involving many biomolecules at larger scales have attracted much more attention than before in molecular and cellular biology, as well as medical and pharmaceutical studies. Genome-scale chromatin folding  and phase-separated membrane-less condensations  are well-known examples, which have already been targets of MD simulation studies [3–7]. To obtain biological insights on these phenomena, further methodological and computational challenges need to be met. An enormous number of atoms, for instance, more than 10–100 million atoms, are involved in such target systems, requiring us to improve the computational performance of the simulations. Alongside the size problem, there are also barriers in achieving sufficient conformational sampling in biologically meaningful time scales. Considering these issues, coarse-grained (CG) models of biomolecules are very useful in MD simulations, being a trade-off between modeling accuracy and computational efficiency. The CG models are able to decrease the number of degrees of freedom in a system and to smooth out the ruggedness of free energy landscapes, both of which efficiently accelerate MD simulations of biological systems .
Although there are various levels of coarse-graining [9–13], they can be categorized roughly into two classes of models in terms of their theories and parameterizations: physical-based or structure-based CG models. The MARTINI model  is one of the representatives in the former, while the Gō model [14–16], which describes the funnel-like energy landscape of protein folding, is probably the most well-known structure-based one. The resolution of each CG model, namely, how many heavy atoms are integrated as a single CG particle, is another crucial point in the choice of CG models. Besides, the solvent is treated differently, either represented by explicit but simplified water models  or modeled implicitly through effective interactions [12,17]. In the latter case, Brownian dynamics with hydrodynamic interactions [18,19] or the Lattice Boltzmann method [20,21] can be used to account for the effects of solvent-induced correlations. Here, we focus on residue-level CG models with non-hydrodynamic implicit solvation , whose interaction potentials are primarily described via structure-based ones. These models have been utilized in many simulation studies of folding dynamics of single proteins , conformational dynamics of nucleic acids [23,24], and complex molecular machinery . In these models, an amino acid residue is usually simplified as one CG particle on the Cα atom, and a nucleotide is represented using three particles corresponding to the phosphate (P), sugar (S), and base (B), respectively (Fig 1). With this representation, proteins and nucleic acids are modeled with a similar resolution of around ten heavy atoms per single CG particle. Furthermore, as an extension from the original Gō model, the atomic interaction-based CG model version 2+ (AICG2+)  protein model introduces flexible local potentials and sequence-dependent contact parameters to gain a finer description of the free energy landscapes. The same strategy has been applied to RNA molecules . On the other hand, for DNA, the 3SPN.2C model has been designed to describe the sequence-dependent geometric and mechanical properties of double-stranded DNA (dsDNA) .
Mapping of a protein (top) and nucleic acids (bottom) from atomistic structures to the residue-level CG models. Each amino acid residue is represented with one CG particle (Cα), and each nucleotide is represented using three CG particles, phosphate (P), sugar (S), and base (B), respectively. Left: three-dimensional structure of protein and nucleic acids. Big and transparent spheres are the CG particles, with Cα in green, P in red, S in yellow, and B in blue. Small and solid spheres represent heavy atoms, with phosphorus in dark blue, nitrogen in pale blue, oxygen in red, Cα in green, and all the other carbon in white. Right: two-dimensional ball-and-stick cartoons of protein and nucleic acids abstracted from the left structures, using the same color scheme.
The AICG2+ and 3SPN.2C models have been used jointly in simulations of protein-DNA complexes [29–31], although these two were developed by different research groups. In these simulations, electrostatic and excluded volume interactions between proteins or between proteins and nucleic acids are necessary for heterogeneous biological systems. In addition to generic and physical interactions, the PWMcos method integrates high-throughput sequencing experimental results in the form of the position weight matrix (PWM) with complex structural information to describe the recognition of DNA bases by proteins . Not only for the well-defined folded structures of proteins, residue-level CG models can also be used to study highly flexible and intrinsically disordered regions (IDR). In particular, two pairwise interaction models, Hydrophobicity scale (HPS)  and Kim-Hummer (KH) , have been tested and refined to reproduce the phase behaviors of protein IDRs [35,36] and RNAs  by Best, Mittal, and their coworkers.
The CG models mentioned above were implemented in general-purpose MD packages such as LAMMPS [38,39], GROMACS [40,41], NAMD [40,42], and OpenMM , or specialized CG software such as CafeMol . Although these implementations are convenient tools for performing CG simulations, there are still some barriers for regular users or even developers. Each residue-based CG model is usually developed by different groups in different MD packages. If someone wants to use two of them simultaneously, it is not straightforward to combine two different CG models by treating their input or output data properly. Therefore, there is a necessity to incorporate the latest CG models into a unified working environment and provide a user-friendly tool for preparing MD input files. Furthermore, simultaneous usages of different CG models can be a powerful tool in the cutting-edge studies of protein-nucleic acid complexes in cellular environments. For example, the liquid-liquid phase-separated (LLPS) systems often involve complicated interactions among multi-domain proteins , IDRs , RNAs , and DNAs  and reach the length scales of hundreds to thousands of nanometers . Such systems require both proper modeling of biomolecular interactions and efficient handling of computational resources.
In the current work, we present an implementation of several residue-level CG models of protein and nucleic acids in the GENESIS MD software [50,51]. In conjunction, we provide a collection of Julia  scripts, which we call GENESIS-CG-tool, to help users generate CG MD input files of complex biomolecular systems. The optimization and parallelization of CG MD simulations are the other focus in the current work. Although all-atom MD simulations in GENESIS have been highly optimized and parallelized, additional efforts are required to improve the performance of CG MD simulations because of the uniqueness of individual interaction potentials. We also noticed that some computations that take negligible fractions of time in all-atom MD simulations could be a bottleneck in CG MD simulations. After the implementation in GENESIS, memory and performance benchmark tests are carried out. CG MD simulations in GENESIS are applied to several cases of different sizes and molecular compositions, such as protein diffusion on DNA and LLPS consisting of protein IDRs and RNAs. The atomistic and CG MD simulations available in GENESIS can be a practical framework to investigate cellular-scale biological phenomena at multi-scale resolutions.
Design and implementation
Basic interaction terms
We first define basic interaction terms that are commonly used in various residue-level CG models. Each model, such as AICG2+, 3SPN.2C, HPS/KH, combines several basic interaction terms to represent intra- and inter-molecular potentials of proteins and/or nucleic acids.
The most typical bond potential has the following harmonic form: (1) where bi is the bond length between two neighboring CG particles, bi,0 is the value of bi in the reference structure, and is the force constant.
A higher-order term is also used in some CG models, such as the 3SPN.2C DNA model : (2) where bi and bi,0 have the same meaning as in Eq (1); and are the force constants in the quadratic and quartic terms, respectively.
The harmonic angle potential is given by: (3) where θi is the bond angle formed by the adjacent three CG particles, θi,0 is its reference value, and ka,i is the force constant.
The AICG2+ protein model  uses a knowledge-based angle potential, which is based on the Boltzmann inversion method , (4) where kB is the Boltzmann constant, T is temperature in MD simulation, and Pa(θi) is an amino-acid type-dependent probability distribution of the angles analyzed from a set of PDB structures. is further refined via the iterative Boltzmann algorithm. Practically, in MD simulations, we calculate the energies and forces using spline functions fitted to values in a table containing the energy and its derivatives at several points.
The periodic proper dihedral potential is defined as follows: (5) where ϕi and ϕi,0 are the dihedral angle and its reference value, respectively, n is an integer number that controls the periodicity of the function, and kϕ,i,n is the force constant.
We also implement a Gaussian type dihedral potential, which is used in the AICG2+  and the 3SPN.2C models : (6) where ϕi and ϕi,0 have the same meaning as in Eq (5), σϕ,i controls the Gaussian width and ϵϕ,i is the force constant.
Similar to Eq (4), AICG2+  also uses a knowledge-based dihedral term, which is based on the Boltzmann inversion potential : (7) where Pd(ϕi) is the probability distribution of the dihedral angle. The is further refined via the iterative Boltzmann algorithm. This is also calculated as a tabulated function for better performance. Notably, in each of the dihedral potentials described above, we introduce a recently proposed algorithm  to improve the robustness and numerical stability in the force evaluations.
The structure-based Gō-like models for proteins usually use a 12–10 potential to model a native contact, defined as two residues containing heavy atoms at a distance less than a specific cutoff in the native structure [14,26]. The potential  is defined as: (8) where ri is the distance between two CG particles in a native-contact pair, σi is the native value of ri, and ϵGō,i is the force constant. Note that it is applicable not only to proteins but also to nucleic acids if their experimental structures are known.
The Lennard-Jones (LJ) potential (or 12–6 potential) is also widely used to describe pairwise nonlocal interactions and is commonly defined as: (9) where ϵLJ,i represents the minimum energy.
The LJ potential truncated at its energy minimum is a purely repulsive potential and can be used as the excluded volume interaction: (10) where rC,exv is the cutoff distance and has the same value as σi. Note that the LJ forms in Eq (9) and Eq (10) are different in a coefficient of 2 for the 6th-power term. However, these two forms are mathematically equivalent when the parameters (σ and ϵ) are appropriately converted.
A simpler form of the excluded volume interaction is given by: (11) where rC,exv = 2σi and .
Other pairwise nonbonded interactions include the Gaussian potential: (12) and the Morse potential: (13) where ϵG,i and ϵM,i are the “depth”, and wi and αi are the “width” of the Gaussian and Morse potentials, respectively. In the 3SPN.2C model , the Morse potential is divided into two parts, the repulsive () and the attractive (), as defined in the following: (13A) and (13B)
The electrostatic interactions are modeled using the Debye-Hückel theory: (14) where ri is the distance between the charged particles i1 and i2, whose charges are and , respectively, ε0 is the dielectric permittivity of vacuum, λD is the Debye screening length, εr is the relative permittivity of the solution, which is a function of the solution temperature T and salt molarity C: εr = e(T)a(C), where , and . The Debye length is defined as: where NA is the Avogadro’s number, ec is the elementary charge, and I is the ionic strength of the solution.
When residue-level CG models are used to describe specific nonbonded interactions, such as the hydrogen bonds (HBs) or the π-π stacking, it is necessary to introduce multi-body potentials. For instance, in the 3SPN.2C model, the base-stacking, base-pairing, and cross-base-stacking interactions are described with the angle and torsion-angle dependent multi-body potential functions [28,39]. Specifically, the angle-dependent modulating function in the 3SPN.2C model is defined as : (15) where Δθ is the difference between an angle (θ) and its reference value, and γ controls the tuning range.
Residue-level CG models
We next describe the CG models as the combinations of the basic interaction terms defined in the last section. We mainly focus on the energy function forms rather than the detailed parameters, which users can easily change in their MD simulations.
The AICG2+ model for folded proteins.
The energy function of the AICG2+ model  is defined as: (16) where Γ and Γ0 are the simulated and native structures, respectively. The reference values in each term are determined from the native values in Γ0, except for the non-native contact term. In the non-native contact interactions, σi is dependent on the excluded volume radii of the particles in contact, using the combination rule of , where i1 and i2 are the indices of the particles in the i-th non-native contact, with and as their residue type-dependent radii.
The HPS and KH models for IDR.
The two IDR models, HPS  and KH , share the same energy function : (17) where Γ is the conformation of an IDP. EHPS,KH(ri) has the Ashbaugh-Hatch form : (18) where λi in the HPS model is the hydrophobicity scale, and in the KH model is a sign function to regulate the attractiveness or repulsiveness. All the other quantities, ri, σi, and ϵLJ,i, have the same meaning as defined in Eq (9). Specifically, ϵLJ,i and λi are based on the amino acid hydrophobicity in HPS  or the Miyazawa-Jerningan potential in KH [34,58], respectively.
The same potential functions are also used in the HPS RNA model , which has been developed to study the co-condensation of RNA and protein IDRs. In the HPS RNA model , the resolution is only one bead for one nucleotide, which is lower than the other nucleic acid CG models in GENESIS. Accordingly, the bond lengths and particle radii are also larger than the other models. Note that GENESIS also provides a structure-based three-bead-per-nucleotide RNA model, as described in a later section.
The 3SPN.2C model for DNA.
The potential energy function of the 3SPN.2C model [28,39] is defined as follows: (19) where Γ is the conformation of the DNA molecule, and "exv pairs" are the nonbonded particle pairs that are not involved in either base-pairing or stacking interactions. The three terms for base-base interactions, Ebstk, Ebp, and Ecstk, are defined as multi-body energy functions, using the basic equations described in the previous section (Eq (13A), (13B) and (15)): (20) (21) (22) where ri represents the distance between the two interacting bases, and the angles (θBS,i, θ1,i, θ2,i, θ3,i, and θCS,i) and dihedral angles (ϕBP,i) are formed by the surrounding sugar and phosphate sites (for details of the definition, please refer to ).
The structure-based model for RNA.
In addition to the one-bead (per-nucleotide) HPS model, we also implement a structure-based three-bead model for RNA , whose parameters are determined with the fluctuation matching method . The energy function of this model  is defined as: (23) where Γ is the conformation of the RNA molecule. The electrostatic term, Eele, is optional in the intra-RNA interactions but required for protein-RNA interactions . In the later case, ri in Eele(ri) represents the distance between a charged amino-acid residue and an RNA phosphate.
The PWMcos model for protein-DNA interaction.
The protein-DNA binding is generally considered in two parts: the sequence-nonspecific interactions between amino acids with the DNA backbone groups (mainly the electrostatic interactions) and the sequence-specific interactions between amino acids and DNA bases. The PWMcos model can describe the latter, incorporating the PWM information into the structure-based interactions . The model first defines a list of DNA-binding protein residues (DB-Cαs) forming contact with DNA in the native structure, and then the potential energy is then given by : (24) where m′ is the base in the complementary strand that forms a base-pair with base m, bi is the base type of any base i (bi∈[A, C, G, T]), and is the coordinates of particles in each conformation. The function is defined as: where rij is the distance between the i-th and the j-th Cα, and θ1, θ2, and θ3 are three angles defined by the surrounding particles (for details, please refer to ). EGaussian(r) and f(Δθ) functions are defined in Eq (12) and (15), respectively. The energy coefficient of EGaussian(r) (ϵG,i in Eq (12)) is defined as a function of base type bi based on the PWM: where ePWM,m(b) is the element in the mth column of the PWM and in the row corresponding to base type b, Nm is the total number of protein contacts formed with the mth base pair, ϵ′ is a hyperparameter that shifts the absolute value of the function, and γ is a scaling factor to change the strength of the interaction.
Notably, a variation of the PWMcos has been used to model sequence-non-specific hydrogen bonds between protein and DNA backbone : (25) where rij represents the distance from the i-th phosphate (instead of base) to the j-th Cα, and θ1 and θ2 are the angles defined by the local particles around the target phosphate and Cα atom in the configuration with coordinates .
Summary of the potential energy functions available in GENESIS.
Although the models mentioned above are developed independently, some have been proved to work well with each other. For instance, the association of the AICG2+ and the 3SPN.2C models has shown the capability to study the protein-DNA binding systems [29,31]. Between proteins and DNA, electrostatic interactions contribute most to the sequence-nonspecific affinity. For the accuracy of electrostatics modeling, the RESPAC method was developed to calculate the surface charge distribution of proteins . Furthermore, the angle-dependent HB potential enables modeling the rotation-coupled and uncoupled sliding of nucleosomal DNA [30,61]. The PWMcos method also facilitates the target search of transcription factors on both linear DNA and nucleosomes [32,62].
In addition to these well-developed methods, GENESIS also allows the combinations of different models. In Table 1, we list all the available CG models in GENESIS 1.7.0 in mixtures of proteins, DNAs, and RNAs.
Fig 2 shows a flowchart for running CG simulations in GENESIS. We roughly divide the whole procedure into three steps: (i) preparation, (ii) simulation, and (iii) analysis. GENESIS provides tools to achieve the task in each step. In the preparation stage, we use a collection of scripts, “GENESIS-CG-tool”, to translate experimental results into CG topology and coordinate files. Together with the MD control file, these files are then parsed by one of the GENESIS MD engines, atdyn, to perform energy/force calculations and time integrations. As the output data from MD simulations, GENESIS produces a log file of system properties including temperature, system size, and potential energy. The output coordinates from MD simulation are written in a trajectory file with the DCD format, which are processed by the GENESIS analysis tools and can be visualized by VMD , PyMOL , or other molecular graphics software. Here, we mainly focus on our developments in the first two stages, since the analysis using the DCD format files and analysis tools is common to those of all-atom MD simulations.
Dashed boxes represent collections of files used together as input or output at a particular step. Solid red boxes are used for the executable tools provided by GENESIS. Small boxes include typical filename extensions.
We next explain the usage of the GENESIS-CG-tool, which can read structural information and generate force field parameters and coordinates for MD simulations. As input, the structural information is in the PDB format, which includes atom coordinates determined by experiments such as X-ray crystallography, NMR, or Cryo-EM. As output, we decided to use a unified GROMACS-like file format for all the CG models. Particularly, the output of GENESIS-CG-tool includes a major topology file (.top) and a bundle of molecule-specific files (.itp), as well as a coordinate file for all the particles in the system (.gro) (Fig 2). The molecule-specific topology (.itp) files contain the information of intra-molecular interactions, such as bond, angle, dihedral, and Gō-type native-contact interactions. Each term includes the indices of the involved particles, reference values of the variables, and the force parameters. Additionally, an integer variable (called the “function type”) is used to distinguish the various potential functions of the same interaction type. For instance, we set the function types of 1 and 21 to (Eq (1)) and (Eq (2)), respectively.
GENESIS-CG-tool was developed with Julia , a fast, open-sourced, and reproducible programming language. The project’s main entrance is a script called “aa_2_cg.jl”, which works in the command-line environment of UNIX-like shells. The script takes a PDB file name as the positional argument and several model-related optional arguments. During the processing, the atomistic coordinates are read from PDB files. Then, local and nonlocal interactions, including their native values, are detected from the PDB structure. Other parameters, such as force constants, are determined by the models specified for each molecule.
As a simple example, assuming that one wants to prepare CG MD input files for a protein using a PDB file, “PRO1.pdb”, the command “aa_2_cg.jl PRO1.pdb” will generate “PRO1_cg.top” and “PRO1_cg.itp” topology files and “PRO1_cg.gro” coordinate file (Figs 2 and S1). The AICG2+ model is used for proteins by default, which can be changed with the option “--force-field-protein”.
The 3SPN.2C model of DNA uses reference structures that include sequence-dependent geometric features. These structures can be generated by software such as 3DNA . In GENESIS-CG-tool, with no requirement of external packages, we provide a simple “sequence-to-structure” script, which reads in DNA sequences and directly outputs both atomistic PDB and CG topology/coordinate files for dsDNA, including the necessary sequence-dependent information (see S1 Fig).
For heterogeneous systems that contain more than one type of biomolecules, the molecule-specific “.itp” files can be linked into the “.top” file, using the “#include” keyword.
Notably, recent improvements in experimental techniques such as cryo-EM have fostered more high-resolution structures of the huge biomolecule complexes, which are valuable resources in MD simulations. The spatial scales of these structures, however, outstrip the capacity of the traditional PDB file format. Another format, PDBx/mmCIF , has been used to store coordinate information of large structures. GENESIS-CG-tool also accepts files of the PDBx/mmCIF format as input. Besides, we also noticed that detecting the Gō-like native contacts is rather time-consuming for large protein complexes. We utilize the Julia thread parallelization to accelerate this procedure in the GENESIS-CG-tool.
We have packaged GENESIS-CG-tool as a part of GENESIS v1.7.0. We have also deployed the GENESIS-CG-tool in an individual Github repository: https://github.com/noinil/genesis_cg_tools. Users can refer to the online wiki page of this project for more details on the usage and available file formats.
In GENESIS v1.7.0, there are two MD programs: spdyn and atdyn. Spdyn is parallelized using the mid-point cell method , one of the domain-decomposition schemes; while atdyn is parallelized with the atomic decomposition scheme [50,51]. We implement the residue-level CG models only in atdyn because of the feasibility of implementations and the number of CG particles required in most CG MD simulations. In atdyn, the Clementi Gō , Karanicolas-Brooks Gō , Domain-enhanced Model (DoME) , and dual/multi-basin Gō models  are also available and use the same topology and coordinate file formats as described in the previous section. Since atdyn is already a well-developed framework, the original source code is reused as much as possible to implement the residue-level CG models. In the following, we describe the newly introduced functions and optimization schemes only.
Cutoff scheme of the nonbonded interactions.
The nonbonded interaction terms are the most time-consuming computations both in all-atom and CG MD simulations. Cutoff schemes are used for the nonbonded terms in CG MD simulations to reduce the computation time. For relatively short-range nonbonded interactions, such as and , cutoff values are already included in the definition (see Eq (10) and (11)). In contrast, the other interactions do not have generally established cutoff values. We provide three parameters to control the cutoff lengths of the Ebp, ELJ, and Eele terms in our implementations. We set the default cutoff value of Ebp to 18Å, following the original 3SPN.2C model [28,39]. We use a value of 52Å for Eele, which assures that the absolute value of the energy at the cutoff distance to be smaller than 10−4 kcal/mol (calculated at the temperature of 300K and with an ionic concentration of 150mM, for two charges of ±1 e−). This cutoff value of Eele is slightly larger than the one used in the original 3SPN.2C model and guarantees accuracy . As for the LJ potential used in the HPS/KH IDR models, the parameters ϵLJ,i and σi are dependent on residue-types . Therefore, we determine the LJ cutoff using a strategy that, for the i-th pair, the absolute value of the energy at the cutoff distance is as small as 10−4 times of the energy minimum: ELJ(rC,LJ) = 10−4ϵLJ,i. This gives a value of rC,LJ≃5.849σi. We then choose the largest σi from the HPS/KH models to determine a unified value of 39Å as the cutoff for ELJ. Note that these default values are considered as the “safest” ones. The users may be able to change them to smaller values by carefully evaluating the balance between computational efficiency and accuracy.
Several nonbonded interaction terms are considered only for pre-defined pairs of CG particles, such as the DNA base-stacking (Ebstk) and the Gō-type native contacts (EGō). For these terms, we don’t apply the cutoffs but directly calculate all the interactions.
Neighbor lists are used to maintain a list of particles that can participate in nonbonded energy/force interactions within a few MD integration steps. We set the neighbor list distances (rP, also called “pair-list distance” in GENESIS) to be roughly 5Å larger than the largest cutoff distance (rC). A complete list of the default cutoff and neighbor list distances can be seen in Table 2.
We utilize the cell linked list strategy to construct the neighbor list. Fig 3 illustrates how we determine the neighbor lists of different nonbonded interaction terms. During every update of the neighbor lists, the system is first divided into small cells. In each cell Ci and for each interaction term, we construct a “particle list”, Lnb−term(Ci), which contains the particles located in Ci and that are involved in the interaction (“nb-term”). For example, Lele(Ci) is a list of all the charged particles in the cell Ci. We then determine the “neighboring cells” for each cell: , where rmin(Ci, Ck) is the minimum distance between the cells Ci and Ck. As shown in Fig 3A, relatively short-range interaction terms such as or have a small number of the neighboring cells, whereas those with longer cutoff distances, such as Eele, will have more neighboring cells.
(A) An example of determining the neighbor list for Eexv (representing either or ) and Eele on a two-dimensional space. The gray dotted lines show the edges of the cells. The center cell with dark dashed edges (Ci) contains the particle a (black dot) whose neighbor list is to be determined. rP,exv and rP,ele are the neighbor-list distances for Eexv and Eele, respectively. The neighboring cells for the Eexv term () are represented by the red cells and surrounded by the red dashed lines. Those for the ) are shown as the green area edged by green dashed lines. The red and green circles indicate the range of final neighbor lists for Eexv and Eele, respectively. (B) Pseudo code of the algorithm to determine a neighbor list of the electrostatic interaction. Lele(Ck) is a list of all the charged particles in cell Ck.
After defining the “particle list” (Lnb−term(Ci)) and “neighboring cells” () of each cell, the neighbor list for each particle can be determined by the algorithm in Fig 3B (for convenience, here we use electrostatic interaction as an example). The neighbor lists are updated every 20 steps in the CG simulations.
Using individual particle list and neighboring cell for each nonbonded interaction term, we can minimize the pairwise distance calculations. First, as described above, we can avoid do loops over unnecessary cells for short-ranged interactions by using the potential-dependent neighboring cells. Besides, in each cell, we just consider a pre-assigned subgroup of particles (the Lnb−term(Ci) lists defined above) instead of all the particles. For example, in a 3SPN.2C DNA system, to construct the neighbor list for the electrostatic interactions, we only have to scan the charged particles (phosphates), which are only ~1/3 of all the particles. Whereas for the base-pairing interactions, we scan another group of particles, the bases, also ~1/3 of all the particles. Generally speaking, for the pairwise non-local interactions, considering 1/N of the particles means only 1/N2 calculations compared to scanning over all the particles. Therefore, using cell linked list and grouping particles according to interaction terms can save a lot of computations in the neighbor list construction. On the other side, however, we also note that when periodic boundary conditions are used with the cell linked list method, the simulation box should have each dimension equal to or larger than three times the largest pair-list distance (57Å for the electrostatic interaction by default).
GENESIS atdyn has provided various integrators and thermostats/barostats. For the current CG implementation, we employ the velocity-Verlet integrator coupled with the Langevin thermostat. In the velocity-Verlet algorithm, coordinates (ri) and velocities (vi) are updated following: (26A) (26B) where t is the simulation time, Δt is the integration step size, mi, ri, Fi, and vi are the mass, coordinate, force, and velocity of the i-th particle, respectively. In the Langevin thermostat, the force Fi is calculated by: (27) where Vtotal is the total potential energy and γ is the friction constant. ξi(t) is the Gaussian noise vector satisfying: (28) where kB is the Boltzmann constant, T is temperature, δt,t′ = 1 when t = t′ and 0 otherwise, and I is an identity matrix of size 3×3. ξi(t′)T is the transpose of ξi(t′) and the product follows matrix multiplication.
We use 10 femtosecond (fs) as the literal time step size (Δt in Eq 26), which is determined by the highest vibration frequency of the particles. On the other side, the time scale mapping of large-scale conformational dynamics of biomolecules is system-dependent and requires a careful comparison between simulated and experimental physical quantities .
Next, we optimize the performance of CG MD simulations implemented in GENESIS. At first, we optimize the nonbonded interactions, which are the most time-consuming parts in both residue-level CG and all-atom force-field models. There are two common strategies in the optimization. The first one is the vectorization of the most inner loops for utilizing as many SIMD (single instruction and multiple data) instructions as possible. The second one is to minimize the number of computations by using conditional statements. In contrast to all-atom models, using “conditional” statements is not avoidable due to the complex potential energy function forms. In our experience, the ratio rP3:rC3 gives us a good criteria whether the calculation would be vectorized or not using “conditional” statements. For instance, in the case of Eele, the ratio is about 1.3, and it is better to apply vectorizations to the inner loop without any “conditional” statements. The ratio in or evaluations are greater than 3.5, and it is better to optimize by minimizing the calculation amount with conditional statements. The ratio in ELJ or Ebp is less than 3, but we use conditional statements because the energy expression form itself is changed in HPS, KH, or 3SPN.2C models.
Another computational bottleneck in CG MD simulations is the generation of random numbers with Gaussian distributions when using the Langevin thermostat. This is mainly because of the replicated data MPI parallelization scheme used in GENESIS atdyn, in which all processes have a copy of the coordinate data. By keeping the replicated data scheme, we assign MPI parallelization in random number generation and apply “MPI_Allgatherv” to collect the information. We do not apply MPI parallelization to integrate coordinates or momenta because the communication time is comparable or even more time-consuming than the integration itself. Instead, OpenMP parallelization is applied in integrations.
Benchmark of MD simulations with residue-level CG models
We first examine the usage of memory and the performance benchmark in MD simulations with residue-level CG models. The memory usage was examined with a single process and a single thread on a machine with 93GiB RAM (random access memory). The CPU benchmarks were executed with 5 OpenMP threads and various MPI processes on a computer server with Intel Xeon Gold-6148 (2.4GHz) CPU (20 cores per CPU and two CPUs per node) and InfiniBand EDR networking. GENESIS was compiled with the Intel compiler in couple with Intel MPI version 2019.5.281. Fig 4 shows the results of the memory benchmark (Fig 4A) and CPU benchmark using different biological systems (Fig 4B, 4C and 4D).
(A) Memory benchmark of the RNA Polymerase II (PolII) systems. A single PolII has 3,695 CG particles, which is then duplicated into a n×n×n (n = 1,…,10) grid space. The insets show the structure of a single PolII (left) and 1000 PolIIs (right). Protein is shown in blue, DNA in red, and RNA in yellow. (B) CPU benchmark of a system containing 120 DPS proteins. The total number of CG particles is 222,360. (C) CPU benchmark for a system of 5000 chains of 100aa IDPs. The total number of CG particles is 500,000. (D) CPU benchmark for a system including 512 nucleosomes. The total number of CG particles is 1,044,480. The insets of (B), (C), and (D) show the initial structure of each system, respectively. N represents the total number of CG particles in each system. For all the CPU benchmarks, we used 5 OpenMP threads.
To test the memory consumption, we built systems consisting of different numbers of RNA Polymerase II (PolII). The smallest system has only one PolII and contains 3695 CG particles (3542 from protein, 124 from DNA, and 29 from RNA), based on the X-ray crystallography structure (PDB ID: 1R9T) . We then duplicated the single PolII to create systems of different sizes, ranging from thousands to millions of particles. In all these tested systems, we used VAICG2+ for proteins, V3SPN.2C for DNAs, and VRNA for RNAs. For proteins, we use integer charges distributed on the charged residues (+e for Arg and Lys, −e for Asp and Glu). For the interactions between different biomolecule types, we applied and Eele as general nonspecific interactions, and EGō for native contacts between protein-RNA, protein-DNA, and RNA-DNA to maintain the whole structure. All the cutoff and pair-list distances are set to the default values (see Table 2). As shown in Fig 4A, we carried out simulations with n3 (n = 1,…10) duplicated PolIIs to track the memory usage. The consumed memory scales with the system sizes, from ~40 MB for a single PolII to ~22 GB for 1000 PolIIs. Notably, on an ordinary computer with 10GB memory, we can run GENESIS CG MD simulations for a system with about 2 million particles.
To examine the strong scaling of GENESIS CG simulations, we prepare three systems comprised of different types of biomolecules:
- 120 copies of the protein DPSs . In total, 222,360 CG particles modeled with VAICG2+;
- 5000 chains of IDPs, each consisting of 100 amino acids. In total, 500,000 CG particles modeled with VHPS;
- 512 copies of nucleosomes . In total, 1,044,480 particles, using VAICG2+ for proteins, V3SPN.2C for DNAs, and Eele for protein-DNA interactions.
and Eele are used as general non-specific interactions for systems 1) and 3); and EHB is additionally applied between protein and DNA in system 3). For all the cutoff and pair-list distances, we use the default values listed in Table 2. The benchmark results for these three systems are shown in Fig 4B, 4C and 4D, respectively. The results suggest that at least up to 64 MPI processes (8 nodes), the parallel performances are scalable in all three systems. Although this does not show perfect scalability, it is useful for simulating large biological systems. We discuss possible reasons for the insufficient scalabilities of our implementation and future plans to improve the performance in the Discussion section.
Application 1: protein target search on DNA
The AICG2+ protein model (VAICG2+), and the 3SPN.2C DNA model (V3SPN.2C), in combination with the sequence-specific (EPWMcos) or nonspecific (EHB) protein-DNA interactions, can be used to study the binding, diffusion, and conformational changes of protein-DNA complexes [29,30,61,62]. As a simple example, we performed MD simulations of the sex-determining region Y protein (sry)  and its target DNA. The modeling and simulation results are shown in Fig 5.
(A) Solution NMR structure of sry binding on DNA (PDB entry: 1J46). (B) Coarse-grained sry with the Cα particles colored by the partial charges (determined using the RESPAC method). DNA structure from the PDB is also shown as lines for reference. (C) Sequence logo of sry’s DNA target. (D) Sequence of DNA used in the CG simulations. The consensus sequence “TAAACAAT” is inserted at the center of a 50bp poly-CG DNA. (E) Time series of sry’s binding position on DNA (bsry). The green region represents the consensus sequence. (F) Time series of the bending angle of DNA (θ). Two representative structures of sry binding at the poly-CG region or the consensus region are shown on top of (E). In these structures, DNA is colored in gray, except for the consensus sequence region in green. Sry is shown in red.
Sry uses a conserved High-Mobility Group (HMG) domain to recognize its consensus DNA sequence, “TAAACAAT” . The HMG domain intercalates into the DNA minor groove and forms contacts with the bases. This type of minor groove binding of sry results in a sharp bending of the DNA , which plays a crucial role in regulating the transcription of its target gene . Starting from the solution NMR structure of the sry-DNA complex (PDB entry 1J46 , see Fig 5A), we generated the CG model of sry using the GENESIS-CG-tool. We used the AICG2+ model (VAICG2+) for the whole protein. Specifically, we applied Gō-type native contact interactions (EGō) to the folded HMG domain (residue 9–70) and assigned only local flexible potentials (, and ) to the N-terminal (residue 1–8) and C-terminal tails (residue 71–85). We also employed the RESPAC method  to calculate the partial charge distribution on the surface residues of sry (see Figs 5B and S2). As for the DNA, we designed a 50bp poly-CG sequence, with the 8-bp piece “TAAACAAT” inserted at the center (Fig 5C and 5D). The atomistic structure of the dsDNA was then constructed from this 50-bp sequence using the 3DNA package . The 3SPN.2C CG topology and coordinate files were then generated from the atomistic structure, using the GENESIS-CG-tool. For sry to find its consensus sequence in the simulations, we utilize the PWMcos model (EPWMcos) to incorporate information from the position frequency matrix (PFM, downloaded from JASPAR database  profile MA0084.1 ). For details of the model parameters, please see S1 Text.
After getting the input files, we carried out MD simulations of the sry-DNA complex for 107 steps. In the initial structure, sry was placed around 70Å away from the DNA. Soon after the simulation started, sry was bound onto DNA and began the sliding. We monitored the binding site of sry on DNA (Fig 5E) and the bending of DNA (Fig 5F). As can be seen, sry is preferentially bound to the consensus sequence (represented by the green region in Fig 5E) and occasionally left this target with a quick diffusion on the DNA. As expected, we observed coupling between the sequence-specific recognition and DNA bending caused by the intercalation of the HMG domain. These results show that the CG models (AICG2+, 3SPN.2C, and PWMcos) for protein-DNA interactions have been correctly implemented in GENESIS and can be used to study similar biological systems.
Application 2: phase behaviors of IDR and RNA
The HPS and KH IDR models have been used to study the phase behavior of IDRs . Here, we used the IDR from protein Fused in Sarcoma (FUS)  to show that our implementation of these models in GENESIS can simulate the condensation of proteins. We first simulated a single chain of the FUS IDR (163aa, see S1 Text for the sequence). The final structure of the single-chain simulation was then duplicated to create a 120-chains system. We carefully arranged the multiple chains so that the system had a boundary size of 18nm×18nm×200nm, which is for the slab sampling method. We performed a 107-step simulation, as shown in Fig 6A and 6B. During the simulation, we monitored the density change of the CG particles along the z-axis (the longest dimension). As shown in the top panel of Fig 6A, at the beginning of the simulation, the FUS IDRs formed relatively small and low-density clusters. As simulation time increased, we observed several merge events of the condensates. After ~3.5×106 steps, all the FUS IDRs went into the same cluster, and the density finally reached around 5 particles per nm3 (Fig 6A and 6B). This simple example shows the possibility of using the HPS model in GENESIS to explore the physical mechanisms of IDR condensation. Besides, we noticed that there are some recent updates of the parameters of the HPS model [36,80]. These changes of parameters can be easily used with GENESIS by modifying the parameter files, with no need to touch the code. In this sense, GENESIS can also serve as a handy framework for the users to re-calibrate their own set of parameters.
(A) and (B) show the slab simulation results of 120 FUS IDRs. (A) Time series of FUS density along the long axis (z). The intensity of the blue color represents the density, as shown in the color bar above. (B) The first and last structure of the MD trajectory shown in (A). The FUS IDRs are shown as tubes. Different colors from green to blue are used to distinguish chains. (C), (D), and (E) show the simulation of the mixture of LAF1 IDRs and 15nt poly-A (A15) RNAs. (C) The last structure of a 5×106 steps slab simulation of 300 LAF1 IDRs and 100 A15 RNAs. LAF1 IDRs are in green or blue, whereas RNAs are in red or purple color. (D) The last structure of a 2×106 steps droplet simulation of 750 LAF1 IDRs and 250 A15 RNAs. The color scheme is the same as (C). (E) Radial distribution function of protein and RNA CG particles as a function of the distance from the center of mass of the LAF1 droplet (r). The structure of the condensation is shown as transparent background for reference.
In conjunction with the HPS model for protein IDR, the HPS model for RNA has also been recently developed . The HPS RNA model has a different resolution from the other CG models described above, with one CG particle per nucleotide. Nevertheless, we also implemented this model in GENESIS, preparing for possible applications in the study of biophysical condensations involving both protein and RNA. Here we built and simulated two systems of IDR from the protein LAF1 and 15-nt (nucleotide) poly-A RNA (hereafter called A15). The two systems consisted of 300 LAF1 together with 100 A15 (57,300 particles in total) and 750 LAF1 together with 250 A15 (143,250 particles), respectively. The slab method was used to simulate the first system for 5×106 steps, with the simulation box of 20nm×20nm×200nm (Fig 6C). The second system was simulated in a cubic box of size (100nm)3 for 2×106 steps (Fig 6D). As can be seen in Fig 6C and 6D, in both situations, the A15 RNA entered the condensation of LAF1. We also quantitatively analyzed the co-condensation of LAF1 and RNA by plotting the radial distribution function (RDF) of protein and RNA particles as a function of the distance from the center of the LAF1 droplet (Fig 6E). There is a higher density of RNA inside the LAF1 condensation than in bulk (Fig 6E). These results are consistent with the previous simulation results .
Application 3: virus capsid
We also explored the ability of our GENESIS CG implementation to simulate large-scale biological systems, such as a virus capsid. Here we reported our modeling and simulation of the herpes simplex virus (HSV) capsid, whose high-resolution structure has been recently solved with the cryo-electron microscopy (cryo-EM) (PDB entry: 5ZAP) . The capsid comprises about 4000 proteins assembled into 12 pentons (pentameric blocks) and 150 hexons (hexameric blocks), and 320 triplexes gluing all the subunits . The number of atoms and the length scale exceed the PDB format’s upper limit for such a vast structure. Therefore, the structural information of the HSV capsid was recorded in a file with the mmCIF format. We used the GENESIS-CG-tool to parse this file and generate the CG structure, which consists of 1,687,980 CG particles in total (Fig 7A). The AICG2+ potentials were used to model the whole structure. After generating the necessary input files, we then simulated the structure for 2×106 steps. We tracked the radius-of-gyration (Rg), the root-mean-square-deviation (RMSD), and the native-ness (Q, see S1 Text for definition) of the whole structure (shown in Fig 7B). As can be seen, during our simulation, the structure was stably maintained. Although here we only show a short equilibrium simulation of the HSV capsid, to our knowledge, it is one of the largest biological systems simulated with the residue-level CG models. We expect to run CG simulations of similar systems using GENESIS to probe the conformational changes and dynamic assembly or disassembly of the capsid.
(A) Structure of the coarse-grained HSV capsid, based on PDB entry 5ZAP. The pentameric blocks (pentons) are colored in blue, the hexameric blocks (hexons) are colored in green (VP5) and cyan (VP26). All the other proteins that glue together the pentons and hexons are colored in white. (B) Time series of the radius of gyration (Rg), RMSD, and nativeness (Q) of the whole structure during a 2×106-step simulation.
Application 4: chromatin
Eukaryotic chromatin is the architecture that stores genetic information by packaging DNA into supercoiled structures around the histone octamer proteins. In addition to the function of genome organization, chromatin also hosts many biochemical reactions around the information flow between DNA and RNA. Due to its extraordinary importance, chromatin and related biological phenomena have been widely studied by MD simulations at different length scales and models at different resolutions [4,82–84]. In particular, residue-level CG models, including those we implemented here (VAICG2+, V3SPN.2C, EPWMcos, and EHB), have shown their abilities to decipher the dynamics of the nucleosome, which is the building block and basic unit of chromatin [30,31,61,62,85]. Here we tried to expand these residue-level CG MD simulations to a larger length scale. Instead of using a natural genomic sequence and constructing a Hi-C experiment-based structure, we chose poly-CG as the DNA sequence and built an artificial chromatin structure by connecting single nucleosome structures (based on PDB 1KX5)  with linear linker DNAs. The constructed structure contains 1024 nucleosomes linked by a 219,213-bp dsDNA (see Fig 8A and 8B). More details of the structure modeling can be found in S1 Text.
(A) Structure of the whole artificial chromatin. DNA is colored in red, and histones are colored in cyan (for the folded domains) and green (tails). (B) A zoom-in structure of two adjacent nucleosomes as a part of the artificial chromatin. (C) Time series of the radius-of-gyration (Rg) of the N-terminal tails of the histones (top) and the number of wrapped DNA base-pairs on histone (bottom). The solid lines represent the average values, and the shadowed regions show the standard errors.
Similar to previous studies [30,61], we used VAICG2+ for histones, V3SPN.2C for DNA, and EHB for hydrogen bonds between histone and DNA phosphates. In addition, we applied VHPS to the flexible histone tails (see S1 Text for definition). We then performed 5×105-step MD simulations of this system at a temperature of 300K and 150mM ionic concentration. The time series of Rg of the four types of N-terminal histone tails and the number of wrapped DNA base-pairs are shown in Fig 8C. These results were based on the statistics over the 1024 nucleosomes. This example illustrates the possibility of using our GENESIS CG implementation to carry out studies on the large-scale chromatin dynamics with accurate modeling of DNA and both well-folded and intrinsically disordered parts of proteins.
In this paper, we have implemented residue-level CG models of biomolecules in GENESIS MD software and have applied them in CG MD simulations of various biomolecules by combining different CG models. The combination of the AICG2+ model for proteins with the 3SPN.2C model for DNA and the PWMcos potential for protein-DNA sequence-specific interactions has been applied in the previous studies . Whereas the combination of the HPS model for IDR and AICG2+ for folded protein is tested, for the first time, in this new framework. We also noticed several recently published CG models at an equivalent or similar resolution as ours, such as the TIS RNA model  and the iSoLF lipid model . We will consider incorporating these models into GENESIS to cover a broader range of target biomolecules. In the applications presented here, we used the default interaction parameters in GENESIS. For more practical applications, such as genome-scale chromatin folding or phase-separated membrane-less condensations, one may need to more carefully optimize interaction terms between the IDR and the other components and their parameters. If necessary, the parameters should be calibrated carefully with experimental information as references. The GENESIS CG framework provides a convenient environment for these attempts.
Similar to the current work, other groups have also released combinatorial implementations of CG methods, such as the OpenAWSEM-Open3SPN2 within the OpenMM framework , the 3SPN.2C packages in the LAMMPS framework [38,39], and the CafeMol package . Particularly, the OpenMM package utilized GPU calculations to gain better performances . Compared with its implementation, our development aims to cover more models and provide a more flexible and feasible environment for the studies of larger-scale biomolecular systems. CafeMol has been successfully applied for CG simulations in various biomolecular studies [26,29,30,62,87,88]. However, it is not designed for the high-performance computing of larger-scale systems. We have optimized our code in GENESIS to enable high-performance simulations of systems consisting of millions of particles on regular computers (Figs 4, 7 and 8). There is an upper limit of particle number with the atomic decomposition scheme in atdyn, which essentially limits our the applicability with the current code to huge biological systems. The domain-decomposition strategy and the efficient load balancer suitable to residue-level CG models are required to solve this problem. These two will be implemented in a new MD engine in GENESIS. However, it is worth noting that there are technical difficulties in developing the domain-decomposition scheme and load balancer for CG simulations. The most widely appreciated problem is the inhomogeneous particle distribution in the residue-level CG systems due to the implicit treatment of the solvent. In a conventional uniform spatial decomposition, the non-uniform particle density causes severe load imbalance among different domains. Therefore, special decomposition algorithms should be considered . Another direction is to consider explicit solvent models [13,90], which intrinsically solve the load-balance problem, but introduce significantly more computations on the solvent particles. Besides, the large cutoff value of the non-bonded interactions also sets a lower limit of the target system size. We assume at least two domains in each dimension to apply domain decomposition. Each domain should have at least the size equal to or larger than the longest cutoff (in our implementation, 52Å of the electrostatic term). This requirement essentially limits the applicability of the program to relatively small systems. In developing the new CG MD engine in GENESIS, we will have to solve these problems.
Although the CG models have boosted the computational efficiency to be orders of magnitudes faster than the all-atom force fields , the speed-up still looks insufficient when studying systems of the organellar or cellular scales. It is necessary to employ enhanced sampling methods, many of which [91–94] have also been implemented in GENESIS. Taking this advantage, we will combine CG MD simulations with some of the enhanced sampling methods to obtain more statistically reliable data with reasonable computational times and to enable free-energy analysis in the organellar or cellular scales.
In parallel to the advances in computational biophysics, mesoscopic pictures of biomolecules in the organellar or cellular environments have been better described using the latest experimental techniques, such as cryo-electron microscopy (cryo-EM) , cryo-electron tomography (cryo-ET) , high-speed atomic force microscopy (AFM) , and in-cell NMR . Integrating structural information by these experiments with computational biomolecular dynamics is one of the most important tasks in this research field. Recently, there have been several attempts [99–101] to make more realistic structural models of sub-cellular systems by utilizing available experimental results as much as possible. We hope that the implementation of residue-level CG models can provide more realistic structural dynamics of heterogeneous biological systems together with recent experimental information and modeling strategies. For this purpose, multi-scale and multi-resolution simulations are helpful in the trade-off between modeling accuracy and computational efficiency, if we can switch the resolutions of target biological systems easily from this residue-level CG models to other CG models with different approximations (MARTINI , SPICA , PACE , etc.) or all-atom force field models (AMBER , CHARMM , GROMOS , etc.). In the GENESIS software, most of the above models are already available, and some of them are being implemented by us. Mapping from the atomistic structure to CG is deterministic and easy, although information such as amino-acid backbone dihedral angles (φ and ψ) and hydrogen bonds are largely lost. The other direction, namely, reconstructing atomistic structure from CG coordinates, is usually much more difficult . In our current implementation, the structure-based models, such as AICG2+ , can uphold the secondary, tertiary, or even higher-order structures in the native structure. Additionally, by using the angle-dependent non-local potentials, the 3SPN.2C  and the PWMcos  models can partly preserve the base-stacking or hydrogen-bonding interactions in the DNA and protein-DNA interface, respectively. Structures sampled with these CG models can be remapped back to reasonable atomistic structures, provided the reconstruction methods [107,108] are good enough. In the near future, we may be able to investigate the inside of the cell at various resolutions, as Google Earth can do visualizations of both countries and local towns.
In summary, we have implemented several residue-level CG models in the GENESIS MD software, including the AICG2+ model for protein, the 3SPN.2C model for DNA, the PWMcos model for protein-DNA interactions, and the HPS/KH model for IDR and RNA. The MD input data for these CG models are given in a GROMACS-style file format by an easy-to-use toolbox, GENESIS-CG-tool. The computational performance of CG MD simulations is optimized to regular computers by preparing the nonbonded interaction lists suitable for the CG potential functions, vectorizing do-loop structures in the time-consuming subroutines, parallelizing the generation of random numbers in Langevin dynamics, and so on. With these attempts, GENESIS CG MD simulation is applied to large biological systems containing millions of CG particles efficiently on regular computers. The models and programs implemented here would be valuable for investigating heterogeneous biological phenomena involving folded/disordered proteins, RNAs, and DNAs at high concentrations. Implementations for larger systems containing billion CG particles and the use of enhanced sampling in the systems will follow the current implementation in the GENESIS software.
Availability and future directions
The source code and user manual of GENESIS v1.7.1 is available at https://www.r-ccs.riken.jp/labs/cbrt/ website, and GENESIS-CG-tool is both included as a part of GENESIS v1.7.1 and deployed at https://github.com/noinil/genesis_cg_tool. Tutorials of performing CG simulations with GENESIS v1.7.1 are open on https://www.r-ccs.riken.jp/labs/cbrt/tutorials2019/ website. All the MD simulation files and data to produce the results are available from https://github.com/RikenSugitaLab/cg-development-GENESIS-1.7.0. We plan to use GENESIS v1.7.1 to study large-scale biological phenomina such as LLPS of protein and nucleic acids. We also plan to implement more CG models of biomolecules into GENESIS.
S1 Text. Supplementary methods of molecular dynamics simulations.
S1 Fig. Examples of using GENESIS-CG-tool to prepare CG topology and coordinate files.
(A) Generate CG files (PRO1_cg.gro for coordinates; PRO1_cg.top and PRO1_cg.itp for topology) for a protein, based on its atomistic PDB file (PRO1.pdb). (B) Generate CG files (DNA1_cg.gro for coordinates; DNA1_cg.top and DNA1_cg.itp for topology) as well as an atomistic coordinate file (DNA1.pdb) for a 20bp DNA, from its DNA sequence (DNA1.fasta).
S2 Fig. Charge distribution on the surface residues of protein sry.
The charges of the surface residues of the HMG domain (residue 9–70, dark blue) of sry was determined with the RESPAC method. Whereas the charges on the N-tail and C-tail were the original integer charges (green). The same data are used to plot the CG structure of sry (Fig 5B).
The authors thank Ai Shinobu and Giovanni B Brandani for testing the developments of CG models in GENESIS v1.7.0.
- 1. Jerkovic´ I, Cavalli G. Understanding 3D genome organization by multidisciplinary methods. Nat Rev Mol Cell Biol. 2021; 22: 511–528. pmid:33953379
- 2. Yoshizawa T, Nozawa R-S, Jia TZ, Saio T, Mori E. Biological phase separation: cell biology meets biophysics. Biophys Rev. 2020;12: 519–539. pmid:32189162
- 3. Di Pierro M, Zhang B, Aiden EL, Wolynes PG, Onuchic JN. Transferable model for chromosome architecture. Proc Natl Acad Sci. 2016;113: 12168–12173. pmid:27688758
- 4. Nuebler J, Fudenberg G, Imakaev M, Abdennur N, Mirny LA. Chromatin organization by an interplay of loop extrusion and compartmental segregation. Proc Natl Acad Sci. 2018;115: E6697–E6706. pmid:29967174
- 5. Borgia A, Borgia MB, Bugge K, Kissling VM, Heidarsson PO, Fernandes CB, et al. Extreme disorder in an ultrahigh-affinity protein complex. Nature. 2018;555: 61–66. pmid:29466338
- 6. Martin EW, Holehouse AS, Peran I, Farag M, Incicco JJ, Bremer A, et al. Valence and patterning of aromatic residues determine the phase behavior of prion-like domains. Science (80-). 2020;367: 694–699. pmid:32029630
- 7. Conicella AE, Dignon GL, Zerze GH, Schmidt HB, D’Ordine AM, Kim YC, et al. TDP-43 α-helical structure tunes liquid–liquid phase separation and function. Proc Natl Acad Sci. 2020;117: 5883–5894. pmid:32132204
- 8. Saunders MG, Voth GA. Coarse-Graining Methods for Computational Biology. Annu Rev Biophys. 2013;42: 73–93. pmid:23451897
- 9. Kmiecik S, Gront D, Kolinski M, Wieteska L, Dawid AE, Kolinski A. Coarse-Grained Protein Models and Their Applications. Chem Rev. 2016;116: 7898–7936. pmid:27333362
- 10. Gopal SM, Mukherjee S, Cheng Y-M, Feig M. PRIMO/PRIMONA: A coarse-grained model for proteins and nucleic acids that preserves near-atomistic accuracy. Proteins Struct Funct Bioinforma. 2010;78: 1266–1281. pmid:19967787
- 11. Sterpone F, Melchionna S, Tuffery P, Pasquali S, Mousseau N, Cragnolini T, et al. The OPEP protein model: from single molecules, amyloid formation, crowding and hydrodynamics to DNA/RNA systems. Chem Soc Rev. 2014;43: 4871–4893. pmid:24759934
- 12. Davtyan A, Schafer NP, Zheng W, Clementi C, Wolynes PG, Papoian GA. AWSEM-MD: Protein Structure Prediction Using Coarse-Grained Physical Potentials and Bioinformatically Based Local Structure Biasing. J Phys Chem B. 2012;116: 8494–8503. pmid:22545654
- 13. Marrink SJ, Risselada HJ, Yefimov S, Tieleman DP, de Vries AH. The MARTINI Force Field: Coarse Grained Model for Biomolecular Simulations. J Phys Chem B. 2007;111: 7812–7824. pmid:17569554
- 14. Clementi C, Nymeyer H, Onuchic JN. Topological and energetic factors: what determines the structural details of the transition state ensemble and “en-route” intermediates for protein folding? an investigation for small globular proteins. J Mol Biol. 2000;298: 937–953. pmid:10801360
- 15. Go N. Theoretical Studies of Protein Folding. Annu Rev Biophys Bioeng. 1983;12: 183–210. pmid:6347038
- 16. Takada S. Gō model revisited. Biophys Physicobiology. 2019;16: 248–255. pmid:31984178
- 17. Arnarez C, Uusitalo JJ, Masman MF, Ingólfsson HI, de Jong DH, Melo MN, et al. Dry Martini, a Coarse-Grained Force Field for Lipid Membrane Simulations with Implicit Solvent. J Chem Theory Comput. 2015;11: 260–275. pmid:26574224
- 18. Ermak DL, McCammon JA. Brownian dynamics with hydrodynamic interactions. J Chem Phys. 1978;69: 1352–1360.
- 19. Ando T, Skolnick J. Sliding of Proteins Non-specifically Bound to DNA: Brownian Dynamics Studies with Coarse-Grained Protein and DNA Models. Clore GM, editor. PLoS Comput Biol. 2014;10: e1003990. pmid:25504215
- 20. Sterpone F, Derreumaux P, Melchionna S. Protein Simulations in Fluids: Coupling the OPEP Coarse-Grained Force Field with Hydrodynamics. J Chem Theory Comput. 2015;11: 1843–1853. pmid:26574390
- 21. Brandner A F., Timr S, Melchionna S, Derreumaux P, Baaden M, Sterpone F. Modelling lipid systems in fluid with Lattice Boltzmann Molecular Dynamics simulations and hydrodynamics. Sci Rep. 2019;9: 16450. pmid:31712588
- 22. Takada S, Kanada R, Tan C, Terakawa T, Li W, Kenzaki H. Modeling Structural Dynamics of Biomolecular Complexes by Coarse-Grained Molecular Simulations. Acc Chem Res. 2015;48: 3026–3035. pmid:26575522
- 23. Nguyen HT, Hori N, Thirumalai D. Theory and simulations for RNA folding in mixtures of monovalent and divalent cations. Proc Natl Acad Sci. 2019;116: 21022–21030. pmid:31570624
- 24. Hinckley DM, Lequieu JP, de Pablo JJ. Coarse-grained modeling of DNA oligomer hybridization: Length, sequence, and salt effects. J Chem Phys. 2014;141: 035102. pmid:25053341
- 25. Kubo S, Niina T, Takada S. Molecular dynamics simulation of proton-transfer coupled rotations in ATP synthase FO motor. Sci Rep. 2020;10: 8225. pmid:32427921
- 26. Li W, Wang W, Takada S. Energy landscape views for interplays among folding, binding, and allostery of calmodulin domains. Proc Natl Acad Sci. 2014;111: 10550–10555. pmid:25002491
- 27. Hori N, Takada S. Coarse-Grained Structure-Based Model for RNA-Protein Complexes Developed by Fluctuation Matching. J Chem Theory Comput. 2012;8: 3384–3394. pmid:26605744
- 28. Freeman GS, Hinckley DM, Lequieu JP, Whitmer JK, de Pablo JJ. Coarse-grained modeling of DNA curvature. J Chem Phys. 2014;141: 165103. pmid:25362344
- 29. Tan C, Terakawa T, Takada S. Dynamic Coupling among Protein Binding, Sliding, and DNA Bending Revealed by Molecular Dynamics. J Am Chem Soc. 2016;138: 8512–8522. pmid:27309278
- 30. Brandani GB, Niina T, Tan C, Takada S. DNA sliding in nucleosomes via twist defect propagation revealed by molecular simulations. Nucleic Acids Res. 2018;46: 2788–2801. pmid:29506273
- 31. Lequieu J, Schwartz DC, de Pablo JJ. In silico evidence for sequence-dependent nucleosome sliding. Proc Natl Acad Sci. 2017;114: E9197–E9205. pmid:29078285
- 32. Tan C, Takada S. Dynamic and Structural Modeling of the Specificity in Protein–DNA Interactions Guided by Binding Assay and Structure Data. J Chem Theory Comput. 2018;14: 3877–3889. pmid:29806939
- 33. Kapcha LH, Rossky PJ. A Simple Atomic-Level Hydrophobicity Scale Reveals Protein Interfacial Structure. J Mol Biol. 2014;426: 484–498. pmid:24120937
- 34. Kim YC, Hummer G. Coarse-grained Models for Simulations of Multiprotein Complexes: Application to Ubiquitin Binding. J Mol Biol. 2008;375: 1416–1433. pmid:18083189
- 35. Dignon GL, Zheng W, Kim YC, Best RB, Mittal J. Sequence determinants of protein phase behavior from a coarse-grained model. Ofran Y, editor. PLOS Comput Biol. 2018;14: e1005941. pmid:29364893
- 36. Dannenhoffer-Lafage T, Best RB. A Data-Driven Hydrophobicity Scale for Predicting Liquid–Liquid Phase Separation of Proteins. J Phys Chem B. 2021;125: 4046–4056. pmid:33876938
- 37. Regy RM, Dignon GL, Zheng W, Kim YC, Mittal J. Sequence dependent phase separation of protein-polynucleotide mixtures elucidated using molecular simulations. Nucleic Acids Res. 2020;48: 12593–12603. pmid:33264400
- 38. Plimpton S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J Comput Phys. 1995;117: 1–19.
- 39. Hinckley DM, Freeman GS, Whitmer JK, de Pablo JJ. An experimentally-informed coarse-grained 3-site-per-nucleotide model of DNA: Structure, thermodynamics, and dynamics of hybridization. J Chem Phys. 2013;139: 144903. pmid:24116642
- 40. Noel JK, Levi M, Raghunathan M, Lammert H, Hayes RL, Onuchic JN, et al. SMOG 2: A Versatile Software Package for Generating Structure-Based Models. Prlic A, editor. PLOS Comput Biol. 2016;12: e1004794. pmid:26963394
- 41. Pronk S, Páll S, Schulz R, Larsson P, Bjelkmar P, Apostolov R, et al. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics. 2013;29: 845–854. pmid:23407358
- 42. Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, et al. Scalable molecular dynamics with NAMD. J Comput Chem. 2005;26: 1781–1802. pmid:16222654
- 43. Lu W, Bueno C, Schafer NP, Moller J, Jin S, Chen X, et al. OpenAWSEM with Open3SPN2: A fast, flexible, and accessible framework for large-scale coarse-grained biomolecular simulations. Schneidman-Duhovny D, editor. PLOS Comput Biol. 2021;17: e1008308. pmid:33577557
- 44. Kenzaki H, Koga N, Hori N, Kanada R, Li W, Okazaki K, et al. CafeMol: A Coarse-Grained Biomolecular Simulator for Simulating Proteins at Work. J Chem Theory Comput. 2011;7: 1979–1989. pmid:26596457
- 45. Li P, Banjade S, Cheng H-C, Kim S, Chen B, Guo L, et al. Phase transitions in the assembly of multivalent signalling proteins. Nature. 2012;483: 336–340. pmid:22398450
- 46. Borcherds W, Bremer A, Borgia MB, Mittag T. How do intrinsically disordered protein regions encode a driving force for liquid–liquid phase separation? Curr Opin Struct Biol. 2021;67: 41–50. pmid:33069007
- 47. Roden C, Gladfelter AS. RNA contributions to the form and function of biomolecular condensates. Nat Rev Mol Cell Biol. 2021;22: 183–195. pmid:32632317
- 48. Turner AL, Watson M, Wilkins OG, Cato L, Travers A, Thomas JO, et al. Highly disordered histone H1−DNA model complexes and their condensates. Proc Natl Acad Sci. 2018;115: 11964–11969. pmid:30301810
- 49. Banani SF, Lee HO, Hyman AA, Rosen MK. Biomolecular condensates: organizers of cellular biochemistry. Nat Rev Mol Cell Biol. 2017;18: 285–298. pmid:28225081
- 50. Jung J, Mori T, Kobayashi C, Matsunaga Y, Yoda T, Feig M, et al. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations. Wiley Interdiscip Rev Comput Mol Sci. 2015;5: 310–323. pmid:26753008
- 51. Kobayashi C, Jung J, Matsunaga Y, Mori T, Ando T, Tamura K, et al. GENESIS 1.1: A hybrid-parallel molecular dynamics simulator with enhanced sampling algorithms on multiple computational platforms. J Comput Chem. 2017;38: 2193–2206. pmid:28718930
- 52. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A Fresh Approach to Numerical Computing. SIAM Rev. 2017;59: 65–98.
- 53. Terakawa T, Takada S. Multiscale Ensemble Modeling of Intrinsically Disordered Proteins: p53 N-Terminal Domain. Biophys J. 2011;101: 1450–1458. pmid:21943426
- 54. Tan C, Jung J, Kobayashi C, Sugita Y. A singularity-free torsion angle potential for coarse-grained molecular dynamics simulations. J Chem Phys. 2020;153: 044110. pmid:32752657
- 55. Catenaccio A, Daruich Y, Magallanes C. Temperature dependence of the permittivity of water. Chem Phys Lett. 2003;367: 669–671.
- 56. Stogryn A. Equations for Calculating the Dielectric Constant of Saline Water. IEEE Trans Microw Theory Tech. 1971;19: 733–736.
- 57. Ashbaugh HS, Hatch HW. Natively Unfolded Protein Stability as a Coil-to-Globule Transition in Charge/Hydropathy Space. J Am Chem Soc. 2008;130: 9536–9542. pmid:18576630
- 58. Miyazawa S, Jernigan RL. Residue–Residue Potentials with a Favorable Contact Pair Term and an Unfavorable High Packing Density Term, for Simulation and Threading. J Mol Biol. 1996;256: 623–644. pmid:8604144
- 59. Chu J-W, Voth GA. Coarse-Grained Modeling of the Actin Filament Derived from Atomistic-Scale Simulations. Biophys J. 2006;90: 1572–1582. pmid:16361345
- 60. Terakawa T, Takada S. RESPAC: Method to Determine Partial Charges in Coarse-Grained Protein Model and Its Application to DNA-Binding Proteins. J Chem Theory Comput. 2014;10: 711–721. pmid:26580048
- 61. Niina T, Brandani GB, Tan C, Takada S. Sequence-dependent nucleosome sliding in rotation-coupled and uncoupled modes revealed by molecular simulations. Panchenko ARR, editor. PLOS Comput Biol. 2017;13: e1005880. pmid:29194442
- 62. Tan C, Takada S. Nucleosome allostery in pioneer transcription factor binding. Proc Natl Acad Sci. 2020;117: 20586–20596. pmid:32778600
- 63. Humphrey W, Dalke A, Schulten K. VMD: Visual molecular dynamics. J Mol Graph. 1996;14: 33–38. pmid:8744570
- 64. Schrödinger L. The PyMOL Molecular Graphics System, Version 1.8. 2015 Nov.
- 65. Lu X-J, Olson WK. 3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures. Nat Protoc. 2008;3: 1213–1227. pmid:18600227
- 66. Adams PD, Afonine P V., Baskaran K, Berman HM, Berrisford J, Bricogne G, et al. Announcing mandatory submission of PDBx/mmCIF format files for crystallographic depositions to the Protein Data Bank (PDB). Acta Crystallogr Sect D Struct Biol. 2019;75: 451–454. pmid:30988261
- 67. Jung J, Mori T, Sugita Y. Midpoint cell method for hybrid (MPI+OpenMP) parallelization of molecular dynamics simulations. J Comput Chem. 2014;35: 1064–1072. pmid:24659253
- 68. Karanicolas J, Brooks CL. The origins of asymmetry in the folding transition states of protein L and protein G. Protein Sci. 2009;11: 2351–2361. pmid:12237457
- 69. Kobayashi C, Matsunaga Y, Koike R, Ota M, Sugita Y. Domain Motion Enhanced (DoME) Model for Efficient Conformational Sampling of Multidomain Proteins. J Phys Chem B. 2015;119: 14584–14593. pmid:26536148
- 70. Best RB, Chen Y-G, Hummer G. Slow Protein Conformational Dynamics from Multiple Experimental Structures: The Helix/Sheet Transition of Arc Repressor. Structure. 2005;13: 1755–1763. pmid:16338404
- 71. Westover KD, Bushnell DA, Kornberg RD. Structural Basis of Transcription. Cell. 2004;119: 481–489. pmid:15537538
- 72. Grant RA, Filman DJ, Finkel SE, Kolter R, Hogle JM. The crystal structure of Dps, a ferritin homolog that binds and protects DNA. Nat Struct Biol. 1998;5: 294–303. pmid:9546221
- 73. Davey CA, Sargent DF, Luger K, Maeder AW, Richmond TJ. Solvent Mediated Interactions in the Structure of the Nucleosome Core Particle at 1.9Å Resolution. J Mol Biol. 2002;319: 1097–1113. pmid:12079350
- 74. Sekido R, Lovell-Badge R. Sex determination involves synergistic action of SRY and SF1 on a specific Sox9 enhancer. Nature. 2008;453: 930–934. pmid:18454134
- 75. Harley VR, Lovell-Badge R, Goodfellow PN. Definition of a consensus DNA binding site for SRY. Nucleic Acids Res. 1994;22: 1500–1501. pmid:8190643
- 76. Murphy EC, Zhurkin VB, Louis JM, Cornilescu G, Clore GM. Structural Basis for SRY-dependent 46-X,Y Sex Reversal: Modulation of DNA Bending by a Naturally Occurring Point Mutation. J Mol Biol. 2001;312: 481–499. pmid:11563911
- 77. Pontiggia A, Rimini R, Harley VR, Goodfellow PN, Lovell-Badge R, Bianchi ME. Sex-reversing mutations affect the architecture of SRY-DNA complexes. EMBO J. 1994;13: 6115–6124. pmid:7813448
- 78. Fornes O, Castro-Mondragon JA, Khan A, van der Lee R, Zhang X, Richmond PA, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2019; 48: D87–D92. pmid:31701148
- 79. Patel A, Lee HO, Jawerth L, Maharana S, Jahnel M, Hein MY, et al. A Liquid-to-Solid Phase Transition of the ALS Protein FUS Accelerated by Disease Mutation. Cell. 2015;162: 1066–1077. pmid:26317470
- 80. Regy RM, Thompson J, Kim YC, Mittal J. Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins. Protein Sci. 2021;30: 1371–1379. pmid:33934416
- 81. Yuan S, Wang J, Zhu D, Wang N, Gao Q, Chen W, et al. Cryo-EM structure of a herpesvirus capsid at 3.1 Å. Science (80-). 2018;360: eaao7283. pmid:29622627
- 82. Öztürk MA, De M, Cojocaru V, Wade RC. Chromatosome Structure and Dynamics from Molecular Simulations. Annu Rev Phys Chem. 2020;71: 101–119. pmid:32017651
- 83. Wedemann G, Langowski J. Computer Simulation of the 30-Nanometer Chromatin Fiber. Biophys J. 2002;82: 2847–2859. pmid:12023209
- 84. Buckle A, Brackley CA, Boyle S, Marenduzzo D, Gilbert N. Polymer Simulations of Heteromorphic Chromatin Predict the 3D Folding of Complex Genomic Loci. Mol Cell. 2018;72: 786–797.e11. pmid:30344096
- 85. Takada S, Brandani GB, Tan C. Nucleosomes as allosteric scaffolds for genetic regulation. Curr Opin Struct Biol. 2020;62: 93–101. pmid:31901887
- 86. Ugarte La Torre D, Takada S. Coarse-grained implicit solvent lipid force field with a compatible resolution to the Cα protein representation. J Chem Phys. 2020;153: 205101. pmid:33261497
- 87. Terakawa T, Kenzaki H, Takada S. p53 Searches on DNA by Rotation-Uncoupled Sliding at C-Terminal Tails and Restricted Hopping of Core Domains. J Am Chem Soc. 2012;134: 14555–14562. pmid:22880817
- 88. Dai L, Xu Y, Du Z, Su X, Yu J. Revealing atomic-scale molecular diffusion of a plant-transcription factor WRKY domain protein along DNA. Proc Natl Acad Sci. 2021;118: e2102621118. pmid:34074787
- 89. Grime JMA, Voth GA. Highly Scalable and Memory Efficient Ultra-Coarse-Grained Molecular Dynamics Simulations. J Chem Theory Comput. 2014;10: 423–431. pmid:26579921
- 90. Garaizar A, Espinosa JR. Salt dependent phase behavior of intrinsically disordered proteins from a coarse-grained model with explicit water and ions. J Chem Phys. 2021;155: 125103. pmid:34598583
- 91. Sugita Y, Okamoto Y. Replica-exchange molecular dynamics method for protein folding. Chem Phys Lett. 1999;314: 141–151.
- 92. Sugita Y, Kitao A, Okamoto Y. Multidimensional replica-exchange method for free-energy calculations. J Chem Phys. 2000;113: 6042–6051.
- 93. Miao Y, Feher VA, McCammon JA. Gaussian Accelerated Molecular Dynamics: Unconstrained Enhanced Sampling and Free Energy Calculation. J Chem Theory Comput. 2015;11: 3584–3595. pmid:26300708
- 94. Kamiya M, Sugita Y. Flexible selection of the solute region in replica exchange with solute tempering: Application to protein-folding simulations. J Chem Phys. 2018;149: 072304. pmid:30134668
- 95. Fernandez-Leiro R, Scheres SHW. Unravelling biological macromolecules with cryo-electron microscopy. Nature. 2016;537: 339–346. pmid:27629640
- 96. Hylton RK, Swulius MT. Challenges and triumphs in cryo-electron tomography. iScience. 2021;24: 102959. pmid:34466785
- 97. Uchihashi T, Ganser C. Recent advances in bioimaging with high-speed atomic force microscopy. Biophys Rev. 2020;12: 363–369. pmid:32172451
- 98. Luchinat E, Banci L. In-Cell NMR in Human Cells: Direct Protein Expression Allows Structural Studies of Protein Folding and Maturation. Acc Chem Res. 2018;51: 1550–1557. pmid:29869502
- 99. Johnson GT, Autin L, Al-Alusi M, Goodsell DS, Sanner MF, Olson AJ. cellPACK: a virtual mesoscope to model and visualize structural systems biology. Nat Methods. 2015;12: 85–91. pmid:25437435
- 100. Jewett AI, Stelter D, Lambert J, Saladi SM, Roscioni OM, Ricci M, et al. Moltemplate: A Tool for Coarse-Grained Modeling of Complex Biological Matter and Soft Condensed Matter Physics. J Mol Biol. 2021;433: 166841. pmid:33539886
- 101. Martínez L, Andrade R, Birgin EG, Martínez JM. PACKMOL: A package for building initial configurations for molecular dynamics simulations. J Comput Chem. 2009;30: 2157–2164. pmid:19229944
- 102. Seo S, Shinoda W. SPICA Force Field for Lipid Membranes: Domain Formation Induced by Cholesterol. J Chem Theory Comput. 2019;15: 762–774. pmid:30514078
- 103. Han W, Wan C-K, Jiang F, Wu Y-D. PACE Force Field for Protein Simulations. 1. Full Parameterization of Version 1 and Verification. J Chem Theory Comput. 2010;6: 3373–3389. pmid:26617092
- 104. Tian C, Kasavajhala K, Belfon KAA, Raguette L, Huang H, Migues AN, et al. ff19SB: Amino-Acid-Specific Protein Backbone Parameters Trained against Quantum Mechanics Energy Surfaces in Solution. J Chem Theory Comput. 2020;16: 528–552. pmid:31714766
- 105. Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, de Groot BL, et al. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods. 2017;14: 71–73. pmid:27819658
- 106. Oostenbrink C, Villa A, Mark AE, Van Gunsteren WF. A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6. J Comput Chem. 2004;25: 1656–1676. pmid:15264259
- 107. Badaczewska-Dawid AE, Kolinski A, Kmiecik S. Computational reconstruction of atomistic protein structures from coarse-grained models. Comput Struct Biotechnol J. 2020;18: 162–176. pmid:31969975
- 108. Shimizu M, Takada S. Reconstruction of Atomistic Structures from Coarse-Grained Models for Protein–DNA Complexes. J Chem Theory Comput. 2018;14: 1682–1694. pmid:29397721