
Cryo-EM ligand building using AlphaFold3-like model and molecular dynamics

  • Nandan Haloi,

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden

  • Rebecca J. Howard,

    Roles Project administration, Supervision, Writing – review & editing

    Affiliations Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden

  • Erik Lindahl

    Roles Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    erik.lindahl@scilifelab.se

    Affiliations Department of Applied Physics, Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden

Abstract

Resolving protein-ligand interactions in atomic detail is key to understanding how small molecules regulate macromolecular function. Although recent breakthroughs in cryogenic electron microscopy (cryo-EM) have enabled high-quality reconstruction of numerous complex biomolecules, the resolution of bound ligands is often relatively poor. Furthermore, methods for building and refining molecular models into cryo-EM maps have largely focused on proteins and may not be optimized for the diverse properties of small-molecule ligands. Here, we present an approach that integrates artificial intelligence (AI) with cryo-EM density-guided simulations to fit ligands into experimental maps. The approach requires three inputs: 1) a protein amino acid sequence, 2) a ligand specification, and 3) an experimental cryo-EM map. We validated it on a set of biomedically relevant protein-ligand complexes, including kinases, GPCRs, and solute transporters, none of which were present in the AI training data. In cases for which AI was not sufficient to predict experimental poses outright, integrating flexible fitting into molecular dynamics simulations improved ligand model-to-map cross-correlation, relative to the deposited structure, from 40-71% to 82-95%. This work offers a straightforward pipeline for integrating AI and density-guided simulations for model building in cryo-EM maps of ligand-protein complexes.

Author summary

Understanding how proteins interact with small molecules, such as drugs, has long been a central challenge in structural biology. Visualising these interactions at the atomic level can reveal critical details of protein function and empower drug design. Despite recent advances in imaging such as cryogenic electron microscopy (cryo-EM) and computational predictions using artificial intelligence (AI), many important protein-drug complexes remain difficult to decipher by either method in isolation. In this study, we show how combining AI-driven structure prediction with cryo-EM imaging data and molecular simulations can enable accurate modeling of protein-drug complexes with minimal system knowledge or structural biology expertise. Our proposed pipeline demonstrates the power of integrating experimental and computational methods to decode the complex language of molecular recognition, and holds promise for advancing both basic science and pharmaceutical innovation.

Introduction

Protein-ligand binding is of key importance to both biomolecular regulation and pharmaceutical activity. Resolving protein binding of a ligand at the atomic level, including its geometry, pose, and interactions with specific amino acid residues, can critically aid the characterization of endogenous modulators or design of novel drugs. Recent advances in single-particle cryogenic electron microscopy (cryo-EM) have enabled the determination of diverse macromolecular structures, including targets inaccessible by X-ray crystallography, and often with ligands bound [1]. However, even with high-quality data, the resolution of bound ligands is often lower than the surrounding protein, and may be too poor for definitive model building. For example, in a recently reconstructed cryo-EM map for β-galactosidase bound to the inhibitor phenylethyl β-D-thiogalactopyranoside, the protein was resolved to 1.5 Å, while the ligand densities were limited to 3–3.5 Å [2].

Several methods have been developed to refine molecular models in cryo-EM maps, broadly categorized as rigid-body fitting, flexible fitting, and de novo model building, as well as various combinations of these [3–6]. However, such approaches have primarily been applied to proteins, with less focus on bound small molecules. It may be particularly challenging to maintain the native intricacies of intermolecular ligand-protein interactions during computational modeling. Recent efforts to combine physics-based docking with relatively low-resolution ligand densities [7–9] may be limited by reliance on an initial protein structure, which must be determined by separate methods, and substantially restricted from further refinement even in the case of overlapping protein and ligand densities. Flexible fitting methods can in principle refine a ligand and the surrounding protein pocket simultaneously, but typically require a masked ligand-only density, hindering automation. For most automated methods, successful modeling depends on a reasonably accurate initial model for the protein-ligand complex.

Recent artificial intelligence (AI)-based methods, including AlphaFold3 and its open-weights analogs such as Chai-1, offer new approaches to predict structures of protein-ligand complexes based on amino-acid sequences and ligand specification [10,11]. Here we test the applicability of such a protocol, in combination with simulation-based flexible fitting where necessary, to support a semi-automated pipeline for modeling pharmaceutically relevant protein-ligand complexes into cryo-EM maps not represented in the AI training data (Fig 1). For ten biomedically relevant targets, including cytosolic kinases, membrane-bound receptors and secondary transporters, ligand models generated in Chai-1 fit the target cryo-EM density with at least 82% accuracy relative to the deposited structure, either directly or after density-guided simulations. These results demonstrate the utility of combining AI with flexible fitting for ligand building in a variety of pertinent systems.

Fig 1. A pipeline for modeling ligand-protein complexes.

First, the protein sequence and ligand SMILES string were provided to Chai-1 to predict the structure of the protein-ligand complex. Then, rigid-body alignment followed by molecular dynamics simulation-based flexible fitting was performed.

https://doi.org/10.1371/journal.pcbi.1013367.g001

Results

A pipeline for modeling ligand-protein complexes

To test the integration of predictive AI with flexible fitting in building experimental structures of protein-ligand complexes, we first predicted five molecular models for each target using Chai-1 [11]. This open-weights model is based on comparable architecture and training strategies to those of AlphaFold3 [10], and has shown similar performance in predicting protein-ligand complexes [11]. For each target, we input the protein amino-acid sequence and a ligand specification using the simplified molecular input line-entry system (SMILES) (Fig 1, left). The predicted complexes were then rigid-body aligned with their target cryo-EM maps using ChimeraX (Fig 1, center) [12].

Next, we used density-guided molecular dynamics (MD) simulations in GROMACS [13] to fit the best Chai-1 model to the density (Fig 1, right). Briefly, in this step, we applied additional forces to protein and ligand atoms, proportional to the gradient of a similarity measure between a density simulated from the model and the reference cryo-EM map. No additional restraints were applied during these simulations, so that both the protein and ligand could adjust their conformations during refinement. During fitting, we monitored model-to-map cross-correlations (CCs) to track the quality of the fit, protein-ligand interaction energy (PLIE) to check for favorable interactions without clashes, and the generalized orientation-dependent all-atom potential (GOAP) score [14] to assess protein geometry. Notably, PLIE does not correspond to binding free energy, which is non-trivial to estimate from our simulations. To minimize technical challenges arising from inconsistent ligand nomenclature, we focused on CCs to validate our fitted structures against the experimental structures (ground truth). Because a ground truth would not be available when fitting new data, information from the experimental structure was not used during model building, fitting, or quality assessment, but served only as a reference for validation.
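The core idea of the density-guided step (forces that push atoms uphill in model-to-map similarity) can be illustrated with a toy one-dimensional sketch. This is not GROMACS code: it assumes an inner-product similarity (one of the measures GROMACS offers), unnormalized Gaussian atom spreading, and plain steepest ascent in place of an MD integrator, and it omits the force field entirely. The names `simulated_density`, `density_forces` and `fit` are hypothetical.

```python
import math

# Toy 1D illustration (not GROMACS code): atoms are spread as Gaussians onto a grid,
# compared with a reference map by an inner-product similarity, and moved along the
# gradient of that similarity. No force field, thermostat, or barostat is included.


def simulated_density(positions, grid, sigma):
    """Spread each atom as an unnormalized Gaussian of width sigma onto the grid."""
    return [sum(math.exp(-((g - x) ** 2) / (2 * sigma**2)) for x in positions) for g in grid]


def inner_product_similarity(positions, grid, reference, sigma):
    """Similarity S = sum over voxels of (simulated density * reference density)."""
    sim = simulated_density(positions, grid, sigma)
    return sum(s * r for s, r in zip(sim, reference))


def density_forces(positions, grid, reference, sigma, k):
    """Force on atom i is k * dS/dx_i, i.e. it points uphill in similarity."""
    forces = []
    for x in positions:
        # d/dx exp(-(g - x)^2 / (2 sigma^2)) = exp(...) * (g - x) / sigma^2
        grad = sum(
            r * math.exp(-((g - x) ** 2) / (2 * sigma**2)) * (g - x) / sigma**2
            for g, r in zip(grid, reference)
        )
        forces.append(k * grad)
    return forces


def fit(positions, grid, reference, sigma=0.3, k=1.0, step=0.01, n_steps=50):
    """Steepest ascent on S alone; a stand-in for the MD integrator."""
    positions = list(positions)
    for _ in range(n_steps):
        forces = density_forces(positions, grid, reference, sigma, k)
        positions = [x + step * f for x, f in zip(positions, forces)]
    return positions


grid = [0.1 * i for i in range(41)]              # grid points from 0.0 to 4.0
reference = simulated_density([2.0], grid, 0.3)  # "experimental" density peak at 2.0
start = [1.4]                                    # misplaced atom
end = fit(start, grid, reference)
print(start, "->", [round(x, 3) for x in end])
```

In GROMACS the simulated density is recomputed from the current coordinates during the run and the density forces are added to the force-field forces, so the physics-based terms keep the model chemically reasonable while the map pulls it into place.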

A test set of biomedically relevant protein-ligand complexes

We tested our approach on monomeric target complexes in the protein-ligand interactions dataset and evaluation resource (PLINDER) [15] (see Methods). Experimental structures for these test cases were reported after December 2021, such that they were not present in the Chai-1 training set. We filtered these for complexes containing native protein sequences under 1200 amino acids, along with pharmaceutically relevant small molecules (quantitative estimate of drug-likeness (QED) threshold of 0.7 [16]) within 4 Å of protein atoms. The resulting test set consisted of ten protein-ligand complexes of 400–1200 amino acids resolved to 2.7–3.7 Å, including two cytosolic kinases, two G protein-coupled receptors (GPCRs) and six secondary transporters (Table 1 and S1 Fig), and ligands with molecular weights from 218 to 442 g/mol (16-30 heavy atoms), 0-4 rotatable bonds, and 3-10 hydrogen bond donor/acceptor atoms (Table 2 and S2 Fig).

Table 2. Properties of the ligands used in this study, extracted from PubChem [37].

https://doi.org/10.1371/journal.pcbi.1013367.t002

The kinases included 1) leucine-rich repeat kinase 2 (LRRK2), a multifunctional enzyme (1194 residues) and driver of heritable Parkinson’s disease [27]; and 2) a variant of phosphoinositide 3-kinase α (PI3Kα), a lipid kinase of similar size (1096 residues) but distinct fold, mediating cellular growth signals in human cancers [28] (Table 1 and S1 Fig). The target complex with LRRK2 included MLi-2, an inhibitor with high affinity and selectivity [17]; the complex with PI3Kα included BYL-719 (alpelisib), a Food and Drug Administration (FDA) approved compound for the treatment of solid tumors [29] (Table 2 and S2 Fig).

The membrane-bound receptors included 1) the histamine H1 receptor (H1R), a GPCR (438 residues) that is responsible for allergic and inflammatory symptoms [30]; and 2) the type-3 hydroxycarboxylic acid (HCA3) receptor, a smaller GPCR (387 residues) with only 13% identity to H1R, also involved in inflammation as well as neuroprotection [31] (Table 1 and S1 Fig). The target H1R complex included desloratadine, a second-generation antihistamine [19]; the complex with HCA3 included acifran, a drug capable of lowering plasma low-density lipoprotein concentrations [20] (Table 2 and S2 Fig).

The secondary transporters included 1) the sodium- and chloride-dependent glycine transporter 1 (GlyT1), a member of the solute carrier-6 (SLC6) family (652 residues) regulating both inhibitory and excitatory neurotransmission, and a target for the treatment of schizophrenia [21]; 2) the high-affinity choline transporter 1 (CHT1), part of the SLC5 family (580 residues) mediating reuptake of synaptic choline following neurotransmitter hydrolysis and involved in conditions such as depression and anxiety [32]; 3) the thiamine transporter 2 (ThTr2), a member of the SLC19 family (496 residues) that takes up dietary vitamins and is involved in diseases such as Wernicke’s encephalopathy [33]; 4) the norepinephrine transporter (NET), another SLC6 member (617 residues) which plays an essential role in the central nervous system and is a therapeutic target for emotional and cognitive disorders [34]; 5) the organic cation transporter 3 (OCT3), a member of the SLC22 family (556 residues) responsible for cellular uptake of cationic drugs in various tissues [35]; and 6) the organic anion transporting polypeptide 1B1 (OATP1B1), a member of the SLCO family (691 residues) that transports a wide range of amphipathic organic anions, including clinical drugs [36] (Table 1 and S1 Fig). In these test cases, GlyT1 was bound to its inhibitor SSR504734 [21]; CHT1 to its inhibitor hemicholinium-3 [22]; ThTr2 to its substrate thiamine [23]; NET to its inhibitor bupropion, an FDA-approved antidepressant [24]; OCT3 to its inhibitor corticosterone [25]; and OATP1B1 to its endogenous metabolite estrone sulfate [26] (Table 2 and S2 Fig).

Imprecise prediction resolved by flexible fitting

For three of our test cases, including both kinases (LRRK2+MLi-2, PI3Kα+alpelisib) and one GPCR (H1R+desloratadine), standard Chai-1 prediction followed by model-to-map rigid-body alignment generated ligand poses with a CC of at least 0.6 and a PLIE of −6 kJ/mol or lower (Table 3). These poses corresponded to at least 90% ligand accuracy, as measured by CC relative to that of the ground-truth structure (see Methods) (Figs 2 and S3). Indeed, although we report accuracy only for the best of five generated models, all predictions in these cases appeared largely consistent, deviating by less than 0.1 in raw CC (S4, S5 and S6 Figs). Protein structures were also similar to the ground truth, predicted with at least 88% accuracy for pocket residues within 4 Å of the ligand, and with at least 83% accuracy for the entire protein (S3 and S7 Figs). For these cases, AI modeling appeared sufficient to approximate the target complexes, as previously reported for other systems [11]; we nonetheless performed flexible fitting, which maintained a similar level of accuracy.
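Because PLIE in Table 3 is normalized by the number of ligand heavy atoms, a per-atom time series can be derived from `gmx energy` output. The sketch below assumes that protein and ligand were defined as energy groups so that short-range Coulomb and Lennard-Jones terms named `Coul-SR:Protein-LIG` and `LJ-SR:Protein-LIG` appear in the `.xvg` file; the group name `LIG`, the choice to sum exactly these two terms, and the helper names are assumptions, and the energy values are made up.

```python
def read_xvg(text):
    """Parse GROMACS .xvg text into (legends, rows).

    Lines starting with '#' are comments; lines starting with '@' are xmgrace
    directives, of which only '@ sN legend "..."' entries are kept.
    """
    legends, rows = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("@"):
            fields = line[1:].split()
            if len(fields) >= 2 and fields[0].startswith("s") and fields[1] == "legend":
                legends.append(line.split('"')[1])
            continue
        rows.append([float(v) for v in line.split()])
    return legends, rows


def plie_per_heavy_atom(text, n_heavy,
                        terms=("Coul-SR:Protein-LIG", "LJ-SR:Protein-LIG")):
    """Return (time, PLIE / n_heavy) pairs; column 0 of each row is time."""
    legends, rows = read_xvg(text)
    cols = [legends.index(t) + 1 for t in terms]
    return [(row[0], sum(row[c] for c in cols) / n_heavy) for row in rows]


example = """# made-up numbers for illustration
@    title "GROMACS Energies"
@    xaxis  label "Time (ps)"
@ s0 legend "Coul-SR:Protein-LIG"
@ s1 legend "LJ-SR:Protein-LIG"
0.0    -80.0  -100.0
10.0   -60.0   -90.0
"""
series = plie_per_heavy_atom(example, n_heavy=30)
# Threshold of -6 kJ/mol per heavy atom, as used in the text above.
well_packed = all(plie <= -6.0 for _, plie in series)
```

Normalizing by heavy-atom count makes ligands of different size (16 to 30 heavy atoms in this test set) comparable against a single threshold.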

Fig 2. Accuracy of AI models before and after flexible fitting in 10 test systems.

Accuracy for ligand (left), pocket residues (center) and protein residues (right) based on model-to-map CC relative to that of the corresponding deposited structure, for the best initial AI prediction (yellow) and after simulation-based fitting to the experimental map, when applied (blue). For fitted models, columns represent mean accuracy and standard error over the final 20 frames of the simulation.

https://doi.org/10.1371/journal.pcbi.1013367.g002

Table 3. Protein-ligand interaction energies (PLIE) and model-to-map CCs for the best Chai-1 prediction (Pred.), final complex from flexible fitting (Fitted) and ground truth based on the deposited experimental structure (GT) for each test system. PLIE values are normalized by the number of ligand heavy atoms.

https://doi.org/10.1371/journal.pcbi.1013367.t003

For the remaining seven test cases, ligand poses were predicted with less than 0.6 CC, corresponding to less than 71% accuracy and suggesting a need for further optimization. In three of these cases, including one GPCR (HCA3+acifran) and two transporters (GlyT1+SSR504734, CHT1+hemicholinium-3), standard Chai-1 generation followed by rigid-body alignment produced some of the least accurate ligand fits in this work (0.2-0.35 CC, or 40-60% accuracy) (Table 3, Figs 2, 3, S8, S9 and S10). For HCA3, the ligand partly overlapped the target density in only 1 of 5 generated models. However, overall protein conformation was predicted in these cases with at least 86% accuracy (Table 3, Figs 2, S11, S12 and S13), including outward-facing states for both transporters (overall root mean squared deviation [RMSD], based on Cα atoms, within 1.5 Å of their respective experimental structures [21,22]), indicating that only the ligand fit needed substantial improvement. Indeed, density-guided MD simulations of each of these complexes for 2 ns improved ligand CC from 0.2-0.35 to 0.6-0.7 and accuracy from 40-60% to 85-95%. Accuracy also improved for the protein models, increasing from 63-86% to 84-96% for residues in the binding pockets, and from 86-87% to 90-99% for the proteins overall. During our simulations, PLIE remained below −6 kJ/mol (Fig 3). Similarly, GOAP scores largely remained below ground-truth values, indicating that geometric plausibility was retained (S14 Fig).

Fig 3. Imprecise prediction resolved by flexible fitting.

(Left) Initial Chai-1 prediction of the binding pocket for the type-3 hydroxycarboxylic acid receptor+acifran (A), glycine transporter+SSR504734 (B), and choline transporter+hemicholinium-3 (C) systems. Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are rendered transparent. (Middle) Final frame of the density-fitting simulations (blue) along with the experimental structure (green). (Right, top) Time trace of the protein-ligand interaction energy. (Right, bottom) Time traces of the cross-correlations of the ligand (yellow), pocket residues (silver), and entire protein (black) during the density-fitting simulations. Vertical lines (same color coding) represent values calculated using the experimental structure. Chai-1 predictions of all five binding sites are shown in S9, S8 and S10 Figs.

https://doi.org/10.1371/journal.pcbi.1013367.g003

Flexible fitting improves protein as well as ligand conformations

In three additional test cases, including the transporters ThTr2+thiamine, NET+bupropion, and OCT3+corticosterone, the protein conformations as well as ligand poses were predicted with relatively low accuracy (71-75% and 50-68%, respectively) (Table 3, Figs 2, 4, S15, S16 and S17). Visual inspection indicated that they deviated from the functional states assigned to the deposited structures, a particular challenge for targets like SLCs whose biological function involves conformational cycling [38]. Poor correspondence did not appear to correlate with the functional state or fold of the target density: ThTr2 and NET were resolved in inward-facing states [23,24], while OCT3 was outward-facing [25], similar to GlyT1 and CHT1 in the previous section. ThTr2 and OCT3 are in the major facilitator superfamily, while NET as well as GlyT1 and CHT1 are in the LeuT family. Still, all best-fit models characterized in this work deviated by less than 4 Å overall from the ground-truth structures deposited for their corresponding targets, such that density-guided simulations could be expected to accomplish reasonably accurate refinement [6]. Indeed, density-guided simulations improved accuracy of the overall protein models to 93-94%, and of the ligands to 82-90% (Table 3 and Figs 2, 4), indicating that protein conformational changes as well as ligand fitting can be accommodated by this flexible fitting protocol. PLIE remained below −6 kJ/mol in all three cases, indicating favorable packing between the ligand and protein.

Fig 4. Flexible fitting improves protein as well as ligand conformations.

(Left) Initial Chai-1 prediction of the binding pocket for the thiamine transporter 2+thiamine (A), norepinephrine transporter+bupropion (B), and organic cation transporter+corticosterone (C) systems. Coloring as in Fig 3. Alignments of the entire protein, in addition to the binding pockets, are shown. (Middle) Final frame of the density-fitting simulations (blue) along with the experimental structure (green). (Right, top) Time trace of the protein-ligand interaction energy. (Right, bottom) Time traces of the cross-correlations of the ligand (yellow), pocket residues (silver), and entire protein (black) during the density-fitting simulations. Vertical lines (same color coding) represent values calculated using the experimental structure. Chai-1 predictions of all five binding sites are shown in S15, S16 and S17 Figs.

https://doi.org/10.1371/journal.pcbi.1013367.g004

Challenging ligand poses improved by extended sampling in Chai-1

For the OATP1B1+estrone sulfate complex, standard Chai-1 prediction produced a best-fit model in the apparent outward-facing state (RMSD of 1 Å from the experimental target structure [26]) with 90% accuracy for the binding pocket, and 92% accuracy for the protein overall (Table 3 and Figs 2, 5A and 5B). The ligand was modeled with 67% accuracy (Table 3, Fig 2), which improved to 77% after density-guided simulations. However, PLIE values above −1 kJ/mol during simulation suggested that the ligand was not well coordinated in the binding pocket. Indeed, visual inspection indicated that even the best-fit ligand was flipped relative to the ground truth, a type of discrepancy that may not be easily corrected by flexible fitting (Fig 5B). To improve the initial model, we generated an additional 20 predictions in Chai-1. The best resulting model included a ligand pose that was 71% accurate and oriented similarly to the experimental structure; ligand accuracy improved to 92% after density-guided simulations (Table 3 and Figs 2, 5C, 5D). PLIE for this model was around −10 kJ/mol during the entire simulation. Thus, extended sampling in Chai-1 may usefully improve initial models for challenging ligand poses, although it was required in only one of our ten test cases.
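The Results above imply a simple triage: accept a well-fitting, well-packed prediction; otherwise run density-guided MD; and if the fitted ligand remains poorly coordinated, draw more Chai-1 samples. The sketch below encodes that reading as a hypothetical rule. The thresholds (0.6 ligand CC, −6 and −1 kJ/mol per heavy atom) come from the text, but combining them into explicit branches is an editorial assumption, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    ligand_cc: float      # ligand model-to-map cross-correlation
    plie_per_atom: float  # protein-ligand interaction energy, kJ/mol per heavy atom


def next_step(best_prediction, fitted=None,
              min_cc=0.6, max_plie=-6.0, flipped_plie=-1.0):
    """Hypothetical decision rule distilled from the Results; not the authors' code."""
    if best_prediction.ligand_cc >= min_cc and best_prediction.plie_per_atom <= max_plie:
        return "accept"          # AI pose already fits (refinement optional)
    if fitted is None:
        return "fit"             # run density-guided MD on the best prediction
    if fitted.plie_per_atom > flipped_plie:
        return "resample"        # poorly coordinated (e.g. flipped) ligand: more Chai-1 models
    return "accept_fitted"


print(next_step(Scores(0.65, -7.0)))                       # kinase/H1R-like case
print(next_step(Scores(0.50, -3.0), Scores(0.55, -0.5)))   # OATP1B1-like case
```

A PLIE check is included because CC alone did not reveal the flipped estrone sulfate pose: its cross-correlation looked reasonable while its interaction energy did not.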

Fig 5. Challenging ligand poses improved by extended sampling in Chai-1.

(A) Chai-1 predictions of five binding sites of estrone sulfate in the solute carrier organic anion transporting polypeptide 1B1, without pocket information. Coloring as in Fig 3. (B) Representative Chai-1 prediction of the binding pocket for the complex, with the greatest cross-correlation for the ligand. Final frame of the density-fitting simulations (blue) along with the experimental structure (green). (C) Time traces of the protein-ligand interaction energy and of the cross-correlations of the ligand, pocket residues, and entire protein during the density-fitting simulations. The simulation terminated automatically before reaching 2 ns, possibly due to the high force from the density. (D) Chai-1 predictions of 25 binding sites of the same complex. Coloring as in Fig 3. (E) Representative Chai-1 prediction of the binding pocket for the complex, with the greatest cross-correlation for the ligand. (F) Simulation time traces, shown as in panel C.

https://doi.org/10.1371/journal.pcbi.1013367.g005

Discussion

Here, we showcase how AI models can complement MD simulations to accurately fit small molecules into cryo-EM maps, limiting user effort and bias during modeling. When benchmarked against ten biomedically relevant entries from the EMDB, our approach fits ligands with 82-95% accuracy relative to deposited structures. The resulting models would constitute appropriate templates for final manual or automated refinement, or in some cases for direct analysis and deposition.

Our approach addresses several challenges in automated structure determination of protein-ligand complexes [9,39–41]. First, the flexible fitting protocol implemented here enables refinement of the protein alongside the ligand; the protein structure does not need to be accurately built in advance to achieve an accurate model of the complex. Ligand model-to-map cross-correlation relative to the deposited structure (our accuracy measurement) improved from 40–71% to 82–95%. For pocket and protein, accuracy improved from 63-97% to 84–98% and 71–89% to 84–99%, respectively. Here, we focused on ligand model-to-map cross-correlation relative to the ground-truth deposited structure as a metric for accuracy. However, some challenging cases also benefited from monitoring PLIE as a check for chemical plausibility, for instance when an initial prediction is flipped relative to its optimal pose. In the case of OATP1B1+estrone sulfate, initial modeling was associated with reasonable cross-correlations, but unfavorable PLIE scores (Fig 5C). Increasing AI-based sampling produced an initial configuration more similar to the optimal orientation, enabling fitting with accuracy comparable to other cases (Table 3 and Fig 2).

Second, the AI step requires only the amino acid sequence and SMILES specification as inputs, avoiding any need for structural templates. The initial models thus generated also do not require rebuilding of unresolved atoms, residues, loops, or other regions, as is often necessary when preparing experimental structures for MD simulations. Third, our method does not require knowledge of the ligand-binding location or coordinating residues, a nontrivial inference when studying novel complexes. Independent predictions for the same input complexes can vary, so we used model-to-map cross-correlations for the ligand as a metric to select the likely-best candidate ligand-receptor model for further refinement. It is plausible that this approach may also be applicable to simultaneously model multiple ligands, including ions and cofactors, as allowed by both AlphaFold3-like models and density-guided simulations [10,11].

As noted in Fig 4, AI methods might not predict the relevant conformational state of a protein-ligand complex. This may be a particular concern for proteins such as transporters that undergo conformational changes between functional states. The SLCs investigated in this study operate by various alternating-access mechanisms, cycling between inward- and outward-facing states. Proteins in the LeuT superfamily such as GlyT1, CHT1, and NET generally employ gated-pore mechanisms, while members of the major facilitator superfamily such as ThTr2, OCT3, and OATP1B1 are associated with rocker-switch mechanisms [38]. Possibly due to the diversity of relevant structures for each of these transporters, AI methods can fail to predict a given state. Nonetheless, conformational differences in all cases studied here could be accommodated with reasonable accuracy by flexible fitting. We previously reported that transporters for which functional states deviate by more than 4 Å benefit from the generation and clustering of an ensemble of initial models [42], an approach that could also prove valuable in more challenging cases of ligand fitting. As AI predictions continue to improve, the success of flexible fitting is likely to extend to a broader range of systems.

A potential limitation of our approach is its performance with proteins that lack evolutionary data or have highly flexible structures. Previous studies have shown that AlphaFold2 can struggle to predict new protein conformations in such scenarios [43]. This challenge underscores the importance of the density-guided molecular dynamics step in our approach, as it enables a good fit even when none of the generated models closely matches the target conformation. Incorporating previously described multi-step approaches, such as iterative density-guided simulations with progressively increasing resolution or enhanced sampling techniques [39], could further improve the performance of our pipeline in more challenging cases. Another limitation of our approach is that force fields for small molecules in classical MD simulations may not be highly accurate. However, given the relatively high force arising from the cryo-EM density potential, dependence on classical force fields is expected to be reduced.

We acknowledge that the success of our strategy currently relies on the quality of initial conformations predicted in Chai-1. The generation of starting structures reasonably close to the optimal configuration—especially in terms of protein architecture and ligand pose—is a critical factor in the effectiveness of our density-guided refinement pipeline. As noted in prior literature [5,6], flexible fitting methods are typically most successful when the starting model is within a local basin of the correct conformation. Major errors in the input model can limit the accuracy of the final structure. While our results underscore the value of Chai-1 in generating such high-quality initial predictions, we recognize that the current approach could be less robust in case of severe inaccuracies in the starting model. Nonetheless, our framework establishes a promising direction for integrating AI-based prediction with physics-based refinement. Future efforts aimed at improving robustness against poor initial guesses, such as incorporating enhanced sampling strategies or iterative model rebuilding, could extend the utility of this method, even in more challenging modeling scenarios.

Methods

Data curation

To curate test cases, we used the recently released protein-ligand interaction structural database PLINDER [15]. Using PLINDER’s parquet files, we extracted cryo-EM monomeric protein-ligand complexes released between January 2022 and June 2024, filtering on the fields “entry_determination_method”, “entry_release_date” and “entry_oligomeric_state”. The date cutoff ensures that our test cases were not in the training set of Chai-1 (cutoff date: December 1, 2021) [11]. We further narrowed our search to complexes containing non-covalent, drug-like small molecules (quantitative estimate of drug-likeness [QED] threshold of 0.7 [16]) using the fields “ligand_is_covalent”, “ligand_qed” and “ligand_is_lipinski”. This resulted in a dataset of 23 test cases, which was narrowed down to 10 after removing proteins with unnatural chimeric constructs, inconclusive ligand binding sites (judged by the absence of protein atoms within 4 Å of the ligand), homologous pairs, and proteins with more than 1200 amino acids.
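The automated part of this curation can be sketched as a row filter over the PLINDER index. In practice the rows would come from PLINDER's parquet file (for example via `pandas.read_parquet`); here they are plain dictionaries with made-up values so the sketch is self-contained. The field names are those listed above, but the exact field values (`"ELECTRON MICROSCOPY"`, `"monomeric"`), the end date of June 30, 2024, the QED comparison (≥ 0.7), and requiring `ligand_is_lipinski` to be true are all assumptions. The later manual steps (chimeric constructs, homologs, size) are not shown.

```python
from datetime import date

# Assumed field values; PLINDER's actual vocabulary may differ.
METHOD = "ELECTRON MICROSCOPY"
STATE = "monomeric"


def keep_entry(row, start=date(2022, 1, 1), end=date(2024, 6, 30), min_qed=0.7):
    """Return True if a PLINDER index row passes the (assumed) automated filters."""
    released = date.fromisoformat(row["entry_release_date"])
    return (
        row["entry_determination_method"] == METHOD
        and row["entry_oligomeric_state"] == STATE
        and start <= released <= end          # after the Chai-1 training cutoff
        and not row["ligand_is_covalent"]
        and row["ligand_qed"] >= min_qed      # comparison operator assumed
        and bool(row["ligand_is_lipinski"])
    )


rows = [  # made-up rows for illustration
    {"entry_determination_method": "ELECTRON MICROSCOPY", "entry_oligomeric_state": "monomeric",
     "entry_release_date": "2023-05-10", "ligand_is_covalent": False,
     "ligand_qed": 0.78, "ligand_is_lipinski": True},
    {"entry_determination_method": "X-RAY DIFFRACTION", "entry_oligomeric_state": "monomeric",
     "entry_release_date": "2023-05-10", "ligand_is_covalent": False,
     "ligand_qed": 0.81, "ligand_is_lipinski": True},
    {"entry_determination_method": "ELECTRON MICROSCOPY", "entry_oligomeric_state": "monomeric",
     "entry_release_date": "2021-11-02", "ligand_is_covalent": False,
     "ligand_qed": 0.75, "ligand_is_lipinski": True},
]
selected = [r for r in rows if keep_entry(r)]
```

Only the first row survives: the second is not cryo-EM and the third predates the training cutoff.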

Chai-1 predictions

We used the Chai-1 webserver lab.chaidiscovery.com to predict our protein-ligand complexes, with the protein amino acid sequence and ligand SMILES string as input [11]. The webserver outputs five predictions by default. Because independent predictions for the same input complex may differ, we used ligand model-to-map cross-correlation as a criterion to identify the most accurate complex for further refinement. For the OATP1B1+estrone sulfate complex, 20 additional predictions were generated. For cross-correlation calculations, we rigid-body fitted each Chai-1 predicted model into the map using the “fitmap” command in ChimeraX [12].
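Selecting the starting model reduces to ranking predictions by their ligand CC after rigid-body fitting, with any extra samples simply pooled into the same ranking. The sketch below assumes the CC values have already been computed (for example, from ChimeraX `fitmap` output); model names and CC values are made up, and `rank_models` is a hypothetical helper.

```python
def rank_models(ligand_cc):
    """Order model names by ligand model-to-map CC, best first."""
    return sorted(ligand_cc, key=ligand_cc.get, reverse=True)


# Made-up CCs: five default predictions plus extra samples pooled afterwards.
default_batch = {f"pred_{i}": cc for i, cc in enumerate([0.41, 0.47, 0.38, 0.44, 0.29])}
extra_batch = {f"extra_{i}": cc for i, cc in enumerate([0.36, 0.52, 0.40])}
pooled = {**default_batch, **extra_batch}
best = rank_models(pooled)[0]
```

Ranking by CC alone is cheap, but as the OATP1B1 case shows, the top-ranked pose can still be flipped; pairing this selection with a PLIE check guards against that.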

System preparations

Each protein-ligand complex predicted by Chai-1 was prepared for molecular dynamics (MD) simulations using the CHARMM-GUI [44] webserver. Each system was solvated with TIP3P water [45] and neutralized in 0.15 M KCl, yielding systems of 90,000 to 210,000 atoms. Because the membrane has previously been shown not to substantially improve density-guided simulation results [46], we omitted a membrane-building step from the pipeline for simplicity. The systems were energy minimized and then relaxed in simulations at constant pressure (1 bar) and temperature (310 K) for 1 ns, during which position restraints on the protein and ligands were gradually released, as recommended by CHARMM-GUI. Specifically, we minimized the energy for 5000 steps with backbone and side-chain restraints of 400 and 40 kJ mol−1 nm−2, respectively, and then performed equilibration for around 1 ns with the same backbone and side-chain restraints.

Density-guided simulations

Density-guided MD simulations in this study were performed using GROMACS-2024 [13] with CHARMM36m [47] and CHARMM General Force Field (CGenFF) [48] parameters for proteins and ligands, respectively. Bonded and short-range nonbonded interactions were calculated every 2 fs, and periodic boundary conditions were employed in all three dimensions. The particle mesh Ewald (PME) method [49] was used to calculate long-range electrostatic interactions with a grid spacing below 0.1 nm. A force-based switching function was applied to pairwise non-bonded interactions from 1 nm to a cutoff of 1.2 nm. Pairs of interacting atoms were searched and updated every 20 steps, using a cutoff of 1.2 nm. Constant pressure was maintained at 1 bar using the c-rescale barostat [50], and temperature was kept at 310 K with the v-rescale thermostat [51]. Forces from the density were applied every N = 2 steps. A force constant of 10³ kJ/mol for the density-guided forces was combined with adaptive force scaling.
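The settings above map onto GROMACS `.mdp` options. The sketch below generates a partial parameter file from them, reading the density force constant as 10³ kJ/mol and the run length as the 2 ns used in the Results. It is not the authors' input file and is not complete: `tc-grps`, `tau-t`, `tau-p`, `compressibility`, `constraints` and output settings are omitted, the similarity measure is left at its GROMACS default, and the index-group name `Protein_LIG` and the map filename are hypothetical.

```python
def density_guided_mdp(reference_map, fit_group="Protein_LIG", length_ns=2.0):
    """Return a partial GROMACS .mdp for density-guided MD (sketch, not validated)."""
    dt_ps = 0.002  # 2 fs time step
    params = {
        "integrator": "md",
        "dt": dt_ps,
        "nsteps": round(length_ns * 1000 / dt_ps),
        "cutoff-scheme": "Verlet",
        "nstlist": 20,                      # pair search every 20 steps
        "coulombtype": "PME",
        "rcoulomb": 1.2,
        "fourierspacing": 0.1,              # PME grid spacing upper bound, nm
        "vdwtype": "Cut-off",
        "vdw-modifier": "Force-switch",     # switch from 1.0 to 1.2 nm
        "rvdw-switch": 1.0,
        "rvdw": 1.2,
        "tcoupl": "V-rescale",
        "ref-t": 310,
        "pcoupl": "C-rescale",
        "ref-p": 1.0,
        # Density-guided simulation block
        "density-guided-simulation-active": "yes",
        "density-guided-simulation-group": fit_group,
        "density-guided-simulation-reference-density-filename": reference_map,
        "density-guided-simulation-nst": 2,
        "density-guided-simulation-force-constant": 1e3,
        "density-guided-simulation-adaptive-force-scaling": "yes",
    }
    return "".join(f"{key} = {value}\n" for key, value in params.items())


mdp_text = density_guided_mdp("emd_map.mrc")  # hypothetical map filename
```

The output would be written to a file and passed to `gmx grompp` together with the topology and an index file that defines the fitted group.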

Analysis

Cross-correlations between the model and the target map during simulations were calculated with the “mdff check -ccc” command in Visual Molecular Dynamics (VMD) [52]. Protein-ligand interaction energies were calculated with the “gmx energy” tool in GROMACS [13]. Structures were visualized with ChimeraX [12]. Protein structure quality during the MD simulations was assessed with the generalized orientation-dependent all-atom potential (GOAP) score [14], for which lower values indicate better structural quality. Model accuracy was calculated using the following formula:
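To make the cross-correlation metric concrete, the Python sketch below computes a generic Pearson correlation between two density grids and a relative accuracy score. Both are assumptions rather than the authors' code: VMD's `mdff check -ccc` first simulates a density from the atomic model at a chosen resolution and may apply a threshold, so its values will not match this function exactly. The accuracy definition (model CC as a percentage of the deposited structure's CC) is inferred from the abstract's phrasing "cross-correlation relative to the deposited structure" and is not confirmed by the text.

```python
"""Sketch of map cross-correlation and a relative model-accuracy score.

ASSUMPTIONS: `cross_correlation` is a generic Pearson correlation over voxels
of two maps already sampled on the same grid; it is not VMD's implementation.
`relative_accuracy` assumes accuracy = 100 * CC_model / CC_deposited.
"""
from typing import Optional

import numpy as np


def cross_correlation(model_map, target_map, threshold: Optional[float] = None) -> float:
    """Pearson correlation between two density grids of identical shape.

    If `threshold` is given, only voxels where the target density exceeds it
    are compared, a common way to focus on the molecule (e.g. the ligand).
    """
    model = np.asarray(model_map, dtype=float)
    target = np.asarray(target_map, dtype=float)
    if model.shape != target.shape:
        raise ValueError("maps must be sampled on the same grid")
    a, b = model.ravel(), target.ravel()
    if threshold is not None:
        mask = b > threshold
        a, b = a[mask], b[mask]
    if a.size < 2:
        raise ValueError("need at least two voxels to correlate")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    if denom == 0.0:
        raise ValueError("correlation undefined for a constant map")
    return float(np.dot(a, b) / denom)


def relative_accuracy(cc_model: float, cc_deposited: float) -> float:
    """Model CC as a percentage of the deposited structure's CC (assumed definition)."""
    return 100.0 * cc_model / cc_deposited


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((16, 16, 16))
    model = target + 0.5 * rng.random((16, 16, 16))  # noisy copy of the target
    print(f"CC = {cross_correlation(model, target):.3f}")
    # Hypothetical CC values, for illustration only:
    print(f"accuracy = {relative_accuracy(0.6, 0.75):.0f}%")
```

Because the correlation is mean-centered and normalized, it is insensitive to the absolute scale and offset of the two maps, which is why maps on different intensity scales can still be compared.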

Supplementary data

S1 Data. Raw values for each figure in the main text (Figs 1–5).

https://doi.org/10.1371/journal.pcbi.1013367.s001

(XLSX)

Supporting information

S1 Fig. Experimental structures of the proteins tested in this study.

PDB IDs are given in brackets after each protein name. Red spheres show the ligands bound to the proteins.

https://doi.org/10.1371/journal.pcbi.1013367.s002

(TIFF)

S2 Fig. Overview of the ligands used in this study.

https://doi.org/10.1371/journal.pcbi.1013367.s003

(TIFF)

S3 Fig. Accurate prediction of protein-ligand complexes.

(Left) Initial Chai-1 prediction of the binding pocket for the leucine-rich repeat kinase 2+MLi-2 (A), phosphoinositide 3-kinase α+alpelisib (B), and histamine H1 receptor+desloratadine (C) systems. Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. (Middle) Predicted structure (yellow) overlaid on the experimental structure (green). Alignments were performed on the entire protein with the ChimeraX Matchmaker tool. Chai-1 predictions of all five possible binding sites are shown in S4, S5 and S6 Figs.

https://doi.org/10.1371/journal.pcbi.1013367.s004

(TIFF)

S4 Fig. Chai-1 predictions of five possible binding sites of MLi-2 in leucine-rich repeat kinase 2.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s005

(TIFF)

S5 Fig. Chai-1 predictions of five possible binding sites of alpelisib in phosphoinositide 3-kinase α.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s006

(TIFF)

S6 Fig. Chai-1 predictions of five possible binding sites of desloratadine in the histamine H1 receptor.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s007

(TIFF)

S7 Fig. Chai-1 predictions of the three complexes shown in S3 Fig, viewed over the entire protein: leucine-rich repeat kinase 2+MLi-2 (A), phosphoinositide 3-kinase α+alpelisib (B), and histamine H1 receptor+desloratadine (C) systems.

The predicted structure (yellow) and the experimental structure (green) are shown on the right.

https://doi.org/10.1371/journal.pcbi.1013367.s008

(TIFF)

S8 Fig. Chai-1 predictions of five possible binding sites of acifran in the type-3 hydroxycarboxylic acid receptor.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s009

(TIFF)

S9 Fig. Chai-1 predictions of five possible binding sites of SSR504734 in glycine transporter 1.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s010

(TIFF)

S10 Fig. Chai-1 predictions of five possible binding sites of hemicholinium-3 in the choline transporter.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s011

(TIFF)

S11 Fig. Initial Chai-1 prediction (yellow), final frame of the density-fitting simulations (blue) along with the experimental structure (green) of the type-3 hydroxycarboxylic acid receptor+acifran complex.

https://doi.org/10.1371/journal.pcbi.1013367.s012

(TIFF)

S12 Fig. Initial Chai-1 prediction (yellow), final frame of the density-fitting simulations (blue) along with the experimental structure (green) of the glycine transporter 1+SSR504734 complex.

https://doi.org/10.1371/journal.pcbi.1013367.s013

(TIFF)

S13 Fig. Initial Chai-1 prediction (yellow), final frame of the density-fitting simulations (blue) along with the experimental structure (green) of the choline transporter + hemicholinium-3 complex.

https://doi.org/10.1371/journal.pcbi.1013367.s014

(TIFF)

S14 Fig. GOAP scores of the entire protein during the simulations.

The GOAP score of the experimental structure is shown as a horizontal line. Lower GOAP scores indicate better protein structure quality.

https://doi.org/10.1371/journal.pcbi.1013367.s015

(TIFF)

S15 Fig. Chai-1 predictions of five possible binding sites of thiamine in thiamine transporter 2.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s016

(TIFF)

S16 Fig. Chai-1 predictions of five possible binding sites of bupropion in the norepinephrine transporter.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s017

(TIFF)

S17 Fig. Chai-1 predictions of five possible binding sites of corticosterone in an organic cation transporter.

Cryo-EM densities for the ligand (yellow) and pocket protein residues (silver) are shown as transparent surfaces. Cross-correlation values for the ligands are given in the table below.

https://doi.org/10.1371/journal.pcbi.1013367.s018

(TIFF)

Acknowledgments

We thank Dr. Marta Bonaccorsi, Dr. Stavros Azinas and the Molecular Biophysics Stockholm environment for valuable feedback and discussion. Computational resources were provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS 2024/3-49). AI runs were performed using the Berzelius resource funded by the Knut and Alice Wallenberg Foundation (project no. Berzelius-2024-384).

References

  1. Kim JJ, Gharpure A, Teng J, Zhuang Y, Howard RJ, Zhu S, et al. Shared structural mechanisms of general anaesthetics and benzodiazepines. Nature. 2020;585(7824):303–8. pmid:32879488
  2. Bartesaghi A, Aguerrebere C, Falconieri V, Banerjee S, Earl LA, Zhu X, et al. Atomic resolution cryo-EM structure of β-galactosidase. Structure. 2018;26(6):848–856.e3. pmid:29754826
  3. Terashi G, Wang X, Prasad D, Nakamura T, Kihara D. DeepMainmast: integrated protocol of protein structure modeling for cryo-EM with deep learning and structure prediction. Nat Methods. 2024;21(1):122–31. pmid:38066344
  4. Chen S, Zhang S, Fang X, Lin L, Zhao H, Yang Y. Protein complex structure modeling by cross-modal alignment between cryo-EM maps and protein sequences. Nat Commun. 2024;15(1):8808.
  5. Igaev M, Kutzner C, Bock LV, Vaiana AC, Grubmüller H. Automated cryo-EM structure refinement using correlation-driven molecular dynamics. Elife. 2019;8:e43542. pmid:30829573
  6. Blau C, Yvonnesdotter L, Lindahl E. Gentle and fast all-atom model refinement to cryo-EM densities via a maximum likelihood approach. PLoS Comput Biol. 2023;19(7):e1011255. pmid:37523411
  7. Robertson MJ, van Zundert GCP, Borrelli K, Skiniotis G. GemSpot: a pipeline for robust modeling of ligands into cryo-EM maps. Structure. 2020;28(6):707–716.e3. pmid:32413291
  8. Muenks A, Zepeda S, Zhou G, Veesler D, DiMaio F. Automatic and accurate ligand structure determination guided by cryo-electron microscopy maps. Nat Commun. 2023;14(1):1164. pmid:36859493
  9. Vant JW, Lahey S-LJ, Jana K, Shekhar M, Sarkar D, Munk BH, et al. Flexible fitting of small molecules into electron microscopy maps using molecular dynamics simulations with neural network potentials. J Chem Inf Model. 2020;60(5):2591–604. pmid:32207947
  10. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024:1–3.
  11. Chai Discovery, Boitreaud J, Dent J, McPartlon M, Meier J, Reis V, et al. Chai-1: decoding the molecular interactions of life. bioRxiv. 2024.
  12. Meng EC, Goddard TD, Pettersen EF, Couch GS, Pearson ZJ, Morris JH, et al. UCSF ChimeraX: tools for structure building and analysis. Protein Sci. 2023;32(11):e4792.
  13. Páll S, Zhmurov A, Bauer P, Abraham M, Lundborg M, Gray A, et al. Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS. J Chem Phys. 2020;153(13):134110.
  14. Zhou H, Skolnick J. GOAP: a generalized orientation-dependent, all-atom statistical potential for protein structure prediction. Biophys J. 2011;101(8):2043–52.
  15. Durairaj J, Adeshina Y, Cao Z, Zhang X, Oleinikovas V, Duignan T, et al. PLINDER: the protein-ligand interactions dataset and evaluation resource. bioRxiv. 2024.
  16. Bickerton GR, Paolini GV, Besnard J, Muresan S, Hopkins AL. Quantifying the chemical beauty of drugs. Nat Chem. 2012;4(2):90–8. pmid:22270643
  17. Sanz Murillo M, Villagran Suarez A, Dederer V, Chatterjee D, Alegrio Louro J, Knapp S, et al. Inhibition of Parkinson’s disease-related LRRK2 by type I and type II kinase inhibitors: activity and structures. Sci Adv. 2023;9(48):eadk6191. pmid:38039358
  18. Liu X, Zhou Q, Hart JR, Xu Y, Yang S, Yang D, et al. Cryo-EM structures of cancer-specific helical and kinase domain mutations of PI3Kα. Proc Natl Acad Sci U S A. 2022;119(46):e2215621119. pmid:36343266
  19. Wang D, Guo Q, Wu Z, Li M, He B, Du Y, et al. Molecular mechanism of antihistamines recognition and regulation of the histamine H1 receptor. Nat Commun. 2024;15(1):84. pmid:38167898
  20. Suzuki S, Tanaka K, Nishikawa K, Suzuki H, Oshima A, Fujiyoshi Y. Structural basis of hydroxycarboxylic acid receptor signaling mechanisms through ligand binding. Nat Commun. 2023;14(1):5899. pmid:37736747
  21. Wei Y, Li R, Meng Y, Hu T, Zhao J, Gao Y, et al. Transport mechanism and pharmacology of the human GlyT1. Cell. 2024;187(7):1719–1732.e14. pmid:38513663
  22. Qiu Y, Gao Y, Huang B, Bai Q, Zhao Y. Transport mechanism of presynaptic high-affinity choline uptake by CHT1. Nat Struct Mol Biol. 2024:1–9.
  23. Dang Y, Zhang T, Pidathala S, Wang G, Wang Y, Chen N, et al. Substrate and drug recognition mechanisms of SLC19A3. Cell Res. 2024;34(6):458–61. pmid:38503960
  24. Tan J, Xiao Y, Kong F, Zhang X, Xu H, Zhu A, et al. Molecular basis of human noradrenaline transporter reuptake and inhibition. Nature. 2024;632(8026):921–9. pmid:39048818
  25. Khanppnavar B, Maier J, Herborg F, Gradisch R, Lazzarin E, Luethi D, et al. Structural basis of organic cation transporter-3 inhibition. Nat Commun. 2022;13(1):6714.
  26. Shan Z, Yang X, Liu H, Yuan Y, Xiao Y, Nan J, et al. Cryo-EM structures of human organic anion transporting polypeptide OATP1B1. Cell Res. 2023;33(12):940–51. pmid:37674011
  27. Fruman DA, Chiu H, Hopkins BD, Bagrodia S, Cantley LC, Abraham RT. The PI3K pathway in human disease. Cell. 2017;170(4):605–35.
  28. Samuels Y, Wang Z, Bardelli A, Silliman N, Ptak J, Szabo S, et al. High frequency of mutations of the PIK3CA gene in human cancers. Science. 2004;304(5670):554. pmid:15016963
  29. McPhail JA, Burke JE. Drugging the phosphoinositide 3-kinase (PI3K) and phosphatidylinositol 4-kinase (PI4K) family of enzymes for treatment of cancer, immune disorders, and viral/parasitic infections. In: Druggable Lipid Signaling Pathways. 2020. p. 203–22.
  30. Moriguchi T, Takai J. Histamine and histidine decarboxylase: immunomodulatory functions and regulatory mechanisms. Genes Cells. 2020;25(7):443–9.
  31. Chen H, Assmann JC, Krenz A, Rahman M, Grimm M, Karsten CM, et al. Hydroxycarboxylic acid receptor 2 mediates dimethyl fumarate’s protective effect in EAE. J Clin Invest. 2014;124(5):2188–92.
  32. Mineur YS, Obayemi A, Wigestrand MB, Fote GM, Calarco CA, Li AM, et al. Cholinergic signaling in the hippocampus regulates social stress resilience and anxiety- and depression-like behavior. Proc Natl Acad Sci U S A. 2013;110(9):3573–8. pmid:23401542
  33. Yamashiro T, Yasujima T, Said HM, Yuasa H. pH-dependent pyridoxine transport by SLC19A2 and SLC19A3: implications for absorption in acidic microclimates. J Biol Chem. 2020;295(50):16998–7008. pmid:33008889
  34. Bönisch H, Brüss M. The norepinephrine transporter in physiology and disease. 2006.
  35. Koepsell H. Organic cation transporters in health and disease. Pharmacol Rev. 2020;72(1):253–319. pmid:31852803
  36. Hagenbuch B, Meier PJ. The superfamily of organic anion transporting polypeptides. Biochim Biophys Acta. 2003;1609(1):1–18. pmid:12507753
  37. Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, et al. PubChem 2025 update. Nucleic Acids Res. 2025;53(D1):D1516–25. pmid:39558165
  38. Schlessinger A, Zatorski N, Hutchinson K, Colas C. Targeting SLC transporters: small molecules as modulators and therapeutic opportunities. Trends Biochem Sci. 2023;48(9):801–14. pmid:37355450
  39. Singharoy A, Teo I, McGreevy R, Stone JE, Zhao J, Schulten K. Molecular dynamics-based refinement and validation for sub-5 Å cryo-electron microscopy maps. Elife. 2016;5:e16105. pmid:27383269
  40. Kidmose RT, Juhl J, Nissen P, Boesen T, Karlsen JL, Pedersen BP. Namdinator - automatic molecular dynamics flexible fitting of structural models into cryo-EM and crystallography experimental maps. IUCrJ. 2019;6(Pt 4):526–31. pmid:31316797
  41. Sweeney A, Mulvaney T, Maiorca M, Topf M. ChemEM: flexible docking of small molecules in cryo-EM structures. J Med Chem. 2024;67(1):199–212. pmid:38157562
  42. Shugaeva T, Howard RJ, Haloi N, Lindahl E. Modeling cryo-EM structures in alternative states with generative AI and density-guided simulations. bioRxiv. 2025.
  43. Chakravarty D, Schafer JW, Chen EA, Thole JF, Ronish LA, Lee M, Porter LL. AlphaFold predictions of fold-switched conformations are driven by structure memorization. Nat Commun. 2024;15(1):7296.
  44. Jo S, Kim T, Iyer VG, Im W. CHARMM-GUI: a web-based graphical user interface for CHARMM. J Comput Chem. 2008;29(11):1859–65. pmid:18351591
  45. Jorgensen WL, Chandrasekhar J, Madura JD, Impey RW, Klein ML. Comparison of simple potential functions for simulating liquid water. J Chem Phys. 1983;79(2):926–35.
  46. Yvonnesdotter L, Rovšnik U, Blau C, Lycksell M, Howard RJ, Lindahl E. Automated simulation-based membrane protein refinement into cryo-EM data. Biophys J. 2023;122(13):2773–81.
  47. Huang J, Rauscher S, Nawrocki G, Ran T, Feig M, de Groot BL, et al. CHARMM36m: an improved force field for folded and intrinsically disordered proteins. Nat Methods. 2017;14(1):71–3. pmid:27819658
  48. Vanommeslaeghe K, Hatcher E, Acharya C, Kundu S, Zhong S, Shim J, et al. CHARMM general force field: a force field for drug-like molecules compatible with the CHARMM all-atom additive biological force fields. J Comput Chem. 2010;31(4):671–90. pmid:19575467
  49. Darden T, York D, Pedersen L. Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J Chem Phys. 1993;98(12):10089–92.
  50. Bernetti M, Bussi G. Pressure control using stochastic cell rescaling. J Chem Phys. 2020;153(11):114107. pmid:32962386
  51. Bussi G, Donadio D, Parrinello M. Canonical sampling through velocity rescaling. J Chem Phys. 2007;126(1):014101. pmid:17212484
  52. Humphrey W, Dalke A, Schulten K. VMD: visual molecular dynamics. J Mol Graph. 1996;14(1):33–8, 27–8. pmid:8744570