PF2 fit: Polar Fast Fourier Matched Alignment of Atomistic Structures with 3D Electron Microscopy Maps

doi:10.1371/journal.pcbi.1004289

Fig 1.

Control flow.

A typical control flow of the 3D EM map fitting algorithm developed in this work. The first step of a fitting procedure is the initial exhaustive search. This step requires scoring functions whose correlations can be computed quickly under the chosen search scheme. Here, we use FFT-based algorithms for the fast computation of non-uniform rigid-body correlations. The scoring functions may account for various structural aspects, such as the scattering potential or pockets in the molecular surface. The exhaustive search is followed by an information-driven reranking scheme, which may include, among others, the mutual information score or the skeleton-secondary structure score. The final output of the procedure is the best fit between the atomic structure and the 3D EM map.
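The two-stage flow described in this caption can be illustrated with a minimal sketch. It is not the PF² fit implementation: the function name `fit_two_stage`, the `keep` parameter and the toy scores are hypothetical, and the FFT-based correlation is replaced by an arbitrary callable.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Pose = TypeVar("Pose")


def fit_two_stage(
    poses: Iterable[Pose],
    fast_score: Callable[[Pose], float],
    rerank_score: Callable[[Pose], float],
    keep: int = 10,
) -> List[Tuple[Pose, float]]:
    """Exhaustive search with a cheap score, then rerank the shortlist."""
    # Stage 1: score every candidate pose and keep the `keep` best.
    shortlist = sorted(poses, key=fast_score, reverse=True)[:keep]
    # Stage 2: rescore only the shortlist with the more expensive score.
    reranked = [(pose, rerank_score(pose)) for pose in shortlist]
    reranked.sort(key=lambda item: item[1], reverse=True)
    return reranked


# Toy usage (hypothetical data): poses are 1D shifts and the true shift is 3.
result = fit_two_stage(
    range(-10, 11),
    fast_score=lambda s: -abs(s - 2.5),    # coarse, slightly biased score
    rerank_score=lambda s: -(s - 3) ** 2,  # finer score applied to the shortlist
    keep=5,
)
best_pose, best_score = result[0]
```

The point of the split is that the expensive score is evaluated only on the shortlist, so its cost does not scale with the size of the exhaustive search.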


Fig 2.

Schematic of representations used in our algorithms.

(A) PDB schematic, showing the target volume V_𝓟 and the complementary volume . (B) 3D EM map schematic, showing the target volume V_𝓜 and the complementary volume . Detailed definitions can be found in the Materials and Methods section.


Fig 3.

Comparison of PF² fit with other software in synthesized EM fitting at 3Å.

A molecule is fitted into the synthetically generated EM map B with resolution 3Å (transparent green). The top-ranked result 𝓟₁ (red/yellow) is compared to the original PDB molecule 𝓟 (blue). (A) Top-ranked result using PF² fit —SE(3) with 8° uniform rotational sampling and 0.5Å translational step size. RMSD ≈ 0.88Å. (B) Top-ranked result using the Colores package; the ‘nopowell’ option is turned on. RMSD ≈ 3.2Å. (C) Top-ranked result using Colores with default options. RMSD ≈ 2.3Å. The fitted PDB 𝓟₁ is in yellow. (D) Top-ranked result using the ADP_EM package, with bandwidth L = 25. RMSD ≈ 0.94Å.
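For readers unfamiliar with the error measure, the sketch below computes an RMSD between a fitted structure and its reference. It assumes that atoms are matched one-to-one and that no re-superposition is applied, since both structures live in the frame of the map; the caption does not state these details, and the function name `rmsd` and the coordinates are illustrative.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(fitted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched atom positions.

    The result is in the same length unit as the input coordinates.
    """
    if len(fitted) != len(reference) or not fitted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(fitted, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(fitted))


# Toy usage (hypothetical coordinates): every atom is displaced by (0.3, 0.4, 0.0).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
fitted = [(x + 0.3, y + 0.4, z) for x, y, z in reference]
error = rmsd(fitted, reference)  # displacement length is 0.5 for every atom
```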


Fig 4.

Comparison of PF² fit with other software in synthesized EM fitting at 10Å.

(A) The synthetically generated 3D EM map is a Gaussian blurred version of the PDB 7CAT (chains A and B), with resolution R = 10Å, and random noise added to obtain a signal-to-noise ratio of unity. The PDB 𝓟 (inset) is chain B of the same protein. The top-ranked result 𝓟₁ (red/yellow) is compared to the original PDB molecule 𝓟 (blue). (B) Top-ranked result using PF² fit —SE(3) with 8° uniform rotational sampling and 0.5Å translational step size has RMSD = 0.73Å. (C) Top-ranked result using Colores with default options has RMSD = 1.096Å. (D) Top-ranked result using the ADP_EM package, with bandwidth L = 25 has RMSD = 0.814Å.
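The noise step in panel (A) can be sketched as follows. The definition of signal-to-noise ratio used here (variance of the signal divided by variance of the noise) and the choice of zero-mean Gaussian noise are assumptions, as the caption does not specify them. The Gaussian blurring step is omitted, and the map is treated as a flat list of voxel values.

```python
import math
import random
import statistics
from typing import List, Sequence


def add_noise(density: Sequence[float], snr: float = 1.0, seed: int = 0) -> List[float]:
    """Add zero-mean Gaussian noise so that var(signal) / var(noise) is about `snr`."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    signal_var = statistics.pvariance(density)
    noise_sigma = math.sqrt(signal_var / snr)
    return [value + rng.gauss(0.0, noise_sigma) for value in density]


# Toy usage (hypothetical signal): a smooth 1D profile stands in for the blurred map.
clean = [math.sin(0.1 * i) for i in range(20000)]
noisy = add_noise(clean, snr=1.0)
noise = [n - c for n, c in zip(noisy, clean)]
measured_snr = statistics.pvariance(clean) / statistics.pvariance(noise)
```

With this many samples the measured ratio should sit close to the requested value of 1, although it will not match it exactly.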


Table 1.

Average rank, rounded to the nearest integer, of best RMSD result returned by PF² fit —SE(3) in the initial search stage for synthetic maps at different resolutions.

The figure in brackets in the second and third columns denotes the rank in the presence of noise at SNR = 1. See the section on “Datasets” for a list of PDBs used in this experiment. Note that even if the rank of the best RMSD is lower for SCCS in some cases, the actual RMSDs are generally lower, cf. Figs 5 and 6.
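The quantity reported in this table can be made concrete with a small sketch. It assumes that each run returns a list of (score, RMSD) pairs and that a higher score is better; the function names and the numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Result = Tuple[float, float]  # (score, rmsd)


def rank_of_best_rmsd(results: Sequence[Result]) -> int:
    """1-based rank, when sorted by descending score, of the lowest-RMSD result."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    best_index = min(range(len(ranked)), key=lambda i: ranked[i][1])
    return best_index + 1


def average_rank(runs: Sequence[Sequence[Result]]) -> int:
    """Average the per-run ranks and round to the nearest integer.

    Python's round() sends exact halves to the nearest even integer.
    """
    ranks: List[int] = [rank_of_best_rmsd(run) for run in runs]
    return round(sum(ranks) / len(ranks))


# Toy usage (hypothetical numbers): the best-RMSD result ranks 2, 1 and 3.
runs = [
    [(0.90, 1.2), (0.80, 0.7), (0.50, 3.0)],
    [(0.95, 0.6), (0.40, 2.0)],
    [(0.90, 2.0), (0.80, 1.5), (0.70, 0.5)],
]
table_entry = average_rank(runs)
```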


Fig 5.

Resolution robustness and comparison of scattering potential (SCCS) and Gaussian (GCCS) scores for synthesized data.

We plot the RMSD of the top-ranked result as a function of the resolution of the EM map used for the fit. See the section titled “Dataset” for a list of PDBs used in this experiment. (A) Average resolution-dependent RMSD of the top-ranked result returned by PF² fit —SE(3) in the absence and presence of noise for the GCCS and the SCCS. (B) Average Z-Score for the ten top results in the absence of noise. Z-Scores in the presence of noise follow the same trend.
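Panel (B) reports an average Z-Score over the ten top results. A minimal sketch of one common definition follows, in which each score is standardized against the mean and standard deviation of all candidate scores. Whether the paper uses exactly this normalization is an assumption, and the function names are illustrative.

```python
import statistics
from typing import List, Sequence


def z_scores(scores: Sequence[float]) -> List[float]:
    """Standardize scores against the mean and population standard deviation."""
    mean = statistics.fmean(scores)
    std = statistics.pstdev(scores)
    if std == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return [(s - mean) / std for s in scores]


def mean_top_z(scores: Sequence[float], top: int = 10) -> float:
    """Average Z-score of the `top` highest-scoring candidates."""
    best = sorted(z_scores(scores), reverse=True)[:top]
    return statistics.fmean(best)


# Toy usage (hypothetical scores).
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
top_one = mean_top_z(scores, top=1)  # Z-score of the best candidate alone
```

A larger value indicates that the top candidates stand out more clearly from the bulk of the search results.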


Fig 6.

Effect of complementary space scoring for synthesized data.

Using the complementary space scores from (A) Eq (8) and (B) Eq (9), with w_comp = 1, w_target = 1 we plot the RMSD as a function of the resolution of the EM map. See the section titled “Dataset” for a list of PDBs used in this experiment.


Table 2.

Results of applying PF² fit —SE(3) on a selection of datasets from the cryoEM modeling challenge (Experiment 3) using both the GCCS and SCCS.

An error measure similar to ETR is provided as the number of residues excluded outside a given iso-contoured molecular surface. The SCCS yielded on average results that exclude 2–4 fewer residues than the GCCS.
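One way to read this error measure is sketched below: a residue counts as excluded when the map density at its position falls below the chosen iso-value, or when it lies outside the map altogether. This interpretation, the nearest-voxel lookup, and all names and parameters (`count_excluded_residues`, `voxel_size`, `origin`) are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[float]]]


def count_excluded_residues(
    residues: Sequence[Point],
    density: Grid,
    iso_value: float,
    voxel_size: float = 1.0,
    origin: Point = (0.0, 0.0, 0.0),
) -> int:
    """Count residues whose nearest voxel is below `iso_value` or off the grid."""
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    excluded = 0
    for point in residues:
        # Nearest-voxel lookup; a real implementation might interpolate instead.
        i, j, k = (round((c - o) / voxel_size) for c, o in zip(point, origin))
        inside_grid = 0 <= i < nx and 0 <= j < ny and 0 <= k < nz
        if not inside_grid or density[i][j][k] < iso_value:
            excluded += 1
    return excluded


# Toy usage (hypothetical map): a 3x3x3 grid with density only at the center voxel.
grid = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
grid[1][1][1] = 1.0
positions = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
n_excluded = count_excluded_residues(positions, grid, iso_value=0.5)
```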


Fig 7.

Speed-accuracy trade-offs in PF² fit.

(A) The plot displays the average runtime (divided by 2000) using the GCCS scoring term, and the corresponding error (in RMSD) when PF² fit is applied on the synthesized EM dataset. Notice that the runtime increases linearly with the number of samples in SO(3), but the average error stays steady between 0.4 and 0.5Å, except when only 2000 samples were used. We believe that this robustness stems from the low discrepancy of the sampling. (B) We compared the average speeds of PF² fit on the synthesized EM dataset with GCCS, SCCS and NCCS using the same expansion degree (L = 20). The plot shows that NCCS is faster than GCCS, especially when fewer samples are used. On the other hand, SCCS is marginally slower (around 0.1%) than GCCS.


Table 3.

Average rank of best RMSD result returned by PF² fit—SE(3) after reranking.

In the initial stage, GCCS was used. The figures in brackets denote the rank in the presence of noise at SNR = 1. We see a strong decrease in rank for the skeleton-secondary structure score with and without noise, while the mutual information score remains predictable across the range of resolutions. See the section on “Datasets” for a list of PDBs used to generate the synthetic maps used in this experiment. Note that although the ranks of the best RMSD solution, averaged across all experiments, show no improvement over Table 1 (mostly because GCCS already does an excellent job of ranking them), the ranks did improve for several of the experiments (73/318 for MIS, and 5/318 for MIS). Please see Section ‘The performance of reranking increases with resolution’ for details.


Fig 8.

Comparison of PF² fit with other software in subunit-assembly fitting.

Fitting the PDB molecule 𝓟 (1GC1) to the EM map 𝓜 of SIV at 20Å (EMD5020), using the GCCS. Two different views of the molecules are given: (A) Results from PF² fit. The ETR is 0.03. (B) Results from Colores with default options. The ETR is 0.1. (C) Results from ADP_EM with L = 25. The ETR is 0.08.


Fig 9.

Fitting the PDB molecule 𝓟 (1AONb) to the GroEL 3D EM map 𝓜 (EMD 1461) at 7.7Å.

(A) Full 3D EM map 𝓜 with segmented subunit 𝓜_s (inset, top). The molecule 𝓟 is fitted into 𝓜_s using PF² fit (inset, bottom). (B) Initial guess for rigid-body fit into 𝓜. (C) PF² fit generates translational samples local to the initial guess to find the depicted correct result. Correctness is measured by deviation from the rigid-body fit in (A). The result has an RMSD of 0.3Å from the fitting result in (A) and is ranked at number four in a run of PF² fit with angular resolution of 10°.
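The local translational sampling mentioned in panel (C) can be pictured with the sketch below, which enumerates grid points inside a sphere around an initial guess. The search radius, the step size and the function name `local_translations` are hypothetical; the caption does not say how PF² fit chooses its local samples.

```python
import itertools
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def local_translations(center: Point, radius: float, step: float) -> List[Point]:
    """Grid translations within `radius` of `center`, spaced `step` apart."""
    if radius < 0 or step <= 0:
        raise ValueError("radius must be non-negative and step positive")
    n = math.floor(radius / step)
    offsets = [i * step for i in range(-n, n + 1)]
    samples = []
    for dx, dy, dz in itertools.product(offsets, repeat=3):
        # Keep only offsets inside the sphere (small tolerance for rounding).
        if dx * dx + dy * dy + dz * dz <= radius * radius + 1e-12:
            samples.append((center[0] + dx, center[1] + dy, center[2] + dz))
    return samples


# Toy usage (hypothetical values): unit radius and unit step around an initial guess.
guess = (10.0, 20.0, 30.0)
candidates = local_translations(guess, radius=1.0, step=1.0)
```

Restricting the translations to a neighborhood of the initial guess keeps the number of candidates small compared with a search over the whole map.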


Fig 10.

Speed-accuracy trade-offs for NCCS.

NCCS is computed on a non-uniform grid based on the atom positions. If the grid is sparse, then it is expected that a lower degree expansion of the spherical basis functions would sufficiently represent it. We applied NCCS with the expansion degree (L) varied between 5 and 20, on the synthesized EM dataset (blurred to 12Å resolution) and using 30k samples in SO(3) space. The plot shows that the error decreases and the runtime increases with L. However, the change in runtime is more pronounced than the change in error: for example, the computation is 35% faster for L = 5, while the error is only 5% higher than for L = 20.
