
Conceived and designed the experiments: DX YZ. Performed the experiments: DX. Analyzed the data: DX. Wrote the paper: DX YZ.

The authors have declared that no competing interests exist.

Macromolecular surfaces are fundamental representations of the three-dimensional geometric shape of macromolecules. Accurate calculation of protein surfaces is critically important in protein structural and functional studies, including ligand-protein docking and virtual screening. In contrast to analytical or parametric representations of macromolecular surfaces, triangulated mesh surfaces have proven easy to describe, visualize and manipulate by computer programs. Here, we develop a new algorithm, EDTSurf, for generating the three major macromolecular surfaces (the van der Waals surface, the solvent-accessible surface and the molecular surface) using the technique of fast Euclidean Distance Transform (EDT). The triangulated surfaces are constructed directly from volumetric solids by a Vertex-Connected Marching Cube algorithm that forms triangles from grid points. Compared to the analytical result, the relative error of the surface calculations by EDTSurf is <2–4% depending on the grid resolution, which is 1.5–4 times lower than that of the methods in the literature; yet the algorithm is faster and uses less computer memory than the comparative methods. The improvements in both accuracy and speed of macromolecular surface determination should make EDTSurf a useful tool for the detailed study of protein docking and structure prediction. Both the source code and the executable program of EDTSurf are freely available at

There are three main types of macromolecular surfaces:

A variety of methods have been proposed to compute the three macromolecular surfaces. These methods can generally be divided into two classes: analytical computation and explicit representation. Among analytical approaches, Connolly first presented an algorithm for calculating the smooth solvent-excluded surface of a molecule

Although analytical methods have the advantage of giving accurate values of surface area and volume, they are inconvenient for applications in which explicit surfaces of local atoms are required for further processing. For example, local surfaces of proteins and ligands are often used for shape comparison in the docking problem. The explicit surface generation method is a grid-based approximation that uses a space-filling model, in which each atom is modeled as a volumetric item

After the space-filling procedure, an important step is surface representation and construction. In general, a macromolecular surface can be represented by parametric equations or by triangular patches. Parametric representations of protein molecular surfaces are a compact way to describe a surface and are useful for evaluating surface properties such as the normal vector, principal curvatures, and principal curvature directions

A commonly used method to construct a triangulated isosurface from a 3D grid is the Marching Cube algorithm

In this paper, we develop a new method, EDTSurf, for the calculation of the three major macromolecular surfaces. We demonstrate that all the macromolecular surfaces can be universally connected with the theory of

The definitions of the three surfaces are illustrated in

(A) van der Waals surface (blue); (B) solvent-accessible surface (red); (C) molecular surface which includes contact surface (green) and reentrant surface (pink).

The

The

However, in our study, we found that only the Euclidean distance is directly related to the three macromolecular surfaces (see Eqs. 7–9 below). Therefore, our discussion focuses on this distance.
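Because the rest of the method is built on the Euclidean distance, a brute-force reference implementation of the EDT may help fix ideas: every grid point receives the Euclidean distance to its nearest boundary point, and an isosurface is the set of points at a given distance. The Python sketch below is only an illustration under these assumptions; it is not the fast EDT used by EDTSurf (its cost grows with the number of grid points times the number of boundary points), and the names `brute_force_edt` and `near_isovalue` are our own.

```python
import math
from itertools import product


def brute_force_edt(shape, boundary):
    """Map every grid point to its Euclidean distance to the nearest boundary point.

    shape    -- grid dimensions, e.g. (nx, ny) or (nx, ny, nz)
    boundary -- non-empty iterable of integer grid points (tuples)
    Cost is O(#grid points * #boundary points), so this is for illustration only.
    """
    boundary = list(boundary)
    edm = {}
    for p in product(*(range(n) for n in shape)):
        edm[p] = min(math.dist(p, q) for q in boundary)
    return edm


def near_isovalue(edm, isovalue, tol=0.5):
    """Grid points whose distance lies within `tol` of `isovalue` (a discrete isosurface)."""
    return sorted(p for p, d in edm.items() if abs(d - isovalue) <= tol)


edm = brute_force_edt((5, 5), [(2, 2)])
print(edm[(0, 0)])                       # sqrt(8): corner-to-centre distance
print(near_isovalue(edm, 2.0, tol=0.0))  # [(0, 2), (2, 0), (2, 4), (4, 2)]
```

With a single boundary point the distance map is simply the distance to that point; with a whole surface as the boundary, the same definition yields the offset surfaces used in Eqs. 7–9.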

The

Suppose the set of boundary points (or surface) of an object

Isosurface can be extracted conveniently after the EDT. The isosurface with isovalue

Obviously, if

Macromolecular solids are solid bodies enclosed by the macromolecular surfaces. The

For points on the solvent-accessible surface, we define a subset

Suppose the minimum van der Waals radius of all the

If the van der Waals radius of an atom

The three equations above describe space-filling procedures, which are the preliminary steps of grid-based macromolecular surface generation.
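The space-filling idea can be sketched as follows, assuming the usual rule that a grid point belongs to the van der Waals solid if it lies within r_i of some atom centre, and to the solvent-accessible solid if it lies within r_i + r_p (r_p being the probe radius). This Python sketch does not reproduce the paper's equations; the function name `fill_atoms` and the convention that coordinates are already in grid units are our assumptions.

```python
import math


def fill_atoms(shape, atoms, probe=0.0):
    """Return the set of grid points lying inside at least one (inflated) atom ball.

    shape -- (nx, ny, nz) grid size; grid point (i, j, k) sits at coordinates (i, j, k)
    atoms -- iterable of (x, y, z, r) in grid units (already scaled to the grid)
    probe -- probe radius in grid units: 0 gives a van der Waals solid,
             a positive value gives a solvent-accessible solid (radius r + probe)
    """
    solid = set()
    for x, y, z, r in atoms:
        radius = r + probe
        centre = (x, y, z)
        # Only scan the grid points in the axis-aligned box around the ball.
        lo = [max(0, math.floor(c - radius)) for c in centre]
        hi = [min(n - 1, math.ceil(c + radius)) for c, n in zip(centre, shape)]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    if math.dist((i, j, k), centre) <= radius:
                        solid.add((i, j, k))
    return solid


atom = [(5, 5, 5, 1.0)]
vdw = fill_atoms((11, 11, 11), atom)
sas = fill_atoms((11, 11, 11), atom, probe=1.0)
print(len(vdw), len(sas))  # 7 33
```

Restricting the scan to each atom's bounding box keeps the cost proportional to the number of atoms times the voxels per atom, rather than to the whole grid for every atom.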

After EDT is applied to the macromolecular solids as described above, the macromolecular surfaces can be treated as isosurfaces extracted from the Euclidean distance maps (EDMs).
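The following Python sketch illustrates the idea behind carving the molecular solid out of the solvent-accessible solid by brute force. For every voxel of the solvent-accessible solid we take the Euclidean distance to the nearest grid point outside the solid, which serves as our discrete stand-in for the solvent-accessible boundary, and we keep the voxel only if that distance exceeds the probe radius. The exterior-point convention, the strict inequality, and the names `ball` and `molecular_solid` are assumptions; EDTSurf's fast EDT and its Equation (9) may handle the boundary differently.

```python
import math
from itertools import product


def ball(shape, centre, radius):
    """Grid points of `shape` within `radius` of `centre` (grid units)."""
    return {p for p in product(*(range(n) for n in shape))
            if math.dist(p, centre) <= radius}


def molecular_solid(shape, sas_solid, probe):
    """Remove SAS voxels lying within `probe` of the nearest grid point outside the SAS solid."""
    outside = [p for p in product(*(range(n) for n in shape)) if p not in sas_solid]
    return {p for p in sas_solid
            if min(math.dist(p, q) for q in outside) > probe}


shape, centre = (11, 11, 11), (5, 5, 5)
sas = ball(shape, centre, 3.0)       # atom radius 2 inflated by probe radius 1
ms = molecular_solid(shape, sas, 1.0)
print(ms == ball(shape, centre, 2.0), len(ms))
```

For an isolated atom the molecular surface coincides with the van der Waals sphere, so on this small grid the carved solid recovers exactly the radius-2 ball of 33 voxels.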

Equation (7) is elucidated in

(A) EDT with the minimal macromolecular surface (yellow) as the boundary. The isosurface whose isovalue equals the negative of the minimal van der Waals radius is the van der Waals surface (blue). (B) EDT with the van der Waals surface (blue) as the boundary. The isosurface whose isovalue equals the negative of the probe radius is the solvent-accessible surface (red). (C) EDT with the solvent-accessible surface (red) as the boundary. The isosurface whose isovalue equals the probe radius is the molecular surface, which contains the contact surface (green) and the reentrant surface (pink).

Suppose the van der Waals surface (the blue part of

Similarly, in Equation (9), we apply EDT to the solvent-accessible solid which is enveloped by the solvent-accessible surface (the red part of

There is another way to distinguish the contact surface from the reentrant surface without precomputing the van der Waals surface, i.e.

We can record the nearest boundary point

In

(A) van der Waals surface; (B) molecular surface; (C) solvent-accessible surface.

Translate and scale the coordinates of all the atoms in the molecular structure so that they fit in the bounding box. After scaling, the van der Waals radius

To construct the van der Waals surface, treat each atom of type

To get the van der Waals surface or the solvent-accessible surface, go directly to step IV. To get the molecular surface, apply EDT to the volumetric model using Equation (9), and remove the voxels whose Euclidean distances are less than

Use the Vertex-Connected Marching Cube method to construct the triangulated surfaces from the volumetric models.

Scale and translate the generated surface back to the original size and position.
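Steps I and V amount to a forward and an inverse affine map between atom coordinates and grid units. The Python sketch below assumes a single scale factor for all three axes (so atomic balls stay balls) and pads the box by the largest radius plus the probe radius plus a hypothetical extra `margin`; the function name `make_grid_transform` and the padding rule are illustrative assumptions, not EDTSurf's exact choices. Atomic radii would be multiplied by the same `scale`.

```python
def make_grid_transform(coords, radii, probe, grid_size, margin=1.0):
    """Build forward/inverse maps between atom coordinates (Å) and grid units.

    coords    -- list of (x, y, z) atom centres
    radii     -- van der Waals radii, same order as coords
    probe     -- probe radius (Å)
    grid_size -- number of grid points per axis (e.g. 128)
    margin    -- hypothetical extra padding (Å) so surfaces do not touch the box
    """
    pad = max(radii) + probe + margin
    lo = [min(c[d] for c in coords) - pad for d in range(3)]
    hi = [max(c[d] for c in coords) + pad for d in range(3)]
    # One scale factor for all axes: the longest box edge spans the whole grid.
    scale = (grid_size - 1) / max(h - l for l, h in zip(lo, hi))

    def to_grid(p):
        return tuple((p[d] - lo[d]) * scale for d in range(3))

    def to_world(g):
        return tuple(g[d] / scale + lo[d] for d in range(3))

    return scale, to_grid, to_world


coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
scale, to_grid, to_world = make_grid_transform(coords, [1.5, 1.8], 1.4, 128)
print(scale, to_grid(coords[1]), to_world(to_grid(coords[1])))
```

Surface vertices produced on the grid are mapped back with `to_world`, which undoes the translation and scaling exactly (up to floating-point rounding).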

Since

In step III, the propagation stops when the Euclidean distance is larger than

After the three kinds of macromolecular solids are obtained, triangulation is needed to construct the final macromolecular surfaces. Here we developed a dual of the traditional Marching Cube algorithm, called Vertex-Connected Marching Cubes (VCMC). The difference between them is that the vertices of the triangles in the traditional Marching Cubes are surface-edge intersections, whereas the vertices in VCMC are existing grid points. When the grid resolution is very high, VCMC therefore incurs no additional cost for real-time construction and rendering of the triangular surface. Furthermore, the triangulation generated by VCMC contains fewer vertices and faces than that generated by MC.
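To contrast the two vertex-placement rules on a single cube, the Python sketch below computes the 8-bit configuration index, the classic MC vertices (one per crossing edge, placed by linear interpolation), and a vertex-connected alternative. The text only says that VCMC uses existing grid points; the rule assumed here (take the inside grid point of each crossing edge) is our reading, the function names are hypothetical, and the triangle tables of both methods are omitted.

```python
# Corner i of a unit cube has coordinates (x, y, z) = bits 0, 1, 2 of i.
CORNERS = [(x, y, z) for z in (0, 1) for y in (0, 1) for x in (0, 1)]
# The 12 cube edges join corners that differ in exactly one coordinate.
EDGES = [(a, b) for a in range(8) for b in range(a + 1, 8)
         if sum(abs(p - q) for p, q in zip(CORNERS[a], CORNERS[b])) == 1]


def cube_index(values, iso):
    """8-bit configuration index: bit i is set when corner i is inside (value >= iso)."""
    return sum(1 << i for i, v in enumerate(values) if v >= iso)


def mc_vertices(values, iso):
    """Classic MC: one vertex per crossing edge, placed by linear interpolation."""
    verts = []
    for a, b in EDGES:
        if (values[a] >= iso) != (values[b] >= iso):
            t = (iso - values[a]) / (values[b] - values[a])
            verts.append(tuple(pa + t * (pb - pa)
                               for pa, pb in zip(CORNERS[a], CORNERS[b])))
    return verts


def vcmc_vertices(values, iso):
    """Assumed vertex-connected rule: reuse the inside grid point of each crossing edge."""
    inside = {i for i, v in enumerate(values) if v >= iso}
    return sorted({CORNERS[a] if a in inside else CORNERS[b]
                   for a, b in EDGES if (a in inside) != (b in inside)})


values = [1.0] + [0.0] * 7   # only corner 0 is inside
print(cube_index(values, 0.5))     # 1
print(mc_vertices(values, 0.5))    # [(0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5)]
print(vcmc_vertices(values, 0.5))  # [(0, 0, 0)]
```

In this one-corner case MC creates three new interpolated points, while the vertex-connected rule reuses a single existing grid point, which is consistent with the smaller vertex counts reported for VCMC.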

For a unit cube which has eight vertices, there are totally

The number of cases for each pattern is also marked in
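A unit cube has 2^8 = 256 inside/outside corner configurations; classic MC groups them into 15 cases (including the empty one) using rotations and inside/outside complement. The Python sketch below recomputes such counts by brute force to show how cases collapse into patterns. It is an illustration only: it does not reproduce the VCMC pattern table, and we do not assume which symmetries VCMC uses. All names are our own.

```python
# Corner i of the unit cube has coordinates (x, y, z) = bits 0, 1, 2 of i.
CORNERS = [(x, y, z) for z in (0, 1) for y in (0, 1) for x in (0, 1)]
INDEX = {c: i for i, c in enumerate(CORNERS)}


def perm_from_map(f):
    """Corner permutation p with p[i] = index of the image of corner i under f."""
    return tuple(INDEX[f(c)] for c in CORNERS)


# 90-degree rotations about the z axis and the x axis through the cube centre.
RZ = perm_from_map(lambda c: (1 - c[1], c[0], c[2]))
RX = perm_from_map(lambda c: (c[0], 1 - c[2], c[1]))


def compose(p, q):
    """Apply permutation q first, then p."""
    return tuple(p[q[i]] for i in range(8))


def rotation_group():
    """Close {RZ, RX} under composition: the 24 rotations of the cube."""
    identity = tuple(range(8))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for gen in (RZ, RX):
            h = compose(gen, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def transform(mask, p):
    """Move the inside flag of corner i to corner p[i]."""
    return sum(1 << p[i] for i in range(8) if mask >> i & 1)


def count_classes(with_complement=False):
    """Distinct corner configurations up to rotation (and optionally inside/outside swap)."""
    group = rotation_group()
    representatives = set()
    for mask in range(256):
        images = [transform(mask, p) for p in group]
        if with_complement:
            images += [255 ^ m for m in images]
        representatives.add(min(images))  # smallest image labels the orbit
    return len(representatives)


print(len(rotation_group()), count_classes(), count_classes(with_complement=True))  # 24 23 15
```

Rotations alone leave 23 classes; adding the complement merges them into the familiar 15 MC cases.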

All the triangles formed in

Molecular structures in the RCSB Protein Data Bank (PDB) are mainly determined by X-ray crystallography and nuclear magnetic resonance spectroscopy.

(A) van der Waals surface; (B) solvent-accessible surface; (C) molecular surface.

Since the area of surface can be analytically calculated by MSMS ^{2}. The purpose of such a high vertex density is to make the numerical volume calculation of MSMS as close as possible to the exact value. However, when the MSMS program fails to generate output at such a high vertex density, we set the vertex density to a lower value. As a control, we also ran LSMS ^{2}. In ^{3}. The average relative errors of these algorithms are listed in

Left panel is the numerical surface areas and analytical surface area of 31 proteins; right panel is the corresponding numerical volumes enveloped by the molecular surfaces.

Method | Resolution (grid size for EDTSurf and LSMS; vertex density for MSMS) | Average relative error of area | Average relative error of volume |
EDTSurf | 128^{3} | 3.96% | 1.18% |
EDTSurf | 256^{3} | 1.99% | 0.48% |
LSMS | 128^{3} | 6.10% | 3.57% |
LSMS | 256^{3} | 7.87% | 0.84% |
MSMS | 1.0 | 4.56% | 0.72% |
MSMS | 100.0 | 0.45% | ------- |
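The text does not spell out the formula behind "average relative error". A natural reading, assumed here, is the mean absolute relative deviation from the reference value (analytical area or high-density MSMS volume) over the 31 test proteins. The Python sketch below implements that assumption; the function name and the numbers in the usage line are made up for illustration.

```python
def average_relative_error(values, references):
    """Mean of |value - reference| / |reference| over paired measurements."""
    if len(values) != len(references) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(v - r) / abs(r) for v, r in zip(values, references)) / len(values)


# Hypothetical areas (Å^2) for two proteins versus their reference values.
print(f"{average_relative_error([101.0, 98.0], [100.0, 100.0]):.1%}")  # 1.5%
```

Taking absolute values prevents over- and underestimates on different proteins from cancelling each other out.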

If the vertex density is very high, the difference between the numerical and analytical surface areas calculated by MSMS is small (0.45%). If we take the analytical area and the high-accuracy numerical volume by MSMS as the gold standard, the relative errors of surface area and volume are 3.96% and 1.18% for EDTSurf at the resolution

Besides surface accuracy, important requirements for surface calculation programs are high speed and low memory cost. The time spent on computing the molecular surface in EDTSurf consists of three parts: generation of the scaled solvent-accessible solid

To test the computational cost, we applied our algorithm to 15 large protein molecules taken from the PDB, containing 27,375 to 97,872 atoms. This set of proteins was also used by Can et al. to compare their algorithm LSMS with three other programs, including MSMS, which is integrated in UCSF Chimera ^{2} here). All three algorithms were run on a Microsoft Windows XP machine with an Intel Pentium 4 processor at 1.9 GHz and 768 MB of RAM.

As shown in

Protein | #Atoms | EDTSurf: time (s)/maximum memory (MB) | LSMS: time (s)/maximum memory (MB) | MSMS: time (s)/maximum memory (MB) |
1a8r | 27375 | 4.25/71.33 | 16.28/288.36 | 4.10/31.22 |
1h2i | 32802 | 6.60/78.94 | 17.20/299.91 | 12.83/94.99 |
1fka | 34977 | 13.85/208.72 | 19.21/328.77 | 11.94/116.42 |
1gtp | 35060 | 5.88/65.10 | 17.17/298.07 | 30.80/110.26 |
1gav | 43335 | 13.21/244.46 | 18.07/309.66 | 18.21/132.68 |
1g3i | 45528 | 11.31/121.66 | 19.10/319.21 | 46.95/145.66 |
1pma | 45892 | 17.85/159.73 | 20.68/333.26 | 19.80/146.19 |
1gt7 | 46180 | 7.00/103.88 | 17.10/296.31 | 14.53/106.38 |
1fjg | 51995 | 12.86/192.19 | 19.34/321.88 | 44.91/183.89 |
1aon | 58884 | 14.36/140.13 | 20.71/335.77 | 63.59/191.70 |
1j0b | 60948 | 11.84/196.99 | 17.96/308.77 | 72.83/167.54 |
1ffk | 64281 | 16.62/200.09 | 21.00/356.07 | 70.01/270.90 |
1otz | 68620 | 17.63/218.82 | 21.40/331.28 | 52.21/165.49 |
1ir2 | 87087 | 10.12/105.05 | 18.41/309.58 | 53.93/159.28 |
1hto | 97872 | 15.32/172.59 | 20.95/333.08 | 35.15/250.49 |
Average | 53389 | 11.91/151.98 | 18.97/318.00 | 36.79/151.54 |

The speeds of EDTSurf and LSMS both depend on the size of the bounding box, whereas that of MSMS depends on the number of atoms and the vertex density. If the triangulation result of a round contains singularities, MSMS changes the radii of some atoms and repeats the computation, possibly for several rounds. This partly explains the high time cost of MSMS for most of the proteins in the

Since the computational complexity of MSMS is ^{3})

Protein cavities can be empty or water-containing. They can lie within domains, between domains, or between subunits. Buried water molecules in internal cavities contribute to protein stability, because water-filled cavities are important for modulating the residues surrounding them. Cavities can also help us locate the proton transport pathway in the membrane protein

After the triangulated surface is generated, one part of the molecular surface faces the outside space while the other part is buried inside the molecular solid. Cavities are the regions enclosed by the inner molecular surface. Because our method propagates the molecular surface from the solvent-accessible surface, the number of cavities in the resulting molecular surface equals that in the solvent-accessible surface.
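One common way to separate inner from outer surfaces on a voxel grid is a flood fill: empty voxels reachable from the grid border are exterior, and the remaining empty voxels split into connected components, each of which is a cavity whose volume is its voxel count times the voxel volume. The Python sketch below illustrates that approach; it is not necessarily EDTSurf's procedure, and the 6-connectivity and the names `find_cavities` and `_flood` are our assumptions.

```python
from collections import deque
from itertools import product

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def _flood(start, allowed, shape):
    """Collect voxels 6-connected to `start` for which `allowed(voxel)` is true."""
    seen, queue = {start}, deque([start])
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBOURS:
            q = (x + dx, y + dy, z + dz)
            if q not in seen and all(0 <= c < n for c, n in zip(q, shape)) and allowed(q):
                seen.add(q)
                queue.append(q)
    return seen


def find_cavities(shape, solid):
    """Return the connected empty-voxel components that cannot reach the grid border."""
    def empty(p):
        return p not in solid

    # Every empty voxel reachable from an empty border voxel is exterior.
    exterior = set()
    for p in product(*(range(n) for n in shape)):
        on_border = any(c in (0, n - 1) for c, n in zip(p, shape))
        if on_border and empty(p) and p not in exterior:
            exterior |= _flood(p, empty, shape)
    # The remaining empty voxels split into connected cavities.
    cavities, assigned = [], set(exterior)
    for p in product(*(range(n) for n in shape)):
        if empty(p) and p not in assigned:
            component = _flood(p, empty, shape)
            assigned |= component
            cavities.append(component)
    return cavities


block = {(x, y, z) for x in range(1, 6) for y in range(1, 4) for z in range(1, 4)}
solid = block - {(2, 2, 2), (4, 2, 2)}   # a solid box with two separate one-voxel holes
print(len(find_cavities((7, 5, 5), solid)))  # 2
```

The grid must leave at least one layer of empty voxels around the molecule so that the exterior is connected to the border; a padded bounding box guarantees this.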

In ^{2}. It is shown in

Protein | #R | EDTSurf: no. of cavities/cavity volume (Å^{3}) | LSMS: no. of cavities/cavity volume (Å^{3}) | MSMS: no. of cavities/cavity volume (Å^{3}) |
2act | 218 | 14/533.00 | 16/514.66 | 18/573.858 |
2cha | 248 | 7/347.44 | 19/529.91 | 20/587.81 |
2lyz | 129 | 5/220.76 | 6/190.47 | 11/274.44 |
2ptn | 230 | 7/411.29 | 14/608.94 | 20/680.45 |
5mbn | 154 | 4/168.41 | 8/298.52 | 13/293.94 |
8tln | 318 | 14/441.75 | 29/642.06 | 42/942.91 |

In

Cavity | Volume (Å^{3}) | Contributing residues |
1 | 186.00 | G23, N25, T26, V27, P28, Y29, Q30, V31, L46, L67, G69, E70, D71, R117, V118, W141, L155 |
2 | 50.25 | Q30, H40, G43, S139, G140, W141, G193, D194, G197 |
3 | 24.07 | Y29, L137, S139, P198 |
4 | 13.19 | A160, C136, I138, A183, V199 |
5 | 39.43 | S45, V53, G196, G197, P198, L209, I212 |
6 | 85.17 | L99, N100, N101, D102, N179, M180, S214, W215, V227, Y228, T229 |
7 | 13.19 | |

In the left panel of

Left panel is the outer molecular surface and cavities of the protein; right panel shows the atoms around the cavities.

Quantitative measures such as the area and the volume of the molecular surface become more precise as the grid resolution increases. In ^{3}, 64^{3}, and 128^{3}) to see the visual effects. We also compare our generated surfaces with the molecular surface (see

Chain A is in blue and chain D is in red. (A) MSMS ^{2}; (B) 2874 vertices and 5740 faces, resolution 32^{3}; (C) 12880 vertices and 25752 faces, resolution 64^{3}; (D) 55873 vertices and 111738 faces, resolution 128^{3}.

The molecular surface obtained with our approximation method approaches the accurate analytical surface as the resolution increases. From

Because EDTSurf and LSMS are based on volumetric manipulation and their surfaces are only approximations to the actual analytical surface, it is interesting to examine whether and how the calculations of the grid-based methods approach the real values of surface area and volume. Here, we use the three atoms in

(A) area; (B) volume.
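The convergence question can be illustrated on the simplest possible case, a single sphere. The Python sketch below counts grid points inside a sphere exactly (integer arithmetic, no floating-point boundary issues) and compares the voxel-count volume with 4/3·π·r³ as the grid spacing shrinks. It assumes the radius is a multiple of the spacing and is not the three-atom test used in the paper; the function names are our own.

```python
import math


def lattice_points_in_ball(n):
    """Count integer points (i, j, k) with i^2 + j^2 + k^2 <= n^2, exactly."""
    count = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            rest = n * n - i * i - j * j
            if rest >= 0:
                count += 2 * math.isqrt(rest) + 1  # k ranges over [-isqrt(rest), isqrt(rest)]
    return count


def voxel_volume_error(radius, spacing):
    """Relative error of the voxel-count sphere volume against 4/3*pi*r^3."""
    n = round(radius / spacing)  # sphere radius in grid units (assumes an exact multiple)
    approx = lattice_points_in_ball(n) * spacing ** 3
    exact = 4.0 / 3.0 * math.pi * radius ** 3
    return abs(approx - exact) / exact


for h in (0.2, 0.1, 0.05):
    print(f"spacing {h}: relative volume error {voxel_volume_error(1.0, h):.3%}")
```

Volume converges as the grid is refined. Area is different: counting exposed voxel faces of a sphere overestimates its area by a factor approaching 3/2 regardless of resolution, so area estimates need a surface mesh rather than a face count.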

Another type of macromolecular structure is reconstructed from electron microscopy (EM) images. In the left panel of

Left panel is the isosurface of electron microscopy volume data (EMDB ID: 1180); right panel is the molecular surface of PDB data (PDB ID: 2c7c).

In

(A) generated by EDTSurf-MC; (B) generated by EDTSurf-VCMC.

We also compared the efficiency of the MC and VCMC algorithms for isosurface extraction on 18 EM density maps. VCMC is about 1.4 times faster than MC, with an average CPU time of 0.54 s versus 0.75 s.

As discussed in

When one round of Laplacian smoothing is applied to the generated surface, each mesh vertex is moved to the centroid of its topologically connected neighboring vertices. This post-processing step brings the mesh somewhat closer to the smooth continuous surface.
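One round of Laplacian smoothing as described above can be sketched in Python for a triangle mesh given as a vertex list and index triples. All new positions are computed from the old ones (a simultaneous update); the function name and mesh layout are our assumptions rather than EDTSurf's data structures.

```python
def laplacian_smooth(vertices, faces):
    """One round of Laplacian smoothing on a triangle mesh.

    vertices -- list of (x, y, z) tuples
    faces    -- list of (a, b, c) vertex-index triples
    Each vertex moves to the centroid of its edge-connected neighbours,
    computed from the old positions (simultaneous, Jacobi-style update).
    """
    neighbours = [set() for _ in vertices]
    for a, b, c in faces:
        neighbours[a].update((b, c))
        neighbours[b].update((a, c))
        neighbours[c].update((a, b))
    smoothed = []
    for i, v in enumerate(vertices):
        nbrs = neighbours[i]
        if not nbrs:  # isolated vertex: leave it in place
            smoothed.append(v)
            continue
        smoothed.append(tuple(sum(vertices[j][d] for j in nbrs) / len(nbrs)
                              for d in range(3)))
    return smoothed


tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(laplacian_smooth(tetra, faces))  # each vertex v becomes -v/3
```

The tetrahedron example also shows the known drawback of plain Laplacian smoothing: it shrinks a closed surface, and repeated rounds shrink it further.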

We have developed a new method, EDTSurf, for calculating the three major macromolecular surfaces based on the Euclidean Distance Transform. Triangulated surfaces are then constructed using the Vertex-Connected Marching Cube method. The two parts of the molecular surface, the contact surface and the reentrant surface, can be efficiently distinguished. The resolution of the grid system can be controlled flexibly. The area and the volume of the molecular surface are calculated accurately. Surfaces of interior cavities and their surrounding atoms can be detected. Moreover, compared with the methods in the literature, EDTSurf is faster and consumes less memory, especially when the number of atoms in the molecule is large.

As an application in protein structure prediction, we have used EDTSurf to calculate the solvent-accessible surface area of each residue for all proteins in the PDB library. This provides an essential framework for matching the predicted solvent accessibility with that of template structures in our fold-recognition algorithm

Although the examples throughout the paper are protein molecules, the surfaces of other macromolecules such as RNA or DNA can also be calculated using EDTSurf. The source code and executable package of EDTSurf are freely available at