Recent advances in experimental techniques have led to a rapid growth in complexity, size, and number of macromolecular structures that are made available through the Protein Data Bank. This creates a challenge for macromolecular visualization and analysis. Macromolecular structure files, such as PDB or PDBx/mmCIF files, can be slow to transfer and parse, and hard to incorporate into third-party software tools. Here, we present a new binary and compressed data representation, the MacroMolecular Transmission Format (MMTF), as well as software implementations in several languages that have been developed around it, which address these issues. We describe the new format and its APIs and demonstrate that it is several times faster to parse, and about a quarter of the file size of the current standard format, PDBx/mmCIF. As a consequence of the new data representation, it is now possible to visualize structures with millions of atoms in a web browser, keep the whole PDB archive in memory, or parse it within a few minutes on average computers, which opens up a new way of thinking about how to design and implement efficient algorithms in structural bioinformatics. The PDB archive is available in MMTF file format through web services and data that are updated on a weekly basis.
Citation: Bradley AR, Rose AS, Pavelka A, Valasatava Y, Duarte JM, Prlić A, et al. (2017) MMTF—An efficient file format for the transmission, visualization, and analysis of macromolecular structures. PLoS Comput Biol 13(6): e1005575. https://doi.org/10.1371/journal.pcbi.1005575
Editor: Dina Schneidman, Hebrew University of Jerusalem, ISRAEL
Received: April 6, 2017; Accepted: May 16, 2017; Published: June 2, 2017
Copyright: © 2017 Bradley et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Funding: This project was funded by the National Cancer Institute/National Institutes of Health Award U01 CA198942. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
The Protein Data Bank (PDB) is the global archive of 3D structures of proteins, nucleic acids, and complex assemblies. Recent advances in experimental techniques have led to an explosion in both the number and size of such structures. The entire PDB now exceeds one billion atoms and the largest structure currently contains about 2.4 million atoms (Fig 1A). In addition to a growing number of depositions per year (Fig 1B) and average number of atoms per structure (Fig 1C), 68 of the 100 largest structures were deposited in the past three years. In Fig 1D, we show the rising importance of cryo-electron microscopy as a technique. It is expected that much larger molecular machines and molecular assemblies will be modeled by combining multiple experimental techniques.
(A) The currently largest asymmetric structure in the PDB, the HIV capsid (PDB ID 3J3Q), contains over 2.4 million atoms. (B) The number of depositions per year (obsoleted or superseded entries are excluded). (C) The average structure size (asymmetric unit size for crystallographic structures). (D) Electron microscopy structures have contributed ~10 million atoms per year over the past 3 years (1% of the archive).
Significant increases in data sizes have been seen in many fields. Efficient storage and transmission of data using novel file formats and data compression methods are integral to these developments, e.g., for the transport of HD-TV, video, and audio. A similar trend has emerged in the handling of whole genome data.
A few notable developments have been made toward such a format for macromolecules. First, WPDB stored the data as binary files with limited precision, allowing efficient access. WPDB, however, is no longer maintained and was tied to the Windows operating system. The second development is mmJSON, which represents all data from the PDBx/mmCIF format (http://mmcif.wwpdb.org/) in the JSON serialization format that can be efficiently parsed by modern web browsers. After compression with gzip (a commonly used general purpose compression tool) the largest structure (PDB ID 3J3Q) takes up 27 MB in mmJSON. In addition, a “lite” version of mmJSON is available that contains a minimal amount of information to render backbone structures (3.3 MB for PDB ID 3J3Q). Neither WPDB, mmJSON, nor other formats such as PDBx/mmCIF provide all data necessary to represent a full macromolecular model including bond information. Furthermore, the text-based formats among them are slow to parse, and clean Application Programming Interfaces (APIs) are generally not made available.
Commercial software providers have produced their own internal representations of macromolecular structures. No such format, however, is openly available, and thus none can be incorporated into third-party software or developed with community involvement. For this reason, structural analysis is currently a laborious and error-prone process, often involving substantial duplicated effort to reliably process the entire PDB archive into a third-party data structure. Structure visualization can be equally challenging for large structures, due to slow data download and high client-side memory requirements to parse large structure files. Some of the largest structures in the PDB require more memory than is typically available within a web browser.
In this paper we describe a new data representation, the MacroMolecular Transmission Format (MMTF) (http://mmtf.rcsb.org/), which aims to resolve these deficiencies. MMTF is a binary, machine-readable file format that can be parsed, in some instances, at least an order of magnitude faster than existing text-based formats. Custom lossless and lossy compression methods, with either full atom-level detail or a reduced representation (C-alpha, P atoms), are applied to reduce the file size and thus further improve transmission and parsing speeds. MMTF uses a combination of encoding and compression techniques, such as delta encoding and reduced precision for lossy compression, which have also been used for MD trajectory compression [9–11]. Finally, MMTF is designed for interoperability and use by a broad community. APIs are provided in common programming languages and a full chemical description required to understand a structure is included in the file. The PDB archive is provided in MMTF format through web services and updated weekly. A number of third-party tools already support MMTF.
Design and implementation
Above we demonstrated that existing file formats are becoming less suitable for modern macromolecular data. Due to these challenges, the MMTF format was designed with three core aims. First, to minimize data storage requirements and transfer times, the format should represent data in compressed form without loss of accuracy. Second, it should be fast to parse, since I/O is often a bottleneck in structural analysis and visualization. Third, we designed MMTF to be as extensible, self-contained, and interoperable as possible. As a binary, machine-readable format, the preferred access to MMTF data is through the APIs provided in several programming languages. This allows the developers to focus on scientific applications and not on developing custom file parsers.
Data items and encoding
The MMTF format was designed to include the core data commonly used by macromolecular visualization and analysis tools (Table 1), rather than support all metadata available in PDBx/mmCIF. A comprehensive list of the data items is available in the MMTF specification. Additional metadata present in PDBx/mmCIF files and other annotations, if required, can be accessed through web services.
MMTF files have been augmented with calculated DSSP secondary structure using the BioJava implementation. This information speeds up both visualization and analysis applications and ensures data consistency across all structures in the PDB archive. MMTF includes the full chemical description of all molecules in a PDB entry. Bonds and bond orders for both standard and non-standard residues, e.g., ligands, are included from the Chemical Component Dictionary, and additional covalent bonds (struct_conn category in the PDBx/mmCIF files), such as disulfide bonds or covalent bonds between ligands and polymers, are also included in MMTF. Metal coordination and hydrogen bond information is not included in MMTF, since there are no generally agreed upon standards for defining them. Fig 2 describes the creation of an MMTF file from a PDBx/mmCIF archive file.
After parsing a PDBx/mmCIF file, DSSP secondary structure is calculated and bond information is added for all residues. Custom encoding strategies are applied to the different data categories to achieve a compact representation. These data are serialized in binary form and then further compressed with standard compression tools to create a compressed MMTF file.
In order to reduce the overall file size, we applied specialized encoding techniques to make the data more compressible. These techniques either reduce redundancy in the data or reduce the dynamic range (entropy) of numbers, to make them more compressible using standard entropy encoding techniques.
Fields of the same type are grouped together in MMTF to create a flat data structure. For instance, the coordinates of all atoms are stored together, instead of in atom objects with other atom-related data. This avoids imposing a deeply nested hierarchical structure on consuming programs, while still allowing efficient traversal of models, chains, groups, and atoms. This approach represents a columnar encoding of data, which facilitates data encoding and enhances data compressibility. Columnar encoding is also used in mmJSON to increase compressibility.
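As a minimal illustration of the flat, columnar layout described above, the following Python sketch regroups a few row-oriented atom records into one list per field. The field names and sample values are hypothetical and are not taken from the MMTF specification.

```python
# Row-oriented records: every atom carries all of its own fields.
# Field names and values are hypothetical sample data.
atoms = [
    {"name": "N", "x": 11.104, "y": 6.134, "z": -6.504, "b_factor": 12.31},
    {"name": "CA", "x": 11.639, "y": 6.071, "z": -5.147, "b_factor": 11.87},
    {"name": "C", "x": 13.159, "y": 6.107, "z": -5.150, "b_factor": 12.02},
]


def to_columns(records):
    """Regroup a list of per-atom dicts into one list per field."""
    columns = {key: [] for key in records[0]}
    for record in records:
        for key, value in record.items():
            columns[key].append(value)
    return columns


columns = to_columns(atoms)
print(columns["x"])     # all x coordinates, stored together
print(columns["name"])  # all atom names, stored together
```

Once values of one type sit next to each other, each column can be given the encoding that suits it best, which is what the strategies in the following paragraphs rely on.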
Lossless integer encoding is applied to all fixed precision floating point numbers. Integer numbers have a simpler bitwise representation and are therefore more compressible than the equivalent floating-point numbers. Atomic coordinates are typically represented with a precision of 3 decimal places, and temperature factors with 2 decimal places. For lossless encoding, we multiply coordinate and temperature factor values by 1000 and 100, respectively, and round the values to the nearest integer.
A further increase in compression can be obtained through lossy encoding by rounding coordinates to 0.1 Å precision and temperature factors and occupancy to 0.1 precision. Lossy compression is particularly important for the visualization of large complexes, for which the reduced precision is not visually perceptible [6,8].
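The integer encoding described in the two paragraphs above can be sketched in a few lines of Python. The multipliers 1000 and 100 (lossless) and 10 (lossy, 0.1 Å) follow the text; the function names and sample values are hypothetical.

```python
def integer_encode(values, multiplier):
    """Scale fixed-precision decimals and round each one to the nearest integer."""
    return [round(value * multiplier) for value in values]


def integer_decode(integers, multiplier):
    """Invert integer_encode by dividing by the same multiplier."""
    return [integer / multiplier for integer in integers]


# Hypothetical sample values (not from a real PDB entry).
coordinates = [11.104, -6.504, 13.159]  # 3 decimal places
b_factors = [12.31, 11.87]              # 2 decimal places

lossless_xyz = integer_encode(coordinates, 1000)  # keeps all 3 decimals
lossless_b = integer_encode(b_factors, 100)       # keeps both decimals
lossy_xyz = integer_encode(coordinates, 10)       # 0.1 precision, smaller integers

print(lossless_xyz, lossless_b, lossy_xyz)
print(integer_decode(lossy_xyz, 10))
```

With a multiplier of 10 the decoded coordinates differ from the originals by at most 0.05 Å, and the integers span a dynamic range that is 100 times smaller.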
Dictionary encoding is used for data repeated across multiple residues. In standard PDB and PDBx/mmCIF files, atoms within a residue are listed in a standard order. Exploiting this, atom name, element symbol, intra-residue bonds and bond orders, etc. can be stored once for each unique residue type and not repeated across the file, as shown for the dictionary entry for serine (Fig 3). MMTF has been designed to handle exceptions to a consistent atom order if they occur; however, the encoding will then be less efficient.
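A minimal sketch of this dictionary encoding, assuming a simplified residue description that holds only atom names and element symbols (bonds and bond orders are omitted for brevity). The names `dictionary_encode`, `group_types`, and `indices` are hypothetical and do not reproduce the field names of the MMTF specification.

```python
def dictionary_encode(residues):
    """Store each unique residue description once and refer to it by index."""
    group_types = []  # unique residue descriptions, in order of first appearance
    index_of = {}     # description key -> position in group_types
    indices = []      # one small integer per residue in the structure
    for residue in residues:
        key = (residue["name"], tuple(residue["atom_names"]), tuple(residue["elements"]))
        if key not in index_of:
            index_of[key] = len(group_types)
            group_types.append(residue)
        indices.append(index_of[key])
    return group_types, indices


SER = {"name": "SER",
       "atom_names": ["N", "CA", "C", "O", "CB", "OG"],
       "elements": ["N", "C", "C", "O", "C", "O"]}
GLY = {"name": "GLY",
       "atom_names": ["N", "CA", "C", "O"],
       "elements": ["N", "C", "C", "O"]}

chain = [SER, GLY, SER, SER, GLY]
group_types, indices = dictionary_encode(chain)
print(len(group_types), indices)
```

Because the atom order is part of the key in this sketch, a serine whose atoms are listed in a different order would receive its own dictionary entry, which mirrors the loss of efficiency mentioned above for inconsistent atom order.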
Delta encoding is applied to data of large magnitude that change in small increments. For example, instead of storing absolute atom coordinate values, differences in the x, y, and z coordinates are stored. Due to the covalent bond structure in molecules, these differences typically lie within a small dynamic range bound by their bond distances. Previous work determined this method to be the most effective encoding technique. Temperature factors are also delta encoded, since their variation from residue to residue is typically low.
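Delta encoding as described above can be sketched as follows. The sketch assumes that the first value is stored unchanged and that every later entry is the difference to its predecessor; the sample x coordinates are hypothetical and already integer encoded (x1,000).

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store each difference to the previous value."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """A running sum restores the original values."""
    return list(accumulate(deltas))


x_coords = [11104, 11639, 13159, 13700]  # hypothetical, integer encoded
deltas = delta_encode(x_coords)
print(deltas)                # first value followed by small differences
print(delta_decode(deltas))  # the original values again
```

Apart from the first entry, the stored numbers are bounded by the distance between consecutive atoms rather than by the size of the whole structure.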
Run-length encoding compresses a list of repeated values, such as occupancy values in X-ray structures, most of which are constant (1.0). Here the value itself and the number of repetitions are stored. For atom serial numbers, delta and run-length encoding are combined to achieve a very compact encoding.
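The following sketch shows run-length encoding on its own and combined with delta encoding, using the two cases named in the text: 2,000 occupancy values of 1.0 (integer encoded as 100) and 2,000 consecutive atom serial numbers. The flat [value, count, ...] output layout is an assumption made for this illustration.

```python
from itertools import groupby


def run_length_encode(values):
    """Flatten runs of equal values into [value, count, value, count, ...]."""
    encoded = []
    for value, run in groupby(values):
        encoded.extend([value, sum(1 for _ in run)])
    return encoded


def run_length_decode(encoded):
    """Expand [value, count, ...] pairs back into the full list."""
    decoded = []
    for value, count in zip(encoded[0::2], encoded[1::2]):
        decoded.extend([value] * count)
    return decoded


def delta_encode(values):
    """Keep the first value, then store each difference to the previous value."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


occupancies = [100] * 2000       # occupancy 1.0, integer encoded (x100)
serials = list(range(1, 2001))   # atom serial numbers 1..2000

occupancy_rle = run_length_encode(occupancies)
serial_rle = run_length_encode(delta_encode(serials))
print(occupancy_rle)  # two integers instead of 2,000
print(serial_rle)     # two integers instead of 2,000
```

Delta encoding turns the steadily increasing serial numbers into a constant run of ones, which is exactly the kind of input that run-length encoding compresses well.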
Recursive indexing is used to pack delta-encoded coordinates. Given their small dynamic range, most, but not all, values can be represented as 16-bit signed integers rather than 32-bit signed integers. We have explored the effect of packing on compression and identified recursive indexing as a simple and effective packing strategy for this data type. Any values that lie outside the 16-bit integer range [-32,768, 32,767] are decomposed into a series of values, such that the individual values fit into the 16-bit range (Fig 4D), and their sum adds up to the original value.
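A minimal sketch of recursive indexing, using the values from the Fig 4D example (32,867, 2,001, and 1,053). One detail is an assumption that the text does not spell out: a value exactly equal to a range limit is followed by a terminating 0, so that the decoder can tell where each original value ends.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def recursive_index_encode(values, low=INT16_MIN, high=INT16_MAX):
    """Split out-of-range values into a series of in-range values with the same sum."""
    encoded = []
    for value in values:
        while value >= high:
            encoded.append(high)
            value -= high
        while value <= low:
            encoded.append(low)
            value -= low
        encoded.append(value)  # remainder; 0 if the value was a multiple of a limit
    return encoded


def recursive_index_decode(encoded, low=INT16_MIN, high=INT16_MAX):
    """Sum consecutive entries until one is not equal to a range limit."""
    decoded, total = [], 0
    for value in encoded:
        total += value
        if value != high and value != low:
            decoded.append(total)
            total = 0
    return decoded


deltas = [32867, 2001, 1053]
packed = recursive_index_encode(deltas)
print(packed)                          # 32867 is split into 32767 and 100
print(recursive_index_decode(packed))  # the original deltas again
```

Only the rare out-of-range values cost an extra array entry, so nearly all coordinates are stored in two bytes instead of four.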
(A) Columnar data are first converted to integer arrays. Depending on the type of the values in the array, three types of custom encoding are applied to: 1. Repeated values, 2. Sequential values, 3. Small differences between adjacent values. All encoded values are finally encoded as a byte array. (B) Example of encoding 2,000 occupancy values by integer encoding (x100) followed by run-length encoding. (C) Example of encoding 2,000 atom serial numbers by applying delta and run-length encoding. (D) Example of encoding atom coordinate values by integer encoding (x1,000), delta encoding, and recursive index encoding into a 16-bit signed integer array. Here, the value 32,867 exceeds the maximum value (32,767) for a 16-bit signed integer. Therefore, recursive index encoding decomposes this value into two numbers, 32,767 and 100, that sum up to the original value. All subsequent values are within range and are represented directly by their values, 2,001 and 1,053.
The overall workflow for the encoding of columnar data is shown in Fig 4.
MMTF data are stored in the MessagePack binary container format (version 5, http://msgpack.org). MessagePack is an efficient binary serialization format, similar to JSON, but faster to parse and more compact. Encoding and decoding libraries for MessagePack are available in many languages. The top level of the container holds the field names as keys and field data as values. Non-columnar data are described using standard MessagePack data types. Columnar data, e.g., most data columns in the “ATOM” records, are custom encoded. The MMTF specification defines Codec Types used to custom encode columnar data. These data records are described by the following data structure (Fig 5), which is represented as a binary array in MessagePack.
A Codec Type describes the columnar encoding strategy. A Codec may describe the combination of several encoding strategies. For example, coordinate data are encoded by a Codec that combines integer encoding, delta encoding, and recursive index encoding. Data Length represents the number of values that have been encoded, and here the Codec Parameter for coordinate encoding is a divisor to convert integers to floating point numbers.
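The record structure described above (Codec Type, Data Length, Codec Parameter, encoded data) can be sketched with Python's `struct` module. The byte layout used here is an assumption for illustration: three big-endian 32-bit integers as a 12-byte header followed by big-endian 16-bit values, with codec number 10 standing for the coordinate codec. Consult the MMTF specification for the authoritative layout and codec numbers.

```python
import struct
from itertools import accumulate

INT16_MIN, INT16_MAX = -32768, 32767
COORDINATE_CODEC = 10  # assumed codec number: integer + delta + recursive index


def encode_coordinates(coords, divisor=1000):
    """Integer encode, delta encode, recursive index, then prepend the header."""
    ints = [round(c * divisor) for c in coords]
    deltas = ints[:1] + [b - a for a, b in zip(ints, ints[1:])]
    packed = []
    for delta in deltas:
        while delta >= INT16_MAX:
            packed.append(INT16_MAX)
            delta -= INT16_MAX
        while delta <= INT16_MIN:
            packed.append(INT16_MIN)
            delta -= INT16_MIN
        packed.append(delta)
    # Header: codec type, number of encoded values, codec parameter (divisor).
    header = struct.pack(">iii", COORDINATE_CODEC, len(coords), divisor)
    return header + struct.pack(f">{len(packed)}h", *packed)


def decode_binary_array(blob):
    """Read the header, then undo the three encoding steps in reverse order."""
    codec, length, divisor = struct.unpack(">iii", blob[:12])
    if codec != COORDINATE_CODEC:
        raise ValueError(f"unsupported codec type {codec}")
    payload = blob[12:]
    packed = struct.unpack(f">{len(payload) // 2}h", payload)
    deltas, total = [], 0
    for value in packed:  # recursive index decoding
        total += value
        if value != INT16_MAX and value != INT16_MIN:
            deltas.append(total)
            total = 0
    ints = list(accumulate(deltas))  # delta decoding
    if len(ints) != length:
        raise ValueError("decoded length does not match the header")
    return [i / divisor for i in ints]  # integer decoding


blob = encode_coordinates([32.867, 34.868, 35.921])
decoded = decode_binary_array(blob)
print(len(blob), decoded)
```

The three sample coordinates reproduce the Fig 4D example: after integer and delta encoding they become 32,867, 2,001, and 1,053, so the payload holds the four 16-bit values 32,767, 100, 2,001, and 1,053.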
MMTF data files
MMTF files for all PDB entries are updated weekly as part of the RCSB PDB weekly update pipeline. Semantic versioning (http://semver.org) is applied to the file specification and the APIs. Major version changes of the specification may require code updates to decode and parse data. For this reason, after the release of a new major version of the specification, the previous major version will be retained for a number of months to allow time for code updates and testing. Such version changes will be disseminated through a mailing list and updates to the documentation.
MMTF files are generated with two types of molecular representation (Table 2). The reduced representation, which uses lossy compression and less atomic-level detail, is suitable for 3D visualization, e.g., ribbon diagrams, or calculations that require only a C-alpha representation.
MMTF application programming interface
MMTF files are accessible through RESTful web services via HTTP and HTTPS protocols, or downloadable as individual gzipped files (http://mmtf.rcsb.org/download.html). A weekly update procedure ensures the availability of the latest structures, as provided by the wwPDB. For large-scale analysis of the PDB archive, where loading of thousands of individual files is inefficient, a single Hadoop Sequence file (https://wiki.apache.org/hadoop/SequenceFile) is provided. These files can be efficiently processed in parallel by Big Data frameworks such as Apache Hadoop (http://hadoop.apache.org/) or Apache Spark (http://spark.apache.org/).
The preferred access to MMTF data is via the provided decoder APIs, which are available through open source GitHub repositories. API documentation and example code are available from the MMTF project page (http://mmtf.rcsb.org/). Fig 6 shows the integration of third-party applications and software libraries with the MMTF APIs.
The PDB archive can be accessed in MMTF format through RESTful web services. APIs available in common programming languages provide efficient access to the MMTF data. Third party applications then access the data through the language-specific APIs.
File size comparison
In Fig 7 we compare the size of the PDB archive in PDBx/mmCIF, PDB, and MMTF file formats. In the MMTF file format the PDB archive can be stored in about 8 GB, making it less than 1/4 the size of the PDBx/mmCIF files and 1/3 the size of the PDB files. In practice, a size of about 8 GB also means that the entire archive can be held in RAM on many standard desktop and laptop computers.
About 500 large structures (> 99,999 atoms or > 62 chains) cannot be represented in the PDB format; however, they are available as split PDB files (.tar.gz files) and take up about 2.7 GB, which is included in the reported PDB file size. For MMTF, we report the size of the all-atom representation (MMTF-full) and the reduced representation (MMTF-reduced).
Load time benchmarks
The following benchmarks assess the file load time for MMTF compared to PDBx/mmCIF and PDB data formats. The load times reported in the figures below consist of reading the files from a local disk, decompressing and parsing the data, instantiating a hierarchical molecular data structure (model->chain->residue->atom), and storing the metadata. All parsing benchmarks were performed using a single core on a MacMini, 2.6 GHz Intel Core i5, 16 GB RAM 1600 MHz DDR3, with a solid state drive.
The first benchmark uses the existing file parsers (PDBx/mmCIF, PDB) in BioJava and compares their performance with the new BioJava MMTF parser, which uses the MMTF-Java API. In Fig 8 we compare the load times for ~127,000 PDB entries as individual gzip compressed PDBx/mmCIF, PDB, and MMTF files, and as uncompressed Hadoop Sequence files.
Load time for the PDB archive (~127,000 entries) using the gzip-compressed PDBx/mmCIF, PDB, and MMTF formats. For MMTF, we report the load time for individual gzipped files as well as the load time for uncompressed Hadoop Sequence Files containing MMTF records in the full (all atom, MMTF-full) and the reduced format (MMTF-reduced). For PDB file loading, about 500 large structures that cannot be represented in the PDB format (> 99,999 atoms or > 62 chains) were excluded.
To assess the effect of structure size on load time (Fig 10), we created samples of 100 structures around the 25 percentile (S2 Table), 50 percentile (S3 Table), and 75 percentile (S4 Table) from the atom size distribution of the PDB archive. To create these subsets, we selected 100 structures symmetrically around the quartile values. S1 Appendix contains links to the software repositories to run the BioPython and BioJava benchmarks.
The benchmarks contain 100 structures each around the 25, 50 and 75 percentile of the PDB size distribution: Q25 (2,309–2,313 atoms), Q50 (4,054–4,063 atoms), Q75 (7,862–7,885 atoms).
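The symmetric selection around a percentile described above can be sketched as follows. The exact selection procedure used for the benchmark sets is not given in the text, so this is one plausible reading: sort entries by atom count, locate the percentile position, and take a window of 100 entries centred on it. The archive in the example is synthetic, with made-up entry IDs.

```python
def sample_around_percentile(atom_counts, percentile, n=100):
    """Return the n (entry, size) pairs centred on a percentile of the size distribution."""
    ordered = sorted(atom_counts.items(), key=lambda item: item[1])
    centre = round(percentile / 100 * (len(ordered) - 1))
    # Clamp the window so that it stays inside the sorted list.
    start = max(0, min(centre - n // 2, len(ordered) - n))
    return ordered[start:start + n]


# Synthetic archive: 1,000 made-up entry IDs with atom counts 100..1099.
archive = {f"X{i:04d}": 100 + i for i in range(1000)}

q25 = sample_around_percentile(archive, 25)
q50 = sample_around_percentile(archive, 50)
q75 = sample_around_percentile(archive, 75)
print(len(q25), q25[0], q25[-1])
```

Because the window is taken from the sorted size distribution, each sample spans only a narrow range of atom counts, as in the Q25, Q50, and Q75 sets listed above.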
MMTF was specifically designed to handle the efficient transfer and visualization of very large structures that could not be parsed and visualized using the PDBx/mmCIF format due to the large memory overhead. For example, the currently largest asymmetric structure (PDB ID 3J3Q) in the PDB with 2,440,800 atoms, shown in Fig 1A, was rendered with NGL viewer using the MMTF-reduced format. Table 3 compares the load times for this entry using BioPython, NGL, and BioJava.
Simple application programming interfaces
Availability and future directions
In this paper we present a modern macromolecular transmission format. MMTF addresses the growing size and complexity of macromolecular structures in the PDB archive through a new binary, custom compressed file format. Furthermore, MMTF is self-contained and simple APIs are provided in multiple popular programming languages. Software developers do not need to implement their own parsers, often an error-prone process, but can instead build on the tools provided by MMTF. Through both these advances MMTF allows rapid, user-friendly access to any structure in the PDB archive with a few lines of code. We demonstrate that the format is about 75% smaller than PDBx/mmCIF, up to an order of magnitude faster to parse, and accompanied by a user-friendly API that promotes interoperability.
Owing to its simple API, user-friendly specification, and licensing model, the format has already been incorporated into several protein analysis tools and 3D structure visualization tools (Table 4).
We envisage the above advances will have a major impact in two areas of structural bioinformatics (Fig 12).
(A) MMTF enables fast transfer, parsing, and low client side overhead for high-performance visualization in web-based viewers and in particular mobile devices. (B) MMTF can be represented in “Big Data” formats and the small size enables high-performance, in-memory analysis and calculations of the entire PDB archive using Big Data frameworks for parallel processing.
The first key area of impact is visualization of macromolecular structures, in particular when used on a mobile device or in a web browser. MMTF enables low bandwidth file transfer, low client-side memory consumption, and fast parsing of PDB structures. For example, the 3D visualization on the RCSB PDB website is powered by MMTF, using the MMTF-full representation for entries < 10,000 residues and the MMTF-reduced representation for larger entries. Using the NGL viewer and MMTF, the currently largest structure in the PDB, the HIV viral capsid (PDB ID 3J3Q), can now be visualized on a mobile device (Fig 1A).
Second, by greatly increasing file-parsing speed, a rapid analysis of the entire PDB archive can be carried out. As an example, we have used the MMTF format to rapidly mine the PDB for interatomic distance distributions. Coupled with the use of an efficient geometric hashing algorithm in BioJava, the distances between all C-alpha carbons can be calculated in minutes. Parsing of the text-based PDBx/mmCIF format alone would take several hours. Using a Hadoop Sequence file with MMTF records enables the scalable analysis of the PDB using standard distributed parallel processing frameworks. Further work is ongoing to demonstrate the use of MMTF with Big Data frameworks.
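The text does not describe the geometric hashing algorithm used in BioJava. As a generic stand-in, the sketch below shows a grid-based (cell list) neighbor search in Python, which finds all atom pairs within a cutoff distance without comparing every atom with every other atom. The function name, the cutoff, and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def pairs_within(coords, cutoff):
    """Return (i, j, distance) for all pairs i < j that lie within the cutoff."""
    # Hash every point into a cubic grid cell with edge length equal to the cutoff.
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[cell].append(index)

    pairs = []
    for (cx, cy, cz), members in cells.items():
        # Only the cell itself and its 26 neighbours can hold points within the cutoff.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    if i < j:
                        distance = math.dist(coords[i], coords[j])
                        if distance <= cutoff:
                            pairs.append((i, j, distance))
    return sorted(pairs)


# Hypothetical C-alpha positions: three consecutive residues and one distant atom.
ca_atoms = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (50.0, 50.0, 50.0)]
close_pairs = pairs_within(ca_atoms, cutoff=5.0)
print([(i, j) for i, j, _ in close_pairs])
```

For a fixed cutoff, the work per atom depends on the local density of atoms rather than on the total size of the structure, which is what makes archive-wide distance calculations tractable.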
MMTF is an open source project and we welcome additions and new applications that use the new technology. As an example, the MMTF-C and MMTF-C++ decoders were developed in collaboration with community members.
S1 Table. Benchmark set of 1000 randomly selected PDB entries.
S2 Table. Benchmark set Q25.
100 PDB entries symmetrically selected around the 25 percentile of the PDB atom size distribution (2,309–2,313 atoms).
S3 Table. Benchmark set Q50.
100 PDB entries symmetrically selected around the 50 percentile of the PDB atom size distribution (4,054–4,063 atoms).
S4 Table. Benchmark set Q75.
100 PDB entries symmetrically selected around the 75 percentile of the PDB atom size distribution (7,862–7,885 atoms).
We thank Robert Hanson, Thomas Holder, and David Koes for their feedback on the MMTF specification and API. We thank Thomas Holder, Julien Ferté, and Gazal Kalyan for developing the MMTF-C decoding library and Gerardo Tauriello, Stefan Bienert, Gabriel Studer, and Andrew Waterhouse for developing the MMTF-C++ decoding library. Robert Hanson provided efficient Java code for decoding of MessagePack. We also thank all users who helped with MMTF file transfer benchmarks worldwide, and Shih-Cheng Huang for performing the BioPython benchmarks. We thank Ezra Peisach for help with data validation, and Cole Christie and Chris Randle for setting up the weekly update process and data download for MMTF files.
- Conceptualization: PWR.
- Data curation: ARB APa ASR YV.
- Formal analysis: ARB APa ASR YV.
- Funding acquisition: PWR.
- Investigation: ARB ASR PWR.
- Methodology: ARB ASR APa APr JMD PWR YV.
- Project administration: PWR.
- Resources: APr JMD PWR.
- Software: ARB ASR APa APr JMD PWR YV.
- Supervision: PWR.
- Validation: ARB APa ASR PWR.
- Visualization: ARB PWR YV.
- Writing – original draft: ARB PWR.
- Writing – review & editing: ARB PWR.
- 1. Berman HM. The Protein Data Bank. Nucleic Acids Res. 2000;28: 235–242. pmid:10592235
- 2. Zhao G, Perilla JR, Yufenyuy EL, Meng X, Chen B, Ning J, et al. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics. Nature. 2013;497: 643–6. pmid:23719463
- 3. Callaway E. The revolution will not be crystallized: a new method sweeps through structural biology. Nature. 2015;525: 172–174. pmid:26354465
- 4. Callaway E. Data bank struggles as protein imaging ups its game. Nature. 2014;514: 416–416. pmid:25341769
- 5. Paten B, Diekhans M, Druker BJ, Friend S, Guinney J, Gassner N, et al. The NIH BD2K center for big data in translational genomics. J Am Med Informatics Assoc. 2015;43: ocv047.
- 6. Shindyalov IN, Bourne PE. WPDB–PC Windows-based interrogation of macromolecular structure. J Appl Crystallogr. 1995;28: 847–852.
- 7. Bekker G-J, Nakamura H, Kinjo AR. Molmil: a molecular viewer for the PDB and beyond. J Cheminform. 2016;8: 42. pmid:27570544
- 8. Valasatava Y, Bradley AR, Rose AS, Duarte JM, Prlić A, Rose PW. Towards an efficient compression of 3D coordinates of macromolecular structures. PLoS One. 2017;12: e0174846. pmid:28362865
- 9. Lundborg M, Apostolov R, Spångberg D, Gärdenäs A, van der Spoel D, Lindahl E. An efficient and extensible format, library, and API for binary trajectory data from molecular simulations. J Comput Chem. 2014;35: 260–269. pmid:24258850
- 10. Huwald J, Richter S, Dittrich P. Compressing molecular dynamics trajectories: breaking the one-bit-per-sample barrier. J Comput Chem. 2016; 1–23.
- 11. Marais P, Kenwood J, Smith KC, Kuttel MM, Gain J. Efficient compression of molecular dynamics trajectory files. J Comput Chem. 2012;33: 2131–2141. pmid:22730053
- 12. Rose PW, Beran B, Bi C, Bluhm WF, Dimitropoulos D, Goodsell DS, et al. The RCSB Protein Data Bank: Redesigned web site and web services. Nucleic Acids Res. 2011;39: D392–D401. pmid:21036868
- 13. Velankar S, van Ginkel G, Alhroub Y, Battle GM, Berrisford JM, Conroy MJ, et al. PDBe: improved accessibility of macromolecular structure data from PDB and EMDB. Nucleic Acids Res. 2016;44: D385–D395. pmid:26476444
- 14. Kinjo AR, Bekker G-J, Suzuki H, Tsuchiya Y, Kawabata T, Ikegawa Y, et al. Protein Data Bank Japan (PDBj): updated user interfaces, resource description framework, analysis tools for large structures. Nucleic Acids Res. 2017;45: D282–D288. pmid:27789697
- 15. Kabsch W, Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983;22: 2577–2637. pmid:6667333
- 16. Prlić A, Yates A, Bliven SE, Rose PW, Jacobsen J, Troshin P V., et al. BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics. 2012;28: 2693–2695. pmid:22877863
- 17. Westbrook JD, Shao C, Feng Z, Zhuravleva M, Velankar S, Young J. The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the Protein Data Bank. Bioinformatics. 2015;31: 1274–1278. pmid:25540181
- 18. Abadi DJ, Madden SR, Hachem N. Column-stores vs. Row-stores: How Different Are They Really? Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. New York, NY, USA: ACM; 2008. pp. 967–980. https://doi.org/10.1145/1376616.1376712
- 19. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25: 1422–1423. pmid:19304878
- 20. Rose AS, Bradley AR, Valasatava Y, Duarte JM, Prlić A, Rose PW. Web-based molecular graphics for large complexes. Proceedings of the 21st International Conference on Web3D Technology—Web3D ‘16. New York, New York, USA: ACM Press; 2016. pp. 185–186. https://doi.org/10.1145/2945292.2945324
- 21. Prlić A, Yates A, Bliven SE, Rose PW, Jacobsen J, Troshin P V., et al. BioJava: An open-source framework for bioinformatics in 2012. Bioinformatics. 2012;28: 2693–2695. pmid:22877863
- 22. Rego N, Koes D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics. 2015;31: 1322–1324. pmid:25505090
- 23. NCBI Resource Coordinators. Database Resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2017;45: D12–D17. pmid:27899561
- 24. Rose PW, Prlić A, Altunkaya A, Bi C, Bradley AR, Christie CH, et al. The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 2017;45: D271–D281. pmid:27794042