MMTF—An efficient file format for the transmission, visualization, and analysis of macromolecular structures
Fig 4
Workflow for encoding columnar data within MMTF.
(A) Columnar data are first converted to integer arrays. Depending on the kind of values in an array, custom encodings are then applied that target three patterns: (1) repeated values, (2) sequential values, and (3) small differences between adjacent values. Finally, the encoded values are serialized as a byte array. (B) Example of encoding 2,000 occupancy values by integer encoding (×100) followed by run-length encoding. (C) Example of encoding 2,000 atom serial numbers by delta encoding followed by run-length encoding. (D) Example of encoding atom coordinate values by integer encoding (×1,000), delta encoding, and recursive index encoding into a 16-bit signed integer array. Here, the value 32,867 exceeds the maximum value of a 16-bit signed integer (32,767), so recursive index encoding decomposes it into two numbers, 32,767 and 100, that sum to the original value. All subsequent values are within range and are stored directly as 2,001 and 1,053.
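As a rough illustration of panel (B), the Python sketch below applies integer encoding with a divisor of 100 and then run-length encoding. The function names and the input (2,000 atoms that all have occupancy 1.0) are hypothetical, and the sketch assumes the run-length output is a flat list of (value, count) pairs; it is not the reference MMTF implementation.

```python
from typing import List, Sequence


def integer_encode(values: Sequence[float], divisor: int) -> List[int]:
    """Scale each float by `divisor` and round to the nearest integer."""
    return [round(v * divisor) for v in values]


def run_length_encode(values: Sequence[int]) -> List[int]:
    """Collapse runs of equal values into a flat list of (value, count) pairs.

    Assumed layout: [value1, count1, value2, count2, ...].
    """
    encoded: List[int] = []
    for v in values:
        if encoded and encoded[-2] == v:
            encoded[-1] += 1  # same value as the current run: extend it
        else:
            encoded.extend([v, 1])  # new value: start a new run
    return encoded


# Hypothetical input: 2,000 atoms that all have occupancy 1.0.
occupancies = [1.0] * 2000
as_ints = integer_encode(occupancies, divisor=100)  # 2,000 copies of 100
packed = run_length_encode(as_ints)
print(packed)  # [100, 2000]
```

Two integers replace 2,000, which is why run-length encoding suits columns such as occupancy, where a single value tends to repeat.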
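Panel (C) can be sketched in a similar way: delta encoding turns consecutive serial numbers into a run of identical differences, which run-length encoding then collapses. The input (gap-free serial numbers 1 to 2,000) and the helper names are hypothetical, and the same flat (value, count) pair layout is assumed.

```python
from typing import List, Sequence


def delta_encode(values: Sequence[int]) -> List[int]:
    """Keep the first value, then store each value's difference to its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values: Sequence[int]) -> List[int]:
    """Flat (value, count) pairs, e.g. [1, 1, 2] -> [1, 2, 2, 1] (assumed layout)."""
    encoded: List[int] = []
    for v in values:
        if encoded and encoded[-2] == v:
            encoded[-1] += 1
        else:
            encoded.extend([v, 1])
    return encoded


# Hypothetical input: gap-free atom serial numbers 1, 2, ..., 2000.
serials = list(range(1, 2001))
deltas = delta_encode(serials)  # first value 1, then 1,999 differences of 1
packed = run_length_encode(deltas)
print(packed)  # [1, 2000]
```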
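For panel (D), a minimal sketch of integer, delta, and recursive index encoding follows, together with a decoder that checks that the split values sum back to the originals. The three coordinates are hypothetical, chosen only so that the deltas reproduce the caption's numbers (32,867, 2,001, 1,053), and the big-endian packing with Python's `struct` module is an assumption about the final byte layout rather than a statement of the MMTF reference code.

```python
import struct
from typing import List, Sequence

INT16_MAX = 32767
INT16_MIN = -32768


def integer_encode(values: Sequence[float], divisor: int) -> List[int]:
    """Scale each float by `divisor` and round to the nearest integer."""
    return [round(v * divisor) for v in values]


def delta_encode(values: Sequence[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def recursive_index_encode(values: Sequence[int]) -> List[int]:
    """Split each value into int16-sized parts that sum back to it.

    A part equal to INT16_MAX or INT16_MIN means "keep adding"; any other
    part ends the current value, so 32,767 itself becomes [32767, 0].
    """
    out: List[int] = []
    for v in values:
        if v >= 0:
            while v >= INT16_MAX:
                out.append(INT16_MAX)
                v -= INT16_MAX
        else:
            while v <= INT16_MIN:
                out.append(INT16_MIN)
                v -= INT16_MIN
        out.append(v)  # remainder lies strictly inside the int16 range
    return out


def recursive_index_decode(parts: Sequence[int]) -> List[int]:
    """Accumulate parts until a non-boundary part closes the current value."""
    out: List[int] = []
    acc = 0
    for p in parts:
        acc += p
        if p not in (INT16_MAX, INT16_MIN):
            out.append(acc)
            acc = 0
    return out


# Hypothetical x coordinates (angstroms), chosen so the deltas match the caption.
xs = [32.867, 34.868, 35.921]
deltas = delta_encode(integer_encode(xs, divisor=1000))  # [32867, 2001, 1053]
parts = recursive_index_encode(deltas)                   # [32767, 100, 2001, 1053]
payload = struct.pack(f">{len(parts)}h", *parts)         # assumed big-endian int16 bytes

assert recursive_index_decode(parts) == deltas
print(parts, len(payload))  # [32767, 100, 2001, 1053] 8
```

Because a boundary value always signals "more to come", any value that equals 32,767 or -32,768 is followed by a remainder (possibly 0); this is what lets the decoder distinguish a continuation marker from a finished value.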