Table 1.
Encoding types supported by the BinaryCIF format.
Fig 1.
Compression strategies of BinaryCIF.
The BinaryCIF codec represents diverse data types in a standardized manner: strings are represented by the indices at which they occur, and these indices, like float values, can be encoded as integer values. Interval Quantization is the only lossy encoding. For integer arrays, the most efficient combination of Run Length, Delta, and Integer Packing encoding is detected. This approach can handle arbitrary data, including columns that are not defined by any schema. MessagePack serialization is applied downstream of the BinaryCIF encoding.
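The selection step for integer arrays can be illustrated with a minimal Python sketch. It is not the reference implementation: the function names are hypothetical, and `byte_size` is a simplified stand-in for Integer Packing that only picks the narrowest signed width (1, 2, or 4 bytes) able to hold every value. The `quantize` helper shows, under the same caveat, why Interval Quantization is lossy: values are snapped to a fixed number of evenly spaced steps.

```python
from typing import Dict, List, Tuple


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def run_length_encode(values: List[int]) -> List[int]:
    """Flatten (value, repeat count) pairs into a single list."""
    out: List[int] = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1          # extend the current run
        else:
            out.extend([v, 1])    # start a new run
    return out


def byte_size(values: List[int]) -> int:
    """Assumed cost model: narrowest signed width that fits all values."""
    if not values:
        return 0
    lo, hi = min(values), max(values)
    for width in (1, 2, 4):
        limit = 1 << (8 * width - 1)
        if -limit <= lo and hi < limit:
            return width * len(values)
    raise ValueError("value does not fit into 32 bits")


def pick_encoding(values: List[int]) -> Tuple[str, int]:
    """Try each combination and return the name and size of the smallest."""
    candidates: Dict[str, List[int]] = {
        "none": values,
        "delta": delta_encode(values),
        "rle": run_length_encode(values),
        "delta+rle": run_length_encode(delta_encode(values)),
    }
    sizes = {name: byte_size(enc) for name, enc in candidates.items()}
    best = min(sizes, key=sizes.get)
    return best, sizes[best]


def quantize(values: List[float], lo: float, hi: float, steps: int) -> List[int]:
    """Lossy: map each float in [lo, hi] to the index of the nearest step."""
    scale = (steps - 1) / (hi - lo)
    return [round((min(max(v, lo), hi) - lo) * scale) for v in values]


def dequantize(indices: List[int], lo: float, hi: float, steps: int) -> List[float]:
    """Recover the step values; the original floats are not restored exactly."""
    width = (hi - lo) / (steps - 1)
    return [lo + i * width for i in indices]
```

In this sketch, a monotonically increasing column such as `list(range(1000, 2000))` is best served by Delta followed by Run Length, whereas a constant column such as `[7] * 500` is best served by Run Length alone. Trying every combination and keeping the smallest result means the choice adapts to each column without requiring schema knowledge.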
Fig 2.
Archive sizes for 154,015 files are given in GB (see S1 Text). Original refers to the content of the original structure files. Pruned refers to files reduced to the set of information provided by MMTF files (see S1 Table). Use of BinaryCIF yields an archive size similar to that of MMTF.
Fig 3.
BinaryCIF provides the most effective compression for the largest structures, enumerated in S2 Table.
Fig 4.
Read performance of JavaScript implementation.
Average single-threaded parsing time for 154,015 PDB structures is given in minutes. Reading binary data (BinaryCIF and MMTF) can provide a dramatic speedup. Handling gzipped files slows down parsing in most cases. Read performance can readily be improved by omitting less frequently used meta-information, as seen for the pruned bins.
Fig 5.
Read performance of Java implementation.
Average single-threaded parsing time for 154,015 PDB structures is given in minutes.