Enhancing reproducibility in scientific computing: Metrics and registry for Singularity containers
Fig 5
Reproducibility assessment algorithm: Comparing two containers reduces to comparing the members of their tar streams. Each pair of members is first compared by an md5sum of the tar member itself. If the sums do not match, the comparison falls back to a hash of the file content (for files not owned by root) or to a size heuristic (for root-owned files).
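The member-by-member comparison described above can be sketched with Python's standard `tarfile` and `hashlib` modules. This is a minimal illustration under stated assumptions, not the registry's own code. The "md5sum of the member itself" is assumed here to cover the serialized tar header plus the file's bytes. Ownership is read from the header `uid`, and a pair counts as root owned when either side has uid 0. The function names `member_digest`, `content_hash`, `members_match`, and `compare_tars` are hypothetical.

```python
import hashlib
import tarfile
from typing import Dict, Set, Tuple


def member_digest(tar: tarfile.TarFile, member: tarfile.TarInfo) -> str:
    """Strict hash (assumption): serialized header (name, mode, uid, mtime, ...) plus data."""
    h = hashlib.md5(member.tobuf())
    if member.isfile():
        h.update(tar.extractfile(member).read())
    return h.hexdigest()


def content_hash(tar: tarfile.TarFile, member: tarfile.TarInfo) -> str:
    """Hash of the payload only; non-regular members fall back to their link target."""
    if member.isfile():
        return hashlib.md5(tar.extractfile(member).read()).hexdigest()
    return hashlib.md5(member.linkname.encode()).hexdigest()


def members_match(tar_a: tarfile.TarFile, m_a: tarfile.TarInfo,
                  tar_b: tarfile.TarFile, m_b: tarfile.TarInfo) -> bool:
    # Step 1: compare whole members; any metadata change (e.g. mtime) breaks this.
    if member_digest(tar_a, m_a) == member_digest(tar_b, m_b):
        return True
    # Step 2 (assumption): the pair is root owned if either header has uid 0.
    if m_a.uid == 0 or m_b.uid == 0:
        return m_a.size == m_b.size  # size heuristic for root-owned files
    return content_hash(tar_a, m_a) == content_hash(tar_b, m_b)


def compare_tars(tar_a: tarfile.TarFile,
                 tar_b: tarfile.TarFile) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (matching, differing, only_in_one) sets of member names."""
    a: Dict[str, tarfile.TarInfo] = {m.name: m for m in tar_a.getmembers()}
    b: Dict[str, tarfile.TarInfo] = {m.name: m for m in tar_b.getmembers()}
    matching: Set[str] = set()
    differing: Set[str] = set()
    for name in a.keys() & b.keys():
        if members_match(tar_a, a[name], tar_b, b[name]):
            matching.add(name)
        else:
            differing.add(name)
    only_in_one = set(a.keys() ^ b.keys())
    return matching, differing, only_in_one
```

Under this sketch, a file whose only change is its modification time still counts as matching through the content-hash fallback. A root-owned file of unchanged size matches even if its bytes differ, which is the trade-off a size heuristic accepts.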
The resulting counts of matching and differing files are then used to calculate an information coefficient that describes the similarity of the two containers. The coefficient is computed only over the subset of files selected by a filter (see Levels of Reproducibility of Containers).
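The caption does not give the coefficient's formula. As a minimal sketch, assume a Dice-style ratio 2|M| / (|A| + |B|), where A and B are the file sets of the two containers and M is the set of files judged identical. A level's filter is modeled here as a hypothetical list of regular expressions for paths to ignore. The names `apply_filter`, `information_coefficient`, and `skip_patterns` are illustrative, not the registry's API.

```python
import re
from typing import Iterable, Set


def apply_filter(names: Iterable[str], skip_patterns: Iterable[str]) -> Set[str]:
    """Keep only paths that match none of the (hypothetical) level's skip patterns."""
    compiled = [re.compile(p) for p in skip_patterns]
    return {n for n in names if not any(c.search(n) for c in compiled)}


def information_coefficient(files_a: Set[str], files_b: Set[str],
                            matching: Set[str],
                            skip_patterns: Iterable[str] = ()) -> float:
    """Assumed Dice-style score 2|M| / (|A| + |B|) over files that survive the filter.

    `matching` holds the names judged identical by the member comparison.
    """
    patterns = list(skip_patterns)  # allow generators to be reused below
    a = apply_filter(files_a, patterns)
    b = apply_filter(files_b, patterns)
    m = apply_filter(matching, patterns) & a & b
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty filtered views are treated as identical
    return 2.0 * len(m) / total


a = {"bin/sh", "etc/hosts", "tmp/x"}
b = {"bin/sh", "etc/hosts", "tmp/y"}
m = {"bin/sh"}  # etc/hosts exists in both but differs
score_all = information_coefficient(a, b, m)                                 # 2*1/6
score_filtered = information_coefficient(a, b, m, ["^tmp/", "^etc/hosts$"])  # 2*1/2
```

Because filtering happens before counting, files that a level deems irrelevant neither raise nor lower the score, so the same pair of containers can be scored differently at each level.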