Reader Comments
Some additional resources
Posted by yarikoptic on 18 Nov 2020 at 18:53 GMT
Great article! Some additional tools I would like to reference:
**"Rule 5: Specify software versions"**
- could be too tedious, and is not feasible when you would like to retroactively reproduce some past environment. On Debian (and NeuroDebian-based) systems, snapshots of the APT repositories make it possible to "freeze" to a specific date. An invocation of the nd_freeze tool (from the neurodebian-freeze package, shipped within all NeuroDebian containers) can be placed as the first command to run in a Dockerfile; it switches APT to use the repositories in the state they were in on that date. This mimics "real-life" scenarios where we update from the state on one date to another, and it can be used to reproduce some past environment. E.g. see this example: https://github.com/ReproN... produced by neurodocker (see next point; it supports nd_freeze)
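The nd_freeze approach described above could be sketched roughly like this (the base image tag and date are only illustrative; nd_freeze takes a date argument, so check its documentation for the exact accepted formats):

```dockerfile
# Start from a NeuroDebian base image (tag is an example)
FROM neurodebian:bullseye

# Freeze APT to the repository snapshots of a given date,
# so later rebuilds see the same package versions
RUN nd_freeze 20201118

# Subsequent installs now come from the frozen snapshot
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 && \
    rm -rf /var/lib/apt/lists/*
```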
**"Tools for container generation"** - https://github.com/ReproN... (despite the name, it is useful beyond the neuro domain) - creates Dockerfiles or Singularity recipes from a single command-line invocation, and the generated recipes adhere to best practices
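Such a single-command invocation might look roughly like the following sketch (the base image, package list, and flag names are illustrative and vary between neurodocker versions, so consult `neurodocker generate docker --help` for your installed version):

```shell
# Generate a Dockerfile that follows common best practices
# (base image and packages are examples, not recommendations)
neurodocker generate docker \
    --base-image neurodebian:bullseye \
    --pkg-manager apt \
    --install git python3 \
    > Dockerfile
```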
**"Use version control"** - DataLad, or git-annex directly, can be used to store not only the recipes but the containers themselves. The https://github.com/datala... DataLad extension also helps to "register" and use those containers with DataLad. This way you can keep **everything** (code, Dockerfile recipes, the data files themselves, and container images) under version control
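For readers unfamiliar with the extension, the workflow could look roughly like this (the dataset and container names are made up; see the datalad-container documentation for exact usage):

```shell
# Create a DataLad dataset and register a container image in it
datalad create my-analysis
cd my-analysis
datalad containers-add my-env --url docker://python:3.9-slim

# Run a command inside the registered container;
# code, data, and the image are all version-controlled together
datalad containers-run -n my-env python --version
```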
**Rule 7: Mount datasets at run time** - totally agree. But it can hinder reproducibility, since outside data resources might change, become unavailable, or have different paths on different systems. Have a look at the YODA principles for self-containing everything -- input data, containers, etc. -- within an "analysis dataset" itself, so that all the paths to be mounted always reside "within" that "analysis dataset": https://github.com/myyoda... . More on that can be found in the DataLad handbook: http://handbook.datalad.o... , and here is a sample DataLad dataset with containers for neuroimaging which enforces "total compartmentalization" for Singularity containers: https://github.com/ReproN...
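As a sketch of the idea (the image name and paths are hypothetical): when code, inputs, and outputs all live inside the analysis dataset, the only mount needed is the dataset root itself, so the same invocation works on any system:

```shell
# Mount only the analysis dataset root; all referenced paths
# (code/, inputs/, outputs/) are relative to it
docker run --rm \
    -v "$PWD":/analysis \
    -w /analysis \
    my-analysis-image \
    python3 code/run.py inputs/data.csv outputs/result.csv
```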
RE: Some additional resources
Daniel Nüst replied to yarikoptic on 25 Nov 2020 at 16:45 GMT
Thank you very much for the comment! It would have been great to learn about these during the preprint stage, but I'm optimistic we'll find a way to make sure interested readers find them here as well as in the article's repository at https://github.com/nuest/...
- Pinning whole APT repositories and putting images into git annexes are quite useful ideas. I'm not sure they fit the intended target audience of the article, but for advanced users they are one more layer of security.
- neurodocker is a tool to generate containers, but I'm a bit sceptical as to its accessibility as a CLI tool; it surely would help to apply good practices for Dockerfiles, though.
- Re. mounting: that is why we recommend version-controlling the Dockerfile and the mounted files in the same repository. I did not know YODA -- a very good effort; is it picked up broadly in your community? I think if people use/follow the YODA principles and/or the DataLad tools, they are already on a very good path and might not need the manually crafted Dockerfile we focus on in the article. Thank you for pointing them out!