BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although they are usable by expert researchers, novice users face a steep learning curve. These difficulties stem from the need to learn new programming languages (e.g., Python), the challenge of applying machine-learning methods to high-dimensional fMRI data, and minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate them with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs, we provide detailed guidance on using high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons.
These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.

Thank you for pointing this out. We have updated the license for the tutorials on GitHub to Apache 2.0. We have also licensed the datasets under the Creative Commons Attribution 4.0 International License and updated Table 1 accordingly.
• Please consider redistributing the tutorials via CodeOcean (https://codeocean.com/).
We are currently in the process of distributing these tutorials via NeuroLibre (https://conppcno.github.io), a free platform on which users can run the tutorials without installing anything. This platform is user-friendly and widely available to the general public, so we believe that it achieves the same goal as distributing via CodeOcean. Having said that, we are big fans of CodeOcean and, going forward, we plan to work with them and other platforms to increase the reach of the tutorials.
• BIDS: The Brain Imaging Data Structure is not even mentioned. Given that BIDS provides a wealth of educational materials, I believe it would be great to mention it in tutorial 2. This is not a suggestion to change all datasets used in the tutorials to BIDS (although their availability in BIDS format would be desirable).
Thank you for this suggestion. We are increasingly using BIDS in other work and agree that it is important to highlight the general issue of file naming and header conventions. We now highlight this issue in Notebook 2 and we provide a link to the BIDS educational materials. BIDS data are compatible with our tutorials, as the tutorials can easily be modified to suit any directory and file naming structure. We have provided an example in Notebook 2 about how to read data files in BIDS format.
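As a rough illustration of what reading BIDS-formatted data involves, the sketch below builds and parses file names from BIDS entities (subject, optional session, task, run, space). It is not the code in Notebook 2: the helper names `bids_bold_path` and `parse_bids_entities`, the fMRIPrep-style `derivatives/fmriprep` layout, and the `desc-preproc` label are assumptions chosen for illustration, and should be adapted to the dataset at hand.

```python
from pathlib import Path


def bids_bold_path(bids_root, subject, task, run, space,
                   desc="preproc", session=None):
    """Build the path to a preprocessed BOLD run from BIDS entities.

    Hypothetical helper: the derivatives/fmriprep folder and the
    desc-preproc label are assumptions modeled on fMRIPrep outputs.
    Entities follow the BIDS order: sub, ses, task, run, space, desc.
    """
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    entities += [f"task-{task}", f"run-{run}", f"space-{space}", f"desc-{desc}"]
    filename = "_".join(entities) + "_bold.nii.gz"

    # Subject folder, optional session folder, then the "func" modality folder
    folder = Path(bids_root) / "derivatives" / "fmriprep" / f"sub-{subject}"
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / "func" / filename


def parse_bids_entities(filename):
    """Split a BIDS file name into its key-value entities plus the suffix."""
    stem = Path(filename).name.split(".")[0]  # drop extensions such as .nii.gz
    *pairs, suffix = stem.split("_")          # last chunk is the suffix (e.g. "bold")
    entities = dict(pair.split("-", 1) for pair in pairs)
    entities["suffix"] = suffix
    return entities


if __name__ == "__main__":
    path = bids_bold_path("/data/study", subject="01", task="rest",
                          run=1, space="MNI152NLin2009cAsym")
    print(path.name)
    # sub-01_task-rest_run-1_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
```

The resulting path can then be passed to a NIfTI reader such as `nibabel.load`. Building names from entities, rather than hard-coding full strings, makes it straightforward to loop over subjects and runs.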
• Most of the tutorials propose exercises; is there a way for students to check their solutions?
As these materials were designed for classroom teaching (and we hope others will adopt them for the same purpose), we intentionally left the answers out. However, we do have a solution key that we will finalize after the review process to incorporate any revisions to the notebooks. The solution key will be hosted as a private repo and will be provided to users upon request if they affirm that they are not currently students in a for-credit course using these materials. We have also added a note to the public repo that instructors interested in using the tutorials in courses are encouraged to contact the creators for additional information (at which point the solutions will be shared, along with other classroom advice based on three semesters of teaching).
• Tutorial 7 was excessively long for me; I would split the HPC section at the end into a separate unit (probably proposed before starting with searchlight).
We have found that the best way to teach the use of HPC is by linking it to a neuroscientific goal (here, searchlight analysis). A standalone HPC notebook would be different in character from all of the other surrounding notebooks, which are based on fMRI methods rather than generic computing topics. For these reasons, we prefer to keep Tutorial 7 as a single tutorial. Having said this, we understand the reviewer's concern; to address this concern, we have added a message at the start of the HPC section in the notebook mentioning that this will take a long time to execute in non-HPC environments. This will allow users to plan their effort in completing the notebook, and also allow users on clusters to get the full benefit of the tutorials.
• The authors explicitly state that "the materials available to learn these methods are limited, **the software is rarely open-source**, and the analyses are often difficult to run on large datasets" in the Author Summary (and then this is suggested at points). I believe that actually, most of the neuroimaging software is open-source. I believe the authors want to refer to reproducibility issues here. The paper would benefit from some discussion about how these tutorials are useful to address the problem of irreproducibility in neuroimaging.
In the revision, we have highlighted the problem of reproducibility in psychology and neuroscience (see below for changes). The tutorials are inherently designed to reproduce findings with open data and open code. We are hopeful this will encourage further transparency when users adapt the tutorial code for their research studies.
To address the reviewer's concerns listed above, we have added the following text:
p. 4, lines 53-55: "Furthermore, the materials available to learn these methods do not encompass all the methods used, work is often published with no publicly available code, and the analyses are often difficult to run on large datasets without cluster computing."
p. 6, lines 88-90: "One barrier to increasing the accessibility of these techniques is that, in most cases, they were created as custom code within individual labs and are thus not part of other fMRI software analysis packages."
p. 11, lines 203-208: "We have released these tutorials publicly and freely. The users can also apply these methods to publicly available datasets from the existing literature, leading to independent validation of the published results. We are hopeful that this will help increase reproducibility of results more broadly: when tutorial users analyze their own data, they will have already become familiar with the tools necessary to share their code and data, leading to a cycle of improved data sharing and code validation."
Lastly, we agree that most neuroimaging software is open source these days; however, some packages require paid licenses (e.g., the Princeton MVPA Toolbox, through MATLAB). We have added the following text to the Introduction (p. 5-6, lines 83-88): "There exist multiple open-source packages that implement MVPA techniques and RSA. Some of these packages require paid MATLAB licenses (e.g. Princeton MVPA Toolbox, The Decoding Toolbox [17], and CoSMoMVPA [18]) and others are completely free (e.g. Nilearn [19] and PyMVPA [20,21]). Although all these packages cover a broad range of MVPA and RSA techniques, they do not cover techniques such as FCMA, ISC, ISFC, SRM, and event segmentation."
• Limitations: please clearly mention what the limitations of these tutorials are (e.g., no tutorial about GLMs, which is a very standard technique, shallow coverage of containers, etc.).
This is not a request for expanding the materials; as I said above, they are well designed.
Our goal in these tutorials was to focus on advanced fMRI analysis techniques, and hence we consciously decided not to cover GLMs or preprocessing, or to spend much time on deployment components such as containers. We agree that these are gaps and limitations, which we have now outlined in the paper, directing readers to other resources (p. 20, lines 382-386): "Furthermore, our goal for these tutorials was to cover advanced fMRI analysis and hence our tutorials do not cover pre-processing methods, General Linear Model analysis, or software deployment options (e.g., containers) in great detail. An exhaustive list covering multiple helpful tools and tutorials is available here: https://github.com/ohbm/hackathon2019/blob/master/Tutorial_Resources.md."
For all these reasons my recommendation is acceptance with minor revisions.

Oscar Esteban, Postdoctoral Fellow, Stanford University
Thank you again for these detailed and constructive suggestions, which we believe have strengthened the paper and tutorials.
In this manuscript, Kumar and colleagues describe their recently released BrainIAK tutorials, a collection of resources designed to make MVPA-style analyses accessible to the broader neuroimaging community. As the authors note, there is a relative lack of educational materials for these methods despite their use throughout computational cognitive neuroscience. These tutorials, therefore, are of significant general interest to researchers working in this or related fields. I do, however, have concerns about the presentation of the tutorials in the present manuscript, particularly in their described relationship to previous work.
Thank you for your detailed and helpful comments. As described below, we have modified the introduction to position our work in the context of other available open-source packages.
• We regret this significant oversight. Our intention was not to diminish these important contributions and we completely agree that highlighting them and their relationship is important.
The Matlab packages had been excluded simply because we were focusing on packages that did not require paid software licenses. In retrospect, mentioning them serves to acknowledge the historical context and to highlight the relevance of the current work to users of those packages. As creators of the Princeton MVPA Toolbox, we view BrainIAK as the next iteration of that project, bringing it into an open-source framework, with considerably expanded functionality, and with more professional coding, documentation, and high-performance capabilities.
PyMVPA and Nilearn are directly relevant and deserve a more thorough treatment. We have now discussed these packages in the Introduction (p. 5-6, lines 83-95). Nilearn provides an interface with algorithms relevant to the contents of some of the tutorials, namely classification, regression, feature selection, dimensionality reduction, and searchlight. BrainIAK creates an expressive environment in which these and several other cutting-edge methods have been implemented, including ISC, ISFC, SRM, FCMA, TFA, and Bayesian RSA. The tutorials highlight many of these functions. We are hopeful that researchers will expand and adapt the tutorials to leverage more functions from BrainIAK and other packages. To reflect these points, we have made the following changes: "The user is also encouraged to make novel contributions using the method that they learned in the tutorial, either by enhancing the method, creating a new visualization of the data, or even using the method on another dataset, e.g., from OpenNeuro (http://openneuro.org)."

It is also unclear why the data downloads on the website link to google drive, when several of the datasets such as Sherlock and Raider are available from open source repositories with better long-term archiving.
In testing prior to release, we discovered that the fastest download speeds were achieved using Google Drive, and hence we delivered these datasets using Google Drive. We also provide ready-to-use masks and smaller extracts of the datasets, which can save time (especially on Google Colaboratory) for the novice user and enable execution on platforms with limited resources. The tutorials website (brainiak.org/tutorials) also links to a Zenodo version of the same datasets, which serves as a long-term archive.
That said, to avoid any confusion, we also now list (p. 12,
Thank you for pointing this out. We have clarified this. The exercises are part of the tutorials and may be used to learn the materials and/or as part of a formal course assignment. We have made the following changes:
p. 9, lines 156-161 now read as follows: "For all users, we embed background material and references, prompts for further self-study, and problem set exercises to help them learn how to generate and adapt code. The exercises for each notebook focus on neuroscientific applications of the techniques being learned; thus, by working through the exercises, students learn how to use these techniques to answer meaningful neuroscientific questions (course instructors may contact us for more information)."
p. 15, lines 277-281 now read as follows: "The accompanying notebook exercises help the user understand the method and its applicability to the scientific question by requiring that they generate answers or code. These questions are posed in the context of a publicly available fMRI dataset. These questions and exercises can be used to formally evaluate students enrolled in a for-credit course (course instructors may contact us for more information)."
• The authors note that the "most powerful analyses are complex and computationally intensive" (lines 51-52). This is a subjective statement and depends entirely on the research question at hand.
"An exhaustive list of helpful tools and tutorials is available here: https://github.com/ohbm/hackathon2019/blob/master/Tutorial_Resources.md"
Reviewer 3

Reviewer #3: # Summary and general comments
In this submission, Kumar and colleagues present a library called BrainIAK for machine learning in functional neuroimaging, and an accompanying set of tutorials. The tutorials are presented in the form of Jupyter notebooks, and are accessible either locally through containers or online on the Google Colab platform. They also include instructions for deployment on high-performance infrastructure. The data used in the tutorials are freely available and specially prepared to be used as part of a training activity. As a strength, the material covered in the tutorials includes inter-subject correlation and representational similarity analysis, two applications which, to my knowledge, are not well covered by currently available tutorials.
Overall, this new library and tutorials are remarkably comprehensive, and I believe will represent a very valuable resource for the community. My only major concern is that the authors did not properly position their work compared to other efforts.
Thank you for these positive impressions and constructive comments. We have modified the Introduction to position our work in the context of other available packages, which we agree was missing before and is important for advancing the field collaboratively (p.
We have edited out these references from the abstract (p. 2, lines 43-45): "These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton universities and at various workshops and hackathons."
* The intro claims several times that there is a lack of existing educational material. There is a huge amount of general-purpose tutorials for machine learning, most notably the sklearn documentation.
We agree completely and have benefited tremendously from such materials. Our focus was exclusively on machine learning in the context of neuroimaging, and so we did not emphasize general-purpose tutorials, such as those from sklearn. In the tutorials we do occasionally provide links to specific aspects of sklearn and nilearn for additional help. We further added a link to the sklearn learning materials in the Introduction when we discuss general-purpose machine learning, and tailored our claims about the lack of educational materials to be specifically about machine learning in neuroimaging. We have elaborated on the contribution of BrainIAK and its tutorials with respect to the other packages in the following paragraphs:
In the Author Summary we have removed the statement that materials are unavailable, and the modified lines are as follows (p. 4, lines 53-55): "Furthermore, the materials available to learn these methods do not encompass all the methods used, work is often published with no publicly available code, and the analyses are often difficult to run on large datasets without cluster computing."
In the Introduction we added (p.
We have added examples of preprocessing pipelines that a user could use. We have added a description in Notebook 2 on how to access data in BIDS format by showing users how to form the required file name string with a task name, space name, and run id. This will allow them to read data files in BIDS format. We have also added a link to additional resources in the "Other Resources" section.
p. 13, lines 238-244: "The user is free to use any preprocessing pipeline (e.g., fMRIPrep, AFNI). Data are exchanged in standard NIfTI and NumPy formats with existing tools such as Nibabel or Nilearn, and our tutorials show how to import data into Python structures and use BrainIAK. The functions in BrainIAK parse the data in a time × voxels format, with an exception being the searchlight function that takes in 4-D volumes. The BrainIAK package also serves as an ecosystem for users to contribute their own methods while avoiding duplication of methods found in other packages."
p. 20, lines 384-386: "An exhaustive list covering multiple helpful tools and tutorials is available here: https://github.com/ohbm/hackathon2019/blob/master/Tutorial_Resources.md."
* No material is presented to demonstrate that the proposed material achieves the stated goals. Survey results from a workshop, for example, would add some support to the usefulness of the resources.
From informal discussions at hackathons, workshops, and on our Gitter channel, participants have told us that the tutorials have helped them tremendously. We also have formal evaluations from students who have used these materials in courses at Yale and Princeton, and they have been overwhelmingly positive (e.g., the most recent incarnation received a course rating of 4.8/5); we do not believe that these institutional evaluations can be made publicly available. We are aware of two final projects from this course that are now in preparation for journal submission. Within our labs, these tutorials are now the starting point for all new lab members. We have also distributed them widely to colleagues, who have expressed gratitude and also use them for graduate training. We realize that this feedback is anecdotal. When this paper is published, we will create a survey page on the BrainIAK tutorials homepage where people can provide feedback and suggestions, which could be posted publicly with consent.
Thanks to Prof. Bellec and the NeuroLibre team for a detailed review of each notebook (so far Notebooks 1-3 have been reviewed). We have already resolved all comments for Notebooks 1-3, and the pull requests have been merged into the NeuroLibre GitHub repository. As the other notebooks are reviewed, we will work towards incorporating them as well.