Peer Review History

Original Submission: May 6, 2021
Decision Letter - Frédéric E. Theunissen, Editor, Thomas Serre, Editor

Dear Dr Whiteway,

Thank you very much for submitting your manuscript "Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Dear Authors,

As you will see, we were fortunate to get 4 reviewers to look at your submission. Most of the comments are easy to address. There may be two exceptions: the generalizability to non-head-fixed behaviors, and the qualitative assessment of the "ease of interpretation" (see Reviewer 3). As mentioned by Reviewer 4, you should consider revising your title with respect to generalizability, or demonstrate that this approach is indeed useful in freely behaving animals.

Best wishes,

Frederic

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note, while forming your response, that if your article is accepted you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Thomas Serre

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors present an extension of the VAE that makes use of hand-labeled data to produce untangled latent representations that may aid in behavioral quantification. The proposed PS-VAE forces a subset of the latent dimensions to contain label information and encourages the additional latent dimensions to capture features of behavior that are independent of the labeled inputs.
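To make this partitioning concrete, here is a minimal sketch of the idea in PyTorch: a shared encoder produces a latent vector that is split into a supervised block, regressed onto the tracked labels, and an unsupervised block that absorbs the remaining image variance. This is an illustration under simplifying assumptions, not the authors' implementation; the class and variable names (PartitionedVAE, n_unsup, label_map, alpha) are hypothetical, and the full PS-VAE objective contains additional terms (for example, encouraging independence between the two partitions) that are omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartitionedVAE(nn.Module):
    """Toy VAE whose latent space is split into supervised and unsupervised blocks."""
    def __init__(self, n_pixels, n_labels, n_unsup, hidden=256):
        super().__init__()
        self.n_labels = n_labels
        n_latents = n_labels + n_unsup
        self.encoder = nn.Sequential(nn.Linear(n_pixels, hidden), nn.ReLU())
        self.to_mu = nn.Linear(hidden, n_latents)
        self.to_logvar = nn.Linear(hidden, n_latents)
        self.decoder = nn.Sequential(nn.Linear(n_latents, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_pixels))
        # simple linear map from the supervised latents to the tracked labels
        self.label_map = nn.Linear(n_labels, n_labels)

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        z_sup = z[:, :self.n_labels]                              # supervised partition
        return self.decoder(z), self.label_map(z_sup), mu, logvar

def loss_fn(x, labels, x_hat, labels_hat, mu, logvar, alpha=1000.0):
    recon = F.mse_loss(x_hat, x, reduction='sum')                 # frame reconstruction
    label_loss = F.mse_loss(labels_hat, labels, reduction='sum')  # ties supervised latents to labels
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # standard VAE prior term
    return recon + alpha * label_loss + kl
```

With a large weight alpha on the label term, the supervised block is pushed to encode the tracked keypoints, leaving the unsupervised block free to capture label-independent variance.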

Overall this work is a very interesting take on feature ‘untangling’ in low-dimensional latent representations of behavior, and is of interest from both a behavioral interpretability as well as a deep learning perspective. Movies demonstrating these partitions by generating frames along a single one of the unsupervised dimensions indicate that this method can successfully discover behavioral features in movies that could not be obtained using the output of posture estimation alone.

The idea of using the supervised and unsupervised subspaces is elegant and explained well in the manuscript, and the details of training are sufficient for reproduction. Additionally, the authors have generated and submitted code that appears well documented.

Here I list a few weaknesses or simply aspects of the manuscript that might be presented more clearly:

Introduction:

The idea of disentangling latent dimensions in image data is not new, and prior work might deserve more of a mention in the introduction (for example, Zheng, Zhilin, and Li Sun, "Disentangling latent space for VAE by label relevant/irrelevant dimensions," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019).

Results:

The authors mention that traditional methods are limited to constrained tasks, but then only go on to explore videos in head-fixed animals. Demonstrating the viability of this method in a less constrained environment that captures more of the body and a higher diversity of behavior - where simple tracking within the image would not suffice to describe pose - would greatly strengthen these claims.

2.2

The comparison of the vanilla VAE results to the true x and y coordinates does not seem like a fair one here and in section 2.3.1. Is it possible to compare the emergent behavior detectors that come from the unsupervised latent space with specifically engineered detectors?

2.3.2

The authors mention ‘meaningful features’ in the context that small pixel changes may be of greater interest than large pixel variance, but do not address how the sampling could capture rare events of interest. Is this considered in any accuracy metric? For example, a rare event that is known to be important to the animal but only occurs for a single frame?

The head-fixed example of capturing interpretable unsupervised representations is an exciting addition and should be emphasized - what happens when there are fewer or more unsupervised latent dimensions? How did the authors reach the decision to use only two additional dimensions? Is it trivial to assign meaning to any number of these latents?

When separating x-y limb motion or mechanical equipment motion, does the method fail when using slightly offset imaging angles?

Discussion:

How is ‘salience of features’ addressed? Are there examples where the unsupervised space did not produce immediately interpretable features but may still contain information that would be useful to quantify?

Again, how relevant is this for phenotyping where individuals may have physical discrepancies in addition to behavioral differences?

The authors mention freely moving behavior, but how difficult would it be to apply these methods to recordings with many views where the animal takes up only a small portion of the visual field at any time? Would translating information into an animal-centric coordinate system be feasible before training the PS-VAE? How sensitive would this preprocessing have to be?

Methods:

Is the data used for each example only coming from one individual and one camera configuration? If so, are there ways to make this a viable approach for datasets that are recorded across days and individuals and might have drift or lighting differences? If not, this could be made more clear.

Specific comments:

Line 49 - consist instead of consists?

Reviewer #2: Whiteway et al. introduce Partitioned Subspace VAEs, a modification of the classical VAE architecture that they show can extract interpretable features of animal movement from tracked videos. PS-VAEs learn a latent representation that separates out video signals arising from tracked features (here, positions of anatomically defined keypoints) from other sources of variance, for example movement of the jaw or chest. They then demonstrate how these identified latents can be further analyzed using AR-HMMs to identify behavioral events, and how latents can be related to neural activity. The authors apply the PS-VAE to three example datasets, in each case contrasting their model with a standard VAE to show the gains in interpretability their model achieves. The authors develop several helpful metrics to demonstrate this contrast, such as the variance of PS-VAE latents aligned to behavioral events. Another strength of this paper is its clear organization and writing, including the Methods section, which does a very nice job of walking the reader through the details of the PS-VAE.
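As an aside on that last point, the step of relating latents to neural activity can be illustrated with a simple cross-validated linear decoder. The sketch below is a generic illustration rather than the authors' pipeline; the arrays neural_activity and latents are random placeholders standing in for binned neural activity and per-frame PS-VAE latents.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_frames, n_neurons, n_latents = 2000, 100, 6
neural_activity = rng.normal(size=(n_frames, n_neurons))  # placeholder for binned spikes or dF/F
latents = rng.normal(size=(n_frames, n_latents))          # placeholder for per-frame PS-VAE latents

# decode each latent dimension from neural activity and report cross-validated R^2
for d in range(n_latents):
    scores = cross_val_score(Ridge(alpha=1.0), neural_activity, latents[:, d],
                             cv=5, scoring='r2')
    print(f"latent {d}: mean cross-validated R^2 = {scores.mean():.3f}")
```

Comparing decoding quality across individual latents is one simple way to relate a partitioned representation like this to neural activity.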

Key points to address:

First, several other papers have proposed semi-supervised versions of VAEs, including modifications to enhance the interpretability of learned latent representations. The authors note the contrast between PS-VAEs and these other methods in the paper's Discussion; however, I believe the paper needs a more detailed discussion of the differences between the PS-VAE and some of these other models, perhaps through the addition of a Related Work section in the Introduction.

Ideally, it would also be great to see the PS-VAE compared to some of these other VAE variations (like the pi-VAE) in the Results section, to show what the specific design of the PS-VAE achieves beyond what was already possible via other methods. Are the results shown here really something that wouldn't be possible with other, related methods? Or is this just the first study to try this kind of analysis on videos from behavioral neuroscience? Alternatively, the authors might include an ablation study to show how the different components of the loss function collectively contribute to the performance of the PS-VAE.

An obvious question that arose in reading this paper was how the results depend on the dimensionality of the unsupervised latent space. This is a user-provided parameter, and with one exception it is always set to two in the paper (the exception being a case where two supervised latents were removed and the number of unsupervised latents was increased from two to four to compensate). How should an end user select the dimensionality of the unsupervised latent space? Is there a way to tell that you've done a good job? If you over- or under-estimate the dimensionality of the unsupervised latents, does interpretability of the results suffer?

In the Discussion, the authors describe several possible extensions of the PS-VAE, including its application to freely behaving animals and to neural data. As a more general extension: does this model require the supervised latents to come from tracked positions of keypoints? Or could these inputs take a more general form? Similarly, must the input to the PS-VAE be video data, or could other signals also be used? I understand that the target audience of this paper is neuroscientists using tracking methods on behavior videos, however the PS-VAE seems general enough that it could be relevant to other types of data. While it would be great to see this (or some of the other proposed extensions) addressed with a brief example in the results section, I understand if the authors consider this to be out of scope. Instead, it might be helpful for the authors to include some kind of general summary/figure on the types of data to which the PS-VAE might be applied.

Finally, I had one more specific comment, motivated by the two-view mouse dataset. Here the authors state that “the PS-VAE provides a substantial advantage over the VAE for any experimental setup that involves moving mechanical equipment.” While I agree that the PS-VAE seems likely to outperform the VAE in most settings, it seems that in order for the PS-VAE to successfully isolate equipment-derived signals, it must be possible to predict the appearance of equipment from one or more tracked keypoints. I would guess that there are some types of equipment (deformable objects like mouse bedding? the black/white patterned balls used for fly-on-a-ball experiments?) where keypoint-based prediction of equipment appearance could fail. I suggest the authors add a section to the Discussion on what kinds of signal can vs cannot be learned by the supervised latents.

Reviewer #3: Review is uploaded as an attachment.

Reviewer #4: Dear Editors,

The manuscript under review, entitled “Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders” by Whiteway et al., provides a new computational method termed the Partitioned Subspace Variational Autoencoder (PS-VAE). The PS-VAE is a semi-supervised variant of the fully unsupervised variational autoencoder that provides the benefits of both a supervised and an unsupervised component, allowing for a fuller description that combines supervised pose estimation with an unsupervised analysis of the variability not captured by pose estimation.

This is a strong approach for pose-estimation-based behavioral analysis, which has certainly gained a great deal of popularity with the release of pose-estimation architectures like SLEAP and DLC. However, the applicability of this method to behaviors beyond highly controlled head-fixed rodent setups remains to be seen.

That said, within head-fixed setups the authors do a great job of validating their approach. They apply their method to 3 datasets, including one from the IBL, to address the applicability of the approach across different head-fixed behavioral setups. The approach, although perhaps only pragmatically applicable to head-fixed mice, will certainly be of interest to labs using head-fixed behavioral preparations.

I am impressed with the approach the authors present in this paper. The overall approach is clear, the documentation is acceptable, and inclusion of examples of varying pose-estimation architectures is welcomed.

I have only one major comment:

Based on the current title, I was expecting a more generalizable approach. I strongly suggest the title be changed to better reflect what is shown in the paper. “Partitioning variability in head-fixed rodent behavioral videos using semi-supervised variational autoencoders” would be a much better title, until the inclusion of data suggesting that this method can be applied to freely moving mouse behaviors.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

Reviewer #4: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

Reviewer #4: No

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References:

Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.

If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Attachments
Submitted filename: psvae_review_final.pdf
Revision 1

Attachments
Submitted filename: PLOS_reviewer_response.pdf
Decision Letter - Frédéric E. Theunissen, Editor, Thomas Serre, Editor

Dear Dr Whiteway,

We are pleased to inform you that your manuscript 'Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Thomas Serre

Deputy Editor

PLOS Computational Biology

***********************************************************

Dear authors,

Thank you for carefully addressing all of the concerns of the reviewers. Congrats on a nice contribution in quantitative analyses of behavior.

Frederic Theunissen

Formally Accepted
Acceptance Letter - Frédéric E. Theunissen, Editor, Thomas Serre, Editor

PCOMPBIOL-D-21-00850R1

Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders

Dear Dr Whiteway,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Olena Szabo

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.