Peer Review History
Original Submission: July 27, 2021
Dear Dr. Lu,

Thank you very much for submitting your manuscript "Deep learning for robust and flexible tracking in behavioral studies for C. elegans" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, provided that you modify the manuscript according to the review recommendations. In particular, please attend to the following points:

1) Although the reviewers were very positive about the fact that an attempt had been made to make the tools available for general use without a high technical bar, in practice, the reviewer who tried to use the provided software was unable to get it working. We suggest that you actively test the ability of a third party to use the software and make any modifications required in the tools or instructions to ensure that this is straightforward.

2) Both reviews note that there is rather limited discussion of alternative approaches that use deep learning methods for animal tracking (such as DeepLabCut and LEAP). As mentioned in the reviews, these methods are sometimes applied to significantly different problems, such as pose extraction vs. localization, but it would be helpful to make discussion of these differences more explicit. There is also no reference to previous applications of the Faster R-CNN method to animal tracking, yet it has been used in several such applications to date, so providing more references and comparisons would be appropriate.

3) The reviewers note some situations in which the tracking seems less successful, and as they suggest, more discussion and analysis of the limitations of the method could be valuable.

Please prepare and submit your revised manuscript within 30 days.
If you anticipate any delay, please let us know the expected resubmission date by replying to this email. When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Barbara Webb
Associate Editor
PLOS Computational Biology

Dina Schneidman
Software Editor
PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: In this paper, Bates et al. describe a deep learning pipeline for identifying the locations of C. elegans L2-Adult worms and eggs in images. They show convincingly that it does an excellent job at this task despite variation in lighting conditions, decoys such as worm tracks in the image background, and worm size.
They illustrate its use on agar plates and in a microfluidic device, and for different purposes: fecundity assays, tracking movement, and growth measurements. They have also made the software available in an extremely easy-to-use form via well-documented online Jupyter notebooks linked from their GitHub site.

First, a comment about software availability. Their web-based implementation of the software is just excellent, and it would be great if this became the standard expectation for this type of methods paper. All too often, algorithms are described without any available implementation, or there is an implementation but it is difficult to install or get working, or it depends on extra software. By using online Jupyter notebooks and open tools like Colab, Bates et al. avoid these problems and make their software simple to use and access.

The paper is straightforwardly written and the results are well described. There are some things that could use more explanation, however.

1) Lines 520-525 are really important! But they come at the very end. The method is framed as being able to "identify and detect worms" (e.g., line 98). And of course it does that, in a way. But it seems more accurate to say that it identifies the _locations_ of worms: it identifies a bounding box that contains a worm but doesn't actually identify the worm's specific pixels. This is an important distinction for two reasons: (a) it means that this method is good for certain types of questions (line 518) but not others, and that it would need to be combined with something like Mask R-CNN (line 523) to gain that capability; (b) many alternative methods one might use (Supp fig 9) are actually focused on identifying the worm's pixels, so it's not really an apples-to-apples comparison.
However, for any given application the comparison may be perfectly valid, because if a bounding box is sufficient to answer the question, then other methods are doing a worse job just to obtain extra information that isn't needed. It's a good insight to realize that the location (and not the worm shape) is all you need to answer specific questions, and that taking this distinction seriously makes it possible to develop methods like their WoP and WiCh that do this very well. But I didn't get this distinction until the very end, after I'd spent the paper wondering why they were only looking at bounding boxes and not actually identifying worm pixels. I'd suggest adding a paragraph earlier making this point, and also going into more detail about just why "the annotation necessary to train such a [segmentation] model is significantly more intensive" (line 525).

2) Because the method identifies bounding boxes of worms, the orientation of the worms seems important. If a worm is oriented along the X or Y axis, the bounding box will be thin and long; if it is oriented 45 degrees from that, the bounding box will be square and have maximum area. This arbitrary orientation dependence seems like it would affect not just where the centroid is but also any overlap- or area-based metrics like IoU in Figure 4. As a suggestion, why not optimize the bounding box by rotating the image, identifying bounding boxes at each angle, and then picking the one with the smallest area (or some other metric, like the intersection of all the boxes)? My speculation is that optimizing the orientation of the box might reduce some of the variability in the data (e.g., Fig S6A). Perhaps the intersection of all (or most) of the boxes might also be able to trim off pixels outside the convex hull of the worm.
The downside is that it would take longer to go through different angles, but there could be a way to cut down the number of angles tested, such as a logarithmic search with the standard orientation and a 45-degree rotation as starting points.

3) The paper doesn't deal with L1s or early L2s at all. Is this a limitation of their microscopy setup, where the magnification was too low to get many pixels for these worms? Or is it that, because of background or other factors, the software just doesn't work well for these worms and more magnification wouldn't help?

4) In Figure 1B, the system doesn't seem to work well for a few of the worms towards the bottom, which don't seem unusual conformation-wise (the ones in the middle, not the ones on the lower right which are clumped). What is going on there? What is it about these worms that makes them undetected? Along those lines, in the upper right there is a worm doing a turn that is not identified. Is this just a shortcoming of not enough turning worms in the training set, or does the model have difficulty with worms of this shape?

5) The model seems to do decently at identifying eggs in complex environments, but it is hard to tell where on the spectrum of typical plates their data fell. There are definitely worm tracks in the images in Figure 2A, but is this a typical background, an easy background, or one with more worm tracks and intensity furrows than usual? It's hard to evaluate whether the numbers in lines 222 and 223 would be typical or not. However, the authors do appropriately make the comment that this egg detection is probably well suited to identifying trends (rather than precise counts), and so identifying trends would probably be decently robust to image background variation.

Overall, this is a nice method, described in a well-written paper, provided in an excellent web-based format, and it will be quite useful for a lot of worm assays.
It makes an important implicit point (which should become explicit): some kinds of information are easier to extract from images than others, and if this easier data is the essential data, you'd do better to focus on it and not try to do everything.

Minor points:

Line 435: left -> top

Some of the methods have been described in previous papers (e.g., the microfluidic device, palmitic acid). Some are standard worm methods (culture on agar plates). The Raspberry Pi imaging setup is not cited, but neither is it described completely enough for someone to be able to replicate it. There is a mi-pi repository on the Lu lab GitHub page. If this is the system used, I'd suggest adding a link around line 556.

Reviewer #2: While classic image processing techniques have enabled us to extract rich datasets that accurately describe the behavioural repertoire of C. elegans, there are still many behaviours that cannot be analysed using this approach. This leads to time-consuming manual analysis and precludes larger-scale studies. Bates et al. have utilised a deep learning approach to overcome this challenge. Using Faster R-CNN, they trained and validated three models that have been tuned to look at three behaviours that are notoriously challenging to analyse in an automated way: development, egg laying, and aging. They have also established a web platform that allows other researchers to apply these models to their own data and train their own models. Their approach is also flexible and performs well under different environmental conditions, another important consideration for researchers who can't afford to set up a new imaging system. This work will be incredibly beneficial to many researchers who work with C. elegans. I recommend the publication of this manuscript in PLOS Computational Biology; however, the authors need to first address the following comments.

1.
I followed the 'data annotation' notebook until the 'annotating images' section, and when I ran the cell I could see 'loading widget…' but no GUI emerged after several minutes. I didn't test the training notebook, as I got stuck at this point in the annotation notebook. I used the 'Faster_R_CNN_inferencing' notebook to test the egg model on data from our lab. I ran all the code and didn't encounter any errors; however, no worms or eggs were detected. I then tested a video from the OpenWorm database to check whether my data was the problem, and again no worms were detected. I tried inputting the data as videos and as a folder of images, and no worms were detected. I lowered both 'target_min_scores' values to 0.1 and still no worms were detected. I wondered if this was just an issue with the egg model, so I also tested the development model, and again no worms were detected. I may be doing something wrong, or there is a hidden bug that's preventing the model from being used correctly. I hope the authors can resolve this issue or provide additional instructions.

2. I think the authors should give a broader overview of deep learning approaches used for behavioural analysis in other species in the introduction, for example DeepLabCut (http://www.mackenziemathislab.org/deeplabcut). Why did the authors choose Faster R-CNN over DeepLabCut?

3. Supplemental movie 1 shows the performance of the egg model on a video from the OpenWorm database. It appears the eggs are only counted after the worm has moved away from them, and not immediately as they are laid (the egg-laying event). In many cases, C. elegans researchers want to measure the time an egg is laid so that the interval between egg-laying events can be analysed. I think the authors need to be more explicit about the limitations of their egg detection model for measuring an egg-laying event.

4. Were all models trained on images from N2 worms? I'm wondering if the models will perform as well on worms with locomotion defects.
Do the models take movement into consideration?

5. I really appreciate that the authors took the time to provide clear instructions and resources so that other scientists could train their own models. How do you suggest other scientists validate their models? I think the authors should discuss the limitations of deep learning approaches being used more broadly by non-experts without the correct validation.

6. The authors demonstrated how machine-learning image classification tools like Ilastik were not effective at segmenting the worms under different environmental conditions. Can the authors suggest any other deep learning segmentation approaches that might fare better? Or do they think deep learning is not the right approach for segmentation of worms?

7. Lines 233 and 173: typo 'lain'

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references.
Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.
Revision 1
Dear Dr. Lu,

We are pleased to inform you that your manuscript 'Deep learning for robust and flexible tracking in behavioral studies for C. elegans' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Barbara Webb
Associate Editor
PLOS Computational Biology

Dina Schneidman
Software Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed my previous comments. The discussion of the range of applicability and limitations of their approach versus other approaches is now presented earlier and more clearly, and they have explained the false negatives in their images as (predominantly) due to a lack of similar images in their training set.
The paper remains well-written and well-documented, and should be of great interest to researchers who are thinking of applying CNNs to locate and track their organisms in images over time.

Minor: line 94, there is a superfluous "with".

Reviewer #2: I think the authors have done a very good job addressing reviewer comments. I am supportive of publication.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No
Formally Accepted
PCOMPBIOL-D-21-01379R1

Deep learning for robust and flexible tracking in behavioral studies for C. elegans

Dear Dr. Lu,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Olena Szabo
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.