Peer Review History

Original Submission: October 9, 2019
Decision Letter - Lyle J. Graham, Editor, Frédéric E. Theunissen, Editor

Dear Dr Ivanenko,

Thank you very much for submitting your manuscript 'Classification of mouse ultrasonic vocalizations using deep learning' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the manuscript as it currently stands. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.

In addition, when you are ready to resubmit, please be prepared to provide the following:

(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.

(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.

(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:

- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).

- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.

- Funding information in the 'Financial Disclosure' box in the online system.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions, see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Lyle Graham

Deputy Editor

PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Dear Alexander and Bernhard,

Both I and the reviewers found your use of DNNs for classifying USVs interesting, although I still find it somewhat incremental. Both reviewers have raised minor points that you should address. I am very much in agreement with reviewer 1 that the only correct cross-validation test is what you call the single-recording test set. Your paper should not use any other tests, period! This is true not only for the DNN but also for your comparisons with other methods such as linear regression. Otherwise you are admitting that you have a random effect (the identity of each mouse) but ignoring it. It is as if you knew that you need to do mixed-effect modeling but refrain from doing so. In the analysis of vocalizations, this point was made very explicitly by Mundry and Sommer (Animal Behaviour, 2007) as applied to linear classifiers; they call their approach permuted DFA. Note also that this is the type of cross-validation that my lab has always performed, both with hyenas and with zebra finches. I have not looked closely at what is usually done for mouse USVs, but the 9 bioacoustic features you chose are quite simple. I would like you to try a much more complete set of features that includes a better description of the spectral envelope (spectral mean, quartiles, etc.), the temporal envelope, and the fundamental frequency (we have used such features in our work, but they are quite commonly used by bioacousticians). For the fundamental frequency, you might even compute PCs of the time-varying fundamental (e.g. Dahl A, Sherlock BR, Campos JJ, Theunissen FE, 2014).
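
To make the cross-validation point concrete: what I have in mind is a leave-one-animal-out scheme, in which every test call comes from a mouse that contributed nothing to training. A minimal sketch in Python (with placeholder data and a simple linear classifier, not your actual pipeline) could look like this:

    # Minimal sketch: leave-one-animal-out cross-validation, so every test call
    # comes from a mouse that was never seen during training. X, y and animal_ids
    # are random placeholders standing in for per-call features and labels.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 9))               # placeholder: 9 acoustic features per call
    y = rng.integers(0, 2, size=1000)            # placeholder: sex label per call
    animal_ids = rng.integers(0, 17, size=1000)  # placeholder: identity of the emitting mouse

    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                             groups=animal_ids, cv=LeaveOneGroupOut())
    print("leave-one-animal-out accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))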

Best wishes,

Frederic.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Review: PCOMPBIOL-D-19-01759

Ivanenko et al. have developed several DNNs (deep neural networks) capable of classifying individual USVs into discrete categories with accuracy not previously achievable. Most notably, they are able to classify the sex of the emitting mouse ~78% of the time. They were also able to extend this technique to classify USVs from a transgenic mouse, as well as to classify USVs by several experimenter-defined acoustic properties. The work reported here is extremely thorough and described in excellent detail, and the underlying method could prove extremely useful for researchers hoping to analyze vocalizations during opposite-sex interactions. Computational methods of identification have significant benefits over competing methods such as multi-microphone arrays, namely zero cost and ease of implementation (although Dr. Englitz appears to be working on that solution in parallel, and is considering combining the methods in the future). The technique also has some drawbacks. It is only useful for studying opposite-sex interactions, and 78% accuracy may be too low to effectively study syllable ordering, or syntax. The precise pattern of vocalizations is thought to be an important feature of social vocalization, and miscategorizing 22% of syllables could drastically change results.

I have no major concerns

Minor concerns and suggestions

Calling the network 84% accurate is a bit misleading. DNNs are really meant to classify new information, and the network is only 78% accurate when tested against a novel mouse. Although the authors admit this, they lean heavily on the 84% number throughout the manuscript.

The title feels a little broad, and perhaps should include classification of sex or genotype.

This manuscript seems to have two main threads. The first is proof that information about the sex of emitter mice is contained within individual USVs. I believe the authors do an excellent job proving this point, and they show clearly that the sex information is embedded in a complex combination of many features, including individual variation in USV production. While this work is thorough and well done, my enthusiasm for it is not particularly high. The features underlying classification are still unknown, and I'm not exactly sure why it would be useful for mice to identify sex based on a single call. It feels like this thread occupies the bulk of the text and somewhat occludes the authors' more important contribution.

Namely, the authors have laid the groundwork for a universal sex classifier for mouse USVs. The current network is ~78% accurate within novel animals of the same strain, a major improvement over all existing methods I am aware of. The great thing about neural networks is that they aren't fixed or permanent. They can be shared and retrained by researchers around the world who have already collected datasets from separate strains or transgenic animals. Unfortunately, I couldn't find the final networks among the available data. The authors seem to have made all of the code and raw files available, so it should be possible to re-train the networks from scratch, but that seems like an unnecessary deterrent to wider adoption and refinement of the technique.
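
To illustrate what sharing the trained networks would enable, a retraining workflow could look roughly like the sketch below. The framework, file name, and array shapes are hypothetical rather than taken from the manuscript; the idea is simply to load the shared model, freeze the early layers, and fine-tune the remainder on calls from a new strain.

    import numpy as np
    import tensorflow as tf

    # Hypothetical file name; this only works once the trained network is shared.
    model = tf.keras.models.load_model("usv_sex_classifier.h5")

    # Freeze the convolutional front end so only the final layers adapt to the new strain.
    for layer in model.layers[:-2]:
        layer.trainable = False

    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Placeholder arrays standing in for spectrogram snippets and sex labels from a
    # new cohort; the shapes must match the shared model's input.
    new_spectrograms = np.random.rand(200, 128, 100, 1).astype("float32")
    new_labels = np.random.randint(0, 2, size=200)

    model.fit(new_spectrograms, new_labels, epochs=10, validation_split=0.2)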

Typos

79: feature combination s (I’m not sure about this)

98: Sentence cuts off

905: vocalization..

Reviewer #2: USV – PLOS Computational Biology

The authors propose an interesting approach to sex discrimination in USVs. They rightly identify that this is a very challenging line of investigation.

Automated methods were used to extract USVs. Please provide validation data on these methods, as automated extraction is not a straightforward process and rarely as accurate as claimed.
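
One straightforward way to provide such validation would be to compare the automatically detected call intervals against a manually annotated subset and report precision and recall. A rough sketch (illustrative only, not the authors' pipeline) follows:

    # Count a detection as correct if it overlaps a manual annotation by more than
    # half of that annotation's duration; report precision and recall.
    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    def detection_scores(detected, manual, min_frac=0.5):
        hit = lambda d, m: overlap(d, m) > min_frac * (m[1] - m[0])
        precision = sum(any(hit(d, m) for m in manual) for d in detected) / len(detected)
        recall = sum(any(hit(d, m) for d in detected) for m in manual) / len(manual)
        return precision, recall

    # Toy example with (onset, offset) times in seconds.
    detected = [(0.10, 0.15), (0.40, 0.48), (0.90, 0.93)]
    manual = [(0.11, 0.16), (0.41, 0.47), (0.70, 0.74)]
    print(detection_scores(detected, manual))  # -> (0.666..., 0.666...)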

In presenting the final outcomes, it would be helpful to report the metrics resulting from the rerun analysis, as these more likely represent what would be observed in a novel cohort, i.e. performance is reduced.

Re-analyzing previously published data which reported no differences between groups, and then reporting different outcomes, is an important process.

Wording of this phrase needs revising “As intruders we used anaesthetized females to ensure that only the resident mouse could emit calls.”

More information on the WT/Cortexless paradigm is needed in the main document rather than making the reader find it in the supplementary materials.

The authors mention that data were derived from a subset of both cohorts. How were these subsets selected? Were only the 'best' recordings selected, was there any bias in this process, or were they randomly selected?

The number of calls appears sufficient but is 17 representative of mice in general?

DNNs are a great way forward in this field but, as in other fields, understanding the inner workings of the convolutional and fully connected layers, and how the output represents a concrete concept, remains challenging.
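
One simple, model-agnostic way to peek inside such a network is an occlusion analysis: mask small time-frequency patches of the input spectrogram and measure how much the predicted probability drops. The sketch below is purely illustrative; predict_proba is a hypothetical stand-in for whatever trained classifier is being probed.

    import numpy as np

    def occlusion_map(spectrogram, predict_proba, patch=(8, 8)):
        """Return a map of how much zeroing each patch lowers the prediction."""
        base = predict_proba(spectrogram)
        heat = np.zeros_like(spectrogram)
        h, w = spectrogram.shape
        for i in range(0, h, patch[0]):
            for j in range(0, w, patch[1]):
                occluded = spectrogram.copy()
                occluded[i:i + patch[0], j:j + patch[1]] = 0.0
                heat[i:i + patch[0], j:j + patch[1]] = base - predict_proba(occluded)
        return heat  # large values mark time-frequency regions the decision relies on

    # Toy usage with a dummy "classifier" that just reports mean energy in the upper band.
    spec = np.random.rand(64, 100)
    heat = occlusion_map(spec, lambda s: s[32:, :].mean())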

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Kevin R Coffey

Reviewer #2: No

Revision 1

Attachments
Attachment
Submitted filename: ResponseLetter_USVClassify.docx
Decision Letter - Lyle J. Graham, Editor, Frédéric E. Theunissen, Editor

Dear Mr Ivanenko,

Thank you very much for submitting your manuscript "Classifying sex and strain from mouse ultrasonic vocalizations using deep learning" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.

Dear Alexander and Bernhard,

As you know from our previous correspondence, you have performed an interesting analysis and showed that mouse USVs show some degree of sexual dimorphism, but that this is exhibited in a complex acoustic signature that can best be extracted from a spectrographic representation. I still have some comments that I would like you to address. They mostly relate to your methods. You also use descriptions and language that are not completely in line with what bioacousticians use. I like your analyses of your networks.

Minor comments:

1. Methods: “Spectrograms were converted to a sparse representation by setting all values below 70th percentile to zero, which eliminated most close to silent background noise bins”. It is common to talk about thresholds in dB, so I suggest stating the corresponding dB threshold (e.g. "This floor corresponds to a xx dB threshold below peak").

2. “Hence, vocalizations longer than 100 ms (11.7%) were included shortened.” Reword: “Hence, vocalizations longer than 100 ms () were truncated to 100…”. (I assume the end was deleted – otherwise use the correct words to explain how it was shortened).

3. I know that you already provided more details on the NN, but I am also asking you here to provide more details on the calculation of the acoustic features. You should also use descriptions (and terms) that are well understood among bioacousticians. I am assuming that [0,1] means any value between 0 and 1? Your figure 1 seems to say that these are assigned by visual inspection (by hand), but only as 1, 0.5 and 0???:

a. E.g. how is the broadband feature defined? And how do you go from 0 to 1? Why not use Wiener entropy, which might be more common in sound analyses?

b. Similarly for complexity? Is this the Wiener entropy?

c. Similarly for tremolo? Specify the equation that you used.

d. What is the average frequency of a vocalization? The spectral mean? Or the average fundamental? These are very different. You should probably have both. You do talk about the mean spectral energy for detecting vocalizations. If that is what you use, use the same term.

e. Spectral mean requires a distribution. How is this one estimated?

f. Also, power requires a frequency range: is it just all energy above 25 kHz? Normalized by duration? dB is a unitless (relative) measure, so dB² is meaningless. Both amplitude and power can be expressed in dB.

4. I don't think that any of the measures should be assigned by hand. OK for Direction, Peaks and Breaks, but Broadband and Complexity should be quantified. Why not use measures such as spectral bandwidth and Wiener entropy? You could also do something like pitch saliency.

5. I still find your choice of acoustic features somewhat limited. I think that some of the most experienced bioacousticians might wince, and I would not like that, given that my name will also be on your report as the associate editor. I would also add the fundamental. You could use the mean, but also a time-varying fundamental (which you can express with PCs for dimensionality reduction if you want). My lab has a Python toolbox at github.com/theunissenlab/BioSoundTutorial that you might find useful to efficiently extract features. But don't take my word for it: you can also see what has been published by other groups. One issue is that mouse USVs are late to the game, and most folks who have analyzed them are not bioacousticians, so you will need to draw inspiration from analyses in other animals. (A rough sketch of how some of these spectral features could be quantified follows below.)
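
To make points 1, 4 and 5 concrete, here is a minimal sketch (not taken from the manuscript; the toy spectrum is purely illustrative) of how Wiener entropy, spectral mean, spectral quartiles and spectral bandwidth could be computed from a call's power spectrum, and how a percentile floor can be expressed in dB below peak:

    import numpy as np

    def spectral_features(freqs, power):
        p = power / power.sum()                                    # treat the spectrum as a distribution
        wiener = np.exp(np.mean(np.log(power))) / np.mean(power)   # 1 = flat/noisy, near 0 = tonal
        mean_f = np.sum(freqs * p)                                 # spectral mean (centroid)
        bandwidth = np.sqrt(np.sum(p * (freqs - mean_f) ** 2))     # spectral standard deviation
        q25, q50, q75 = np.interp([0.25, 0.5, 0.75], np.cumsum(p), freqs)  # spectral quartiles
        return {"wiener_entropy": wiener, "spectral_mean": mean_f,
                "bandwidth": bandwidth, "quartiles": (q25, q50, q75)}

    # Toy spectrum: a narrowband call centred near 70 kHz on a small noise floor.
    freqs = np.linspace(25e3, 125e3, 512)
    power = np.exp(-0.5 * ((freqs - 70e3) / 2e3) ** 2) + 1e-4
    print(spectral_features(freqs, power))

    # Point 1: a 70th-percentile floor expressed in dB relative to the spectrogram peak.
    floor = np.percentile(power, 70)
    print("floor is %.1f dB below peak" % (10 * np.log10(power.max() / floor)))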

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email. 

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Lyle Graham

Deputy Editor

PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

[LINK]

Figure Files:

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements:

Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility:

To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see http://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-materials-and-methods

Revision 2

Attachments
Attachment
Submitted filename: ResponseLetter2_USVClassify.docx
Decision Letter - Lyle J. Graham, Editor, Frédéric E. Theunissen, Editor

Dear Mr Ivanenko,

We are pleased to inform you that your manuscript 'Classifying sex and strain from mouse ultrasonic vocalizations using deep learning' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Frédéric E. Theunissen

Associate Editor

PLOS Computational Biology

Lyle Graham

Deputy Editor

PLOS Computational Biology

***********************************************************

Dear Bernhard,

Thank you for your patience and for carefully addressing all of my questions and recommendations.

Best wishes,

F

Formally Accepted
Acceptance Letter - Lyle J. Graham, Editor, Frédéric E. Theunissen, Editor

PCOMPBIOL-D-19-01759R2

Classifying sex and strain from mouse ultrasonic vocalizations using deep learning

Dear Dr Englitz,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.