Peer Review History
Original Submission: May 20, 2024
PONE-D-24-18772
Do not trust your ears: AI-determined similarity increases likability and trustworthiness of human voices
PLOS ONE

Dear Dr. Jaggy,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Aug 15 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Ying Shen, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf.

2. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly
Reviewer #2: Yes
Reviewer #3: Partly
Reviewer #4: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: I Don't Know
Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous.
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Sir/Madam, kindly receive my comments on the manuscript entitled "Do not trust your ears: AI-determined similarity increases likability and trustworthiness of human voices":

- I think the title is attractive, but from my perspective it needs to be more scientific than popular. I suggest deleting the first part, "Do not trust your ears."
- The abstract lacks information on the approach and methods used.
- Lines 60-63: "Like fingerprints, the human voice can be used to distinguish individuals from one another with a high degree of accuracy and gives insights into the speaker's emotions and physical attributes." The first part needs a reference.
- Line 131: "Only one study has explored the relationship between voice similarity estimates by humans and an automatic speaker recognition system [37]." I think this is a very strong statement. You can extend your search to cover the following studies: Grágeda N., Busso C., Alvarado E., García R., Mahu R., Huenupan F., Yoma N.B. Speech emotion recognition in real static and dynamic human-robot interaction scenarios (2025). Computer Speech and Language, 89, art. no. 101666. DOI: 10.1016/j.csl.2024.101666; Grágeda N., Busso C., Alvarado E., Mahu R., Yoma N.B. Distant speech emotion recognition in an indoor human-robot interaction scenario (2023), 2023-August, pp. 3657-3661. DOI: 10.21437/Interspeech.2023-1169
- I suggest presenting the three research questions together as a list. This makes the text clearer and more organized for the reader.
- Since "AI" is listed as a keyword and appears in the title, a paragraph is needed about AI in general, and then about how it affects the voice, how it generates similar-sounding voices, and the different platforms used.
- It is very important to explain how the authors understand/perceive the terms "likability" and "trustworthiness." A comprehensive argument with proper references should follow.
- The Methods section should be extended. The authors need to explain the procedures of Experiments 1 through 5 in more detail. The sample also needs more explanation: selection criteria, exceptions, exclusions, time of the experiment, whether gender balance was considered, how the participants were approached, whether they were informed about the whole process, and whether they were given a trial session before the actual experiments or took part spontaneously.
- I understand you have five experiments, each with its own methods and results parts. However, this might be very confusing to the reader. I recommend combining all methods in one part called "Materials and Methods". All results could then be in one place under a title called "Results", divided into parts per experiment.
- Line 567: To me, this part is more discussion than a "general discussion." If you want to keep it general, logic says you need another title called further/deep/focused discussion. Personally, I recommend having a single title called "Discussion".
I recommend the following articles to be listed under the general text about "AI": 10.3390/buildings14030786, 10.3390/buildings14030781

Reviewer #2: I would like to thank the authors for their innovative contribution to understanding the correlation between voiceprints and human perception. Using similarity assessments and d-vectors to generate speech that resembles a target speaker's voice, the authors conclude that cosine similarity is a valid measure of perceived similarity in this particular area, with implications for cognitive research. In my opinion, the significance of the study is clearly stated and the manuscript is well written, although some sections need to be redrafted. For instance, there is no "Background/Related Work" section that could describe the state of the art in the field under study. The Introduction provides some hints, but I would rather suggest creating a section that describes the existing related work more comprehensively. The General Discussion could be slightly expanded with more implications for practice in terms of text-to-speech systems, while the Conclusion section is quite short and must be expanded accordingly.

Regarding methodological issues, recruitment occurred between March 2021 (Experiment 1) and November 2021 (Experiment 5), which means that the data presented here are already more than 2.5 years old when considering the last experiment conducted. The authors should also provide more details about the recruitment process on Prolific (e.g., the total amount of USD paid per HIT). A better description of the interface and tasks presented to the participants would be helpful.

Minor issues:
- The first part of the title needs a question mark (?).
- I usually have doubts when authors claim that there is only one study addressing a certain aspect (e.g., "only one study has explored the relationship between voice similarity estimates by humans and an automatic speaker recognition system [37]").
- Fig 2 is labelled as showing an "Illustration of the results of Experiment 3" but it actually provides insights into Experiment 2.
- Instead of naming sections just "Experiment 1…2…3…", I would kindly suggest creating section titles that state the main purpose of each experiment.
- Acronyms like LSTM, AIC, and AI should be spelled out the first time they are mentioned in the text.
- "via prolific" -> "via Prolific" (insert a link to the crowdsourcing platform as a footnote).
- "MOSNet" and "MOSNET" -> "MOSNet" (uniformity).
- "[…] aggregated dataset compared to the first Experiment" -> "[…] aggregated dataset compared to the first experiment".

Reviewer #3: The present manuscript aims to explore the utility of voice similarity as measured by a neural network as a proxy for human perception of voice similarity, as well as how this similarity might then guide perceptions of a person's likeability and trustworthiness. Across five experiments, the manuscript presents some evidence that neural network similarity measures can approximate human similarity measures to an extent when comparing unfamiliar voices (Exp. 1-2) or one's own voice to unfamiliar voices (Exp. 3). There is also some indication that voices considered more typical (as measured by a neural network) are perceived as somewhat more trustworthy (Exp. 4), and that voices measured to be more similar to one's own voice are also perceived as more likeable and trustworthy (Exp. 5).
This work undoubtedly asks some interesting and very timely questions, but I do have some concerns about the strength and generalisability of the reported effects. Please find my major and more minor comments detailed below.

- A large part of the Introduction is overly technical, which could be particularly challenging for a broader audience. Personally, I am a behavioural scientist (not a computer scientist or a linguist) and I could not make sense of all the terminology used throughout. Also, more clarity is needed when reporting previous findings: most of the literature presented here is based on computer-based estimations, not behavioural ones, but this is not made explicitly clear in the text. For example, at the start, it is stated that human voices can be used to distinguish individuals from one another; this is actually a very challenging and error-prone task for human observers (who have even been shown to struggle to recognise their own voice). The same is true for the examples provided for recognising stress, emotions, demographics, etc.: they are mostly based on computer measurements. While human observers can also detect these in others' voices, it is worth highlighting the perspective the authors are taking here to avoid any confusion.
- The overview presented at the end of the Introduction lacks clarity and does not help the reader get a good idea of the overall structure of this work. For example, it is not stated what the differences between Experiments 1-3 are. It might be clearer to simply describe each of the five experiments briefly. The authors provide no clear hypotheses, and there is no justification as to why they are expecting quadratic relationships. Was this based on some previous research where such quadratic relationships were reported?
- There are some methodological clarifications needed throughout:
  - What online platform was used to present these studies?
  - Is there any particular reason why only male voices were used in Exp. 1?
  - From my reading of the text, it seems that each pair of voices was rated by only 2 participants; this might therefore not lead to a very reliable, stable, and generalisable behavioural similarity index.
  - Were participants able to replay the pairs of voices?
  - How was the unmarked continuous rating scale used to provide a numerical similarity measure?
  - A power analysis is provided for Exp. 2 but for none of the others. It is also worth specifying the exact effect size used for this calculation. Is this the authors' estimate of the relationship between first and second ratings, or of the relationship between AI-measured and behavioural voice similarity?
  - In Exp. 2, participants heard the voice pairs a second time, but in the same order they first heard them in. The authors argue that this is to avoid variance from individual randomization, but I am concerned that this approach might artificially increase the correlation between first and second ratings, especially if participants can recall the pattern of responses from the first rating.
  - In Exp. 3, were participants asked to indicate to what extent the recordings of their own voice sounded similar to their representation of their own voice?
  - As stimuli were gender-matched in Exp. 3, does that mean that participants who identified as diverse or did not specify their sex were excluded?
  - Were participants explicitly instructed that they would be hearing a recording of their own voice on each trial?
  - How many trials did participants complete in Exp. 3?
- Some further clarification is needed about the attention check used. It is stated that participants were asked to detect trials where the voices were from two different speakers. Was that not true in all trials? My understanding is that voices from two speakers were presented on every trial.
- How many speakers were used in Exp. 4?
- The authors argue that average faces are perceived as more attractive, but it might be worth pointing out that this is not always the case. For example, Perrett et al. (1994, Nature) present some evidence against the averageness hypothesis.
- Looking at the data made available by the authors, it seems that some pairs of voices have a negative cosine similarity. This might be worth mentioning in the text, together with how to interpret these negative values (see the sketch following these comments).
- How was the median correlation calculated?
- The main effect reported in the paper, the relationship between human- and AI-based measures of voice similarity, seems rather inconsistent in size. In fact, the median correlations reported decrease with every subsequent experiment.
- I also worry about the seemingly low within-rater consistency; it might indicate that people are not able to do this reliably enough. What is more, nothing is said about the linguistic content of the recordings. I am assuming that different speakers had different utterances; it might therefore be that perceivers remember the utterance, which can then artificially increase their consistency.
- I do not think it is appropriate to apply an attenuation correction due to the low perceiver reliability. Low reliability means that perceivers might not be able to do this task with high levels of consistency, not that the correspondence between human and AI similarity measures might be overestimated.
- I realise that this might be a personal preference, but I would not advise describing an effect with a p value of .078 as marginally significant.
- In the Discussion, the authors argue that a potential problem with voice averages (composites) is that they might not be representative across age. Is their suggested alternative stable across time?
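For readers unfamiliar with the two quantities questioned above, the following minimal sketch may help. It is illustrative only, not the authors' code: the embedding dimensionality and all numbers are hypothetical. It shows how the cosine similarity between two speaker embeddings (d-vectors) is computed and why it can be negative, and what Spearman's correction for attenuation calculates.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings (d-vectors).

    The value lies in [-1, 1]: 1 means the embeddings point in the same
    direction, 0 means they are orthogonal (unrelated), and negative
    values mean the angle between them exceeds 90 degrees, i.e. the two
    voices are even less alike than two unrelated voices would be.
    """
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def disattenuated_r(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction for attenuation: the observed correlation
    divided by the square root of the product of the two reliabilities
    (e.g., the consistencies of the two measures being correlated)."""
    return float(r_xy / np.sqrt(r_xx * r_yy))

# Hypothetical 4-dimensional embeddings (real d-vectors are typically
# much higher-dimensional); the second vector roughly opposes the first.
a = np.array([0.3, -0.8, 0.1, 0.5])
b = np.array([-0.2, 0.9, 0.0, -0.4])
print(round(cosine_similarity(a, b), 3))            # negative, close to -1

# A modest observed correlation corrected for two imperfect reliabilities.
print(round(disattenuated_r(0.35, 0.60, 0.80), 3))  # ~0.505
```

The correction formula itself is standard; the reviewer's objection concerns whether applying it is warranted when the reliabilities themselves are low.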
Reviewer #4: The authors present a series of experiments in which they test whether an AI-based voice similarity measure (voiceprint) corresponds to human similarity measures, and whether voice similarity influences the voice's trustworthiness and likeability. They report a correspondence between the voiceprint and human ratings, as well as (in one experiment) an increase in perceived trustworthiness and likeability with similarity to one's own voice. The topic is interesting and timely, and the experiments are well conducted and analyzed. While I applaud the authors for this, I see some serious flaws in the paper, which can partly be easily mended but partly would need at least a very thorough rethinking of the argument. It is not impossible that they are due to the fact that I am not an expert in the topic. But even then, I think they show that the authors are not making themselves sufficiently clear to non-experts. In fact, I accepted the review because I assumed trust would be a central theme; here I would have been an expert. The first part of the title misled me (and the abstract did not correct this). So I would suggest a different title, and one that is less loud, by the way.

The authors claim relevance because of the introduction of a new methodological tool and the possibly important results, and speak of a broad appeal of their results. This is not implausible, but this part specifically is not well developed. I would expect more precise arguments (or less strong claims).

The Introduction is disorganized; readers must construct the line of reasoning themselves, understanding a series of scattered technical terms and finally figuring out what is actually the core of the later examination: the technical terms or human perception. Statements on the social relevance of the topic are thrown in but not well integrated (and rather strongly formulated). It seems to me that the Introduction would have to be totally rewritten, and I am not sure that a tightly knit Introduction would lead to exactly the present experiments as an appropriate answer to the questions. I especially wonder whether the two parts, the test of the voiceprint and the test of the influence of similarity, need to be published together. Neither part is very strong, and putting them together in one paper does not make them stronger. Also, the research questions should be better developed and better derived from the introductory paragraphs.

I would make effect sizes and explained variance a much more prominent part of the presentation of results, the interpretation, and the conclusion. The relatively strong claims about the practical relevance of the study that are interspersed throughout the text only make sense if combined with a clear estimate and discussion of that practical relevance. What is a "fair amount" of explained variance in this context?

I wondered whether some of the more technical parts could be moved to a subsection of the Introduction. As it stands, this material is one major source of distraction from a psychological argument in the Introduction.

For all experiments, there are long, dense, and often unexplained sections describing results and comparatively short sections discussing them. Methods should be better justified (and sometimes explained), and the explanation of what the results mean and how they are limited should be substantively expanded.

In several places it seems as if the authors, post hoc, explain away unexpected results (changes between experiments, low correlations) (e.g., pp. 11 and 13). However, the respective objections are highly relevant for their study and its interpretation. The General Discussion contains at least one argument that should have been considered pre hoc instead of post hoc (p. 23, threshold and ecological validity).

So, considering PLOS ONE's criteria for reviewing papers, there are (partly substantial) weaknesses in:
• stating the main claims of the paper and showing how significant they are for the discipline,
• supporting the claims with the data, or, more precisely, discussing the amount of support in a balanced way,
• accessibility for non-specialists and organization of the paper.

As the study shows much potential, a thoroughly revised version (maybe including new experiments) could be appropriate for PLOS ONE.

Minor points:
- The Introduction contains a sentence that includes "Forecasts predict that by 2024 …". As we are already well into 2024, it would be appropriate to compare the forecast to reality here.
- p. 7: It is unclear to me how a research question can extend findings.
- p. 9: Here, a slider is introduced suddenly. It could be explained better in the Methods section.
- Experiment sections should not end with new (secondary) information (p. 15).

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?).
If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Revision 1
AI-determined similarity increases likability and trustworthiness of human voices
PONE-D-24-18772R1

Dear Oliver Jaggy,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Ying Shen, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed
Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes
Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #2: Yes
Reviewer #4: Yes

**********
5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes
Reviewer #4: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: The revised manuscript fully incorporates my comments; thank you for your work. Please proofread your English carefully.

Reviewer #4: I thank the authors for taking my comments seriously and addressing them. They have done so very thoroughly and convincingly. The paper is a fine contribution to our knowledge and I wish them many interested readers.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: No
Reviewer #4: No

**********
Formally Accepted
PONE-D-24-18772R1
PLOS ONE

Dear Dr. Jaggy,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Ying Shen
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.