Peer Review History

Original Submission: July 9, 2019
Decision Letter - Lidia Adriana Braunstein, Editor

PONE-D-19-19332

Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

PLOS ONE

Dear Mr. Gray,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. 

We would appreciate receiving your revised manuscript by Oct 18 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as a separate file and labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as a separate file and labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as a separate file and labeled 'Manuscript'.

Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Lidia Adriana Braunstein, PhD in Physics

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please include additional information about your dataset and ensure that you have included a statement specifying whether the collection method complied with the terms and conditions for the website.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a very charming piece of work; quite apart from its appeal, it is also a very nice contribution to the study of language use online. It is the first time I've seen this phenomenon studied in a rigorous fashion. There is also some really nice technical innovation in the appendix.

I have only one minor point: we know that regressions on a log-log plot are not statistically robust measures of power law indices. Can the authors try, e.g., the Clauset-Shalizi powerlaw estimation tools? These allow for an xmin specification, and will not be sensitive to extreme outliers on the right-hand side, so should work well.
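For reference, the maximum-likelihood estimator described by Clauset, Shalizi and Newman (2009), on which their power-law fitting tools are based, replaces a log-log regression with the following estimate of the exponent α from the n observations x_i ≥ x_min. The second line is exact for continuous data; the third is the approximation they give for discrete data such as token counts. Choosing x_min is a separate step in their method and is not shown here.

```latex
\begin{aligned}
p(x) &\propto x^{-\alpha}, \qquad x \ge x_{\min}, \\
\hat{\alpha} &= 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1} && \text{(continuous data)}, \\
\hat{\alpha} &\simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min} - \tfrac{1}{2}} \right]^{-1} && \text{(discrete data, approximate)}.
\end{aligned}
```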

Congratulations on a very nice paper.

Reviewer #2: Overall, I really enjoyed the paper and I think it should be accepted for publication after some minor revisions, although the authors might want to go a bit further than what I’m insisting on, which isn’t much. The methods and findings are interesting, original, and clearly presented. It was a fun paper to read and I feel like it provides a real foundation for further linguistic analysis.

As a linguist, I do find the framing, analysis, and discussion all a bit superficial though. Right now this reads solely as a methodological paper, but I think there is more insight here than what was discussed. The rationale for the paper is almost entirely around the application of these methods, and even there it’s pretty limited, mostly lexicography and general NLP, and the NLP examples aren’t really clear to me. This is fine – the authors aren’t overclaiming, which is nice – but there is some more room here for discussion if they want.

The introduction/literature review is especially quick. I realise the authors aren’t linguists, but my one insistence is that the authors review some previous research, if only briefly. The authors say lengthening is fundamental to speech, and I guess that’s true, but that’s really all they say. That point could be developed a bit, but at the very least the processes of vowel lengthening and gemination should be noted, and the directly relevant literature in linguistics and NLP should be acknowledged. Like in a few minutes online I found a bunch of very relevant research, which should be cited at a minimum:

https://repository.upenn.edu/pwpl/vol18/iss2/14/

https://www.aclweb.org/anthology/D11-1052

https://www.sciencedirect.com/science/article/pii/S0747563214000594

https://gretchenmcculloch.com/book/

https://www.aclweb.org/anthology/N13-1037

There is other material the authors might find relevant as well, if they want to go further. The frequency distribution of word lengths observed by Zipf, for example, is not wholly tangential to the topic of this paper. We have also done research on new words on Twitter in particular that has looked at creative spellings, including stretched words, and at how much word length matters in terms of predicting the success of new words over time.

https://doi.org/10.1017/S1360674316000113

http://evolang.org/torun/proceedings/paperpdfs/Evolang_12_paper_171.pdf

In their discussion/conclusion, I think the authors should also consider how their methods and results could inform our understanding of how and why words are stretched on Twitter. To me that is the main scientific value of this method, and it isn’t really realised. A number of relevant observations are made as the methods are described, but they are never really brought together. And then of course they aren’t related back to previous research, since none is covered. I’m not insisting on this. I’m fine with a purely methodological paper. It just seems to me like a missed opportunity. I think that would increase the impact and significance of the paper.

Otherwise, in terms of the methods, I think they are well presented for the most part. I don’t find the prose descriptions especially easy to follow. On the one hand, I think they could be explained a bit more plainly and in some more detail. On the other hand, equations/algorithm descriptions in the main text would help clarify things. After I look at the figures and read the captions, I understand the methods, but I was a bit lost up until then, though maybe that’s just me. FWIW the jellyfish plots were especially unclear to me. I more or less get the idea, but I’m not sure I’m 100% totally following there. And anyway I don’t really see much point to them – like why are they useful? I’d recommend the authors either expand or cut that section, but it’s a pretty minor point and I’m not bothered one way or the other.

Also, briefly in terms of the plots, the authors might think about including character labels on the lines in the balance plots, so that they can be read without a caption. I’m also not sure how much varying the shading and the width of the branches in the spelling trees helps. It’s a neat effect, but I found it kind of hard to read, especially as the trees got more complex and the resolution got smaller. Plus, if I understand correctly, it’s all redundant information. Anyway, just my reaction. I realise some thought has been put into this, and it’s not a big deal.

Overall, I find the paper really interesting and it seems like the methods could be used for lots of different types of analyses. I can think about a bunch of interesting linguistic research questions that one could pursue with the methods: looking for differences in stretching across words classified according to their grammatical or semantic characteristics; looking for change over time or across space in the spelling of specific words; looking at lengthening patterns on different character types (e.g. vowels vs. consonants; stops vs. fricatives).

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Jack Grieve

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

Dear Reviewers,

We thank the reviewers for their time and effort spent reviewing our manuscript and appreciate their comments and suggestions towards improving it. We also thank both reviewers for their kind general remarks about our manuscript and are glad you seemed to enjoy the paper. As detailed further below, we have worked to address your comments and hope we have a clearer, overall improved article as a result.

Reviewer 1

Comment:

“We know that regressions on a log-log plot are not statistically robust measures of power law indices. Can the authors try, e.g., the Clauset-Shalizi powerlaw estimation tools? These allow for an xmin specification, and will not be sensitive to extreme outliers on the right-hand side, so should work well.”

Response:

We agree that those methods are well suited to estimating power-law indices. However, we do not report any power laws in our work. We mention that the token count distributions roughly follow a power-law shape, but we do not estimate any indices or formally test how well a power law fits. We do perform a linear regression on a log-log plot in relation to Fig. 2 when calculating a cutoff rank, but we do not mean to imply that the underlying distribution follows a power law. We use the line only as one step in finding a cutoff, borrowing the idea of a cutoff frequency from signal analysis so that the cutoff is chosen in a more principled way than by setting it arbitrarily. The line is best thought of as a rough tangent to the main part of the curve, giving us a reference from which to measure a drop. Furthermore, the exact cutoff is not important and does not change our results in any meaningful way. We have added the following text to the paper to make this clearer to the reader:

“Note that we are using a rough guide to find a practical cutoff for the number of kernels we include in our study. While we find a linear fit as part of this process, this token count distribution is not an archetypal power law. We merely use the regression line as a reference from which to calculate a drop, analogous to the process of finding a cutoff frequency, and the precise cutoff is not particularly important. The cutoff rank is not used in the statistics of any individual kernel, and for the analyses that examine how stretchable words behave as a function of kernel rank, the resulting figures and statistics will only be affected at the margin of the cutoff rank. An alternative might be to simply pick a cutoff rank based on visual inspection of Fig. 2, or to choose a lower bound on the amount of data (the token count sum) and find the first rank that falls below that bound.”
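The cutoff procedure described in this added text can be illustrated with a short sketch. This is not the authors' implementation: the function name cutoff_rank, the user-chosen fit window, and the default drop of 3 dB (roughly a halving, by analogy with the half-power cutoff frequency) are hypothetical choices made for illustration, and token counts are treated as the "power" in that analogy.

```python
import numpy as np


def cutoff_rank(counts, fit_window, drop_db=3.0):
    """Hypothetical sketch of a cutoff-rank rule; not the authors' code.

    counts     -- positive token counts sorted in descending order (rank 1 first)
    fit_window -- (first_rank, last_rank), inclusive, spanning the "main part"
                  of the curve used to draw the reference line (assumed input)
    drop_db    -- size of the drop below the line, by analogy with a -3 dB
                  cutoff frequency (assumed default)

    Returns the first rank whose count falls more than drop_db below the
    reference line, or None if no rank does.
    """
    counts = np.asarray(counts, dtype=float)
    log_r = np.log10(np.arange(1, counts.size + 1))
    log_c = np.log10(counts)

    # Least-squares line through the chosen window in log-log space. It serves
    # only as a rough tangent to the curve, not as a power-law fit.
    first, last = fit_window
    slope, intercept = np.polyfit(log_r[first - 1:last], log_c[first - 1:last], 1)
    reference = slope * log_r + intercept

    # Treating counts as "power", a drop of drop_db decibels is a factor of
    # 10 ** (-drop_db / 10), i.e. a shift of -drop_db / 10 in log10 units.
    threshold = reference - drop_db / 10.0

    below = np.flatnonzero(log_c < threshold)
    return int(below[0]) + 1 if below.size else None


if __name__ == "__main__":
    # Synthetic counts: 1000 / r up to rank 100, then a tenfold drop.
    counts = [1000.0 / r for r in range(1, 101)] + [100.0 / r for r in range(101, 201)]
    print(cutoff_rank(counts, (1, 50)))  # 101
```

With synthetic counts that follow 1000/r up to rank 100 and then drop tenfold, the sketch reports rank 101 as the cutoff; a curve that never leaves the reference line yields None.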

Reviewer 2

Comment:

“The introduction/literature review is especially quick. I realise the authors aren’t linguists, but my one insistence is that the authors review some previous research, if only briefly. The authors say lengthening is fundamental to speech, and I guess that’s true, but that’s really all they say. That point could be developed a bit, but at the very least the processes of vowel lengthening and gemination should be noted, and the directly relevant literature in linguistics and NLP should be acknowledged. Like in a few minutes online I found a bunch of very relevant research, which should be cited at a minimum:

https://repository.upenn.edu/pwpl/vol18/iss2/14/

https://www.aclweb.org/anthology/D11-1052

https://www.sciencedirect.com/science/article/pii/S0747563214000594

https://gretchenmcculloch.com/book/

https://www.aclweb.org/anthology/N13-1037”

Response:

We added a brief mention of vowel lengthening and gemination but do not belabor it, as we feel it is quite separate from the lengthening we examine in our study. We appreciate the links to related research; we have reviewed and referenced all of them in our revision and have also included additional relevant work that we found.

Comment:

“As a linguist, I do find the framing, analysis, and discussion all a bit superficial though. Right now this reads solely as a methodological paper, but I think there is more insight here than what was discussed. The rationale for the paper is almost entirely around the application of these methods, and even there it’s pretty limited, mostly lexicography and general NLP, and the NLP examples aren’t really clear to me. This is fine – the authors aren’t overclaiming, which is nice – but there is some more room here for discussion if they want.

There is other material the authors might find relevant as well, if they want to go further. The frequency distribution of word lengths observed by Zipf, for example, is not wholly tangential to the topic of this paper. We have also done research on new words on Twitter in particular that has looked at creative spellings, including stretched words, and at how much word length matters in terms of predicting the success of new words over time.

https://doi.org/10.1017/S1360674316000113

http://evolang.org/torun/proceedings/paperpdfs/Evolang_12_paper_171.pdf

In their discussion/conclusion, I think the authors should also consider how their methods and results could inform our understanding of how and why words are stretched on Twitter. To me that is the main scientific value of this method, and it isn’t really realised. A number of relevant observations are made as the methods are described, but they are never really brought together. And then of course they aren’t related back to previous research, since none is covered. I’m not insisting on this. I’m fine with a purely methodological paper. It just seems to me like a missed opportunity. I think that would increase the impact and significance of the paper.”

Response:

Thank you for the suggestions on how we could make our paper more impactful. In the revised manuscript, we include a clearer application of our methods to NLP and references to the papers you provided. In the section of the paper where we discuss the distributions, we added a paragraph discussing Zipf’s brevity law and how it relates to our work. We have also substantially expanded the discussion in the concluding remarks section, bringing together some results from our paper and some from the research of others. We may not have gone as far as you feel there is opportunity for, and we leave the remainder to future research.

Comment:

“Otherwise, in terms of the methods, I think they are well presented for the most part. I don’t find the prose descriptions especially easy to follow. On the one hand, I think they could be explained a bit more plainly and in some more detail. On the other hand, equations/algorithm descriptions in the main text would help clarify things. After I look at the figures and read the captions, I understand the methods, but I was a bit lost up until then, though maybe that’s just me. FWIW the jellyfish plots were especially unclear to me. I more or less get the idea, but I’m not sure I’m 100% totally following there. And anyway I don’t really see much point to them – like why are they useful? I’d recommend the authors either expand or cut that section, but it’s a pretty minor point and I’m not bothered one way or the other.”

Response:

We identified the parts of the paper that were most likely to be confusing and worked to make them clearer, including by adding equations in quite a few places. In particular, we added more explanation of the jellyfish plots, including why they are useful.

Comment:

“Also, briefly in terms of the plots, the authors might think about including character labels on the lines in the balance plots, so that they can be read without a caption. I’m also not sure how much varying the shading and the width of the branches in the spelling trees helps. It’s a neat effect, but I found it kind of hard to read, especially as the trees got more complex and the resolution got smaller. Plus, if I understand correctly, it’s all redundant information. Anyway, just my reaction. I realise some thought has been put into this, and it’s not a big deal.”

Response:

For the balance plots, we had considered adding character labels but had not done so, largely because many of the plots lack room and the labels would overlap. For the plots in the main paper, however, that is not really the case. We therefore decided to add character labels only for characters that are allowed to stretch in the kernel, and only when the labels do not overlap each other on the plot.

For the spelling trees, the width of the branches is not redundant information. The width reflects the number of tokens that pass through that branch when spelled out, so wider branches correspond to paths that are more common when spelling out stretched words; this information is not available anywhere else in the figure. The shading, however, is indeed redundant, simply reflecting the direction of the branch. In earlier versions, the trees were a single color, but over successive iterations of the figures we felt that the two shades helped with following some of the patterns, so we have decided to keep the shading. Although we did not change the trees this time, feedback on the figures is always useful and we much appreciate it.

Comment:

“Overall, I find the paper really interesting and it seems like the methods could be used for lots of different types of analyses. I can think about a bunch of interesting linguistic research questions that one could pursue with the methods: looking for differences in stretching across words classified according to their grammatical or semantic characteristics; looking for change over time or across space in the spelling of specific words; looking at lengthening patterns on different character types (e.g. vowels vs. consonants; stops vs. fricatives).”

Response:

Thank you for the additional future research suggestions. We have added these to the future research part of our concluding remarks.

Yours sincerely and on behalf of the manuscript’s authors,

Tyler Gray

Department of Mathematics and Statistics

The University of Vermont

Attachments
Submitted filename: Response to Reviewers.pdf
Decision Letter - Lidia Adriana Braunstein, Editor

Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

PONE-D-19-19332R1

Dear Dr. Gray,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Lidia Adriana Braunstein, PhD in Physics

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Paper looks really good! I always really liked the study and I think the background and methods are much clearer now. Sorry about being a bit slow with the review. Busy these days.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #2: Yes: Jack Grieve

Formally Accepted
Acceptance Letter - Lidia Adriana Braunstein, Editor

PONE-D-19-19332R1

Hahahahaha, Duuuuude, Yeeessss!: A two-parameter characterization of stretchable words and the dynamics of mistypings and misspellings

Dear Dr. Gray:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Lidia Adriana Braunstein

Academic Editor

PLOS ONE
