Peer Review History
| Original Submission: July 15, 2020 |
|---|
|
PONE-D-20-21971 Scaling laws in natural conversations among elderly people PLOS ONE Dear Dr. Abe, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The referees recognize the interest and importance of your manuscript but indicated points that need to be addressed to ensure that the manuscript satisfies PLOS ONE publication criteria 3. 4. and 7.: 3. Experiments, statistics, and other analyses are performed to a high technical standard and are described in sufficient detail. 4. Conclusions are presented in an appropriate fashion and are supported by the data. 7. The article adheres to appropriate reporting guidelines and community standards for data availability. Please submit your revised manuscript by Oct 01 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Eduardo G. Altmann Academic Editor PLOS ONE Additional Editor Comments: Please consider clarifying the following points around Eq. (1): - Is n_c the same as N? - Clarify why \\gamma is related to cognition, I understand the role it plays in the equation but it is unclear why this should be connected to cognition. - it might be worth emphasizing that large \\beta corresponds to large vocabulary. Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 2. In your Methods section, please provide additional information about the demographic details of your participants. Please ensure you have provided sufficient details to replicate the analyses such as: a) a table of relevant demographic details and, b) a statement as to whether your sample can be considered representative of a larger population. 3. Please ensure you have thoroughly discussed any potential limitations of this study within the Discussion section. 4. 
In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability. Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized. Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access. We will update your Data Availability statement to reflect the information you provide in your cover letter. 5. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. 
This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ 6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. 
Reviewer #1: No Reviewer #2: No ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: This paper analyzes a set of data extracted from recorded conversations between mentally healthy aged Japanese speakers. A cognitive score, which tests the cognitive capacity of an individual, is provided. The goal of this paper is to find out: 1. Whether Zipf's and Heaps' laws are followed by the analyzed subjects. 2. Whether there is any relationship between the cognitive score for each subject and the values of the parameters computed for the said laws. 3. How the laws vary according to the cognitive score. The paper suffers from many different problems, which I will try to explain in the following.
PRESENTATION. In general terms, the presentation of the material is unclear. Although this paper is not yet in production, the submission should follow some quality standards. Unfortunately, this is not the case. In fact, the submission contains numerous presentation errors, which resulted in a very uncomfortable review of the paper. I personally think that the paper should have been rejected first because of this, to await a resubmission, just to fix its presentation problems. These are: 1. Figures appear at the end of the paper, out of their respective caption boxes, and in landscape format. 2. There are no hyperlinks in the text, which makes the reading even more uncomfortable. 3. The sections are not numbered, so it is difficult to see the difference between sections and subsections. 4. The appendix is very short; it makes no sense to keep it separate from the main article. It only makes the reading more painful. The contents are the important part of the paper, because we all take the presentation part for granted. The presentation of this submission is too poor, and it shows a lack of respect for the journal and the reviewers alike.
INTRODUCTION. In the Introduction section the authors do not clearly motivate their work. In the first paragraph they vaguely talk about language and the brain, then they talk about cognitive functions, and then they talk about the laws that are the backbone of the paper. The only motivation is in lines 72-78, and it is not sufficiently clear. They try to argue that finding out whether these laws hold for *healthy* aged speakers may help to understand how language works in people with "low cognitive functions", but the paper is not related to that, from my point of view. I find the paper interesting for some of the results that it provides and also for what they may suggest, but the introduction says nothing about it. Moreover, the introduction should contain a few lines explaining what the main results of the paper are, and this is lacking as well.
DATA COLLECTION. Relevant details are missing with respect to the processing of the data. You say that "we automatically decomposed all text into words using MeCab". Does it mean that you lemmatized the texts and kept the lemma only? For instance, in English, that would mean that the occurrences of "book" and "books" would be transformed into the lemma "book". Is that what you performed, a lemmatization? Please clarify it.
RESULTS.
The methodology that the authors use to decide whether the data follow the two laws (Zipf's and Heaps'), comparing them with seven candidate distributions, is meaningful and follows standard practice: likelihood-based fitting, parameter estimation with Nelder-Mead, and best-model selection with AICs (Akaike). In this section, the authors say: "The results obtained by the model selection showed that all the rank-frequency distributions for the 65 participants were fitted to shifted power-law distributions". However, there are no numerical results present in the paper. This reviewer has not been able to find this data in the paper, and if it is indeed absent, this is an important flaw. These numbers *must* be in the paper, and there should also be a table that condenses all those results. Are Figures S2.A and S2.B Figures 1 and 2 averaged? I do not see what information is contained in the Figures in S2 that is not already contained in Figures 1 and 2. The graphs seem to show that there is a two-state power law (breakpoint at 10 on the x-axis), which seems to be consistent with state-of-the-art results. However, to say "evidenced by the straight line in the log-log plot" is an error. Lines show; they do not evidence, prove, or confirm. The numerical values and statistical tests *do* show and prove. Unfortunately, they are missing in this paper. Unless shown, these results are meaningless. Related to this discussion, I do not see why the authors did not use this same technique to verify that Heaps' law is followed as well. That is: why don't you compare your data with different distributions, as you did to see whether Zipf's law was consistent? This is not what you do for Heaps' law (as stated in lines 219-229). Why this difference of methodology? (As an aside, lines 219-229 are extremely confusing and difficult to follow.)
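As an illustration of the model-selection step described at the start of this comment, the following minimal sketch computes AIC values and Akaike weights from already-maximized log-likelihoods. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not results from the manuscript, and the likelihood maximization itself (for example with Nelder-Mead) is assumed to have been done beforehand.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aics):
    """Convert a {model: AIC} mapping into normalized Akaike weights."""
    best = min(aics.values())
    # Relative likelihood of each model with respect to the best one.
    rel = {name: math.exp(-0.5 * (value - best)) for name, value in aics.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical fits for one participant: (maximized log-likelihood, number of parameters).
fits = {
    "shifted_power_law": (-100.0, 2),
    "exponential": (-105.0, 1),
}
aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(aics)
print(aics)
print(weights)
```

A per-participant table of such AIC values and weights is the kind of numerical summary the comment asks for.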
In Section "Relationship between word production patterns and cognitive scores" the authors state that "we found no significant relationship between the cognitive score and the total number of words spoken (Spearman’s correlation coefficient ρ = −0.11, p = 0.38)". I think that using a correlation test between a discrete variable (cognitive score) and a "continuous" variable (number of words) is an error. If it is not, please reference a paper where this technique has been used in a similar scenario (discrete vs. continuous variable, where the discrete variable has only 5 possible values). Instead, there are other techniques that could be used, for instance an ANOVA test between the alpha average and sd for scores -2 and 2, and then for scores -1 and 1, -2 and 0, etc. This test would say whether the average alpha (beta) values for each score are significantly different. A correlation does not seem convincing at all. It would also help to provide the average + sd for each cognitive score value in a graph. This comment extends to the rest of the correlations that have been performed with the cognitive score variable, to which I give little validity.
There is another issue with the relationship between the length of the conversations/productions and the rest of the analyses. First of all, lines 192-194 show that there is a large variability in both length and vocabulary. This means that some speakers speak significantly more than others. This induces a bias in the analysis. The authors claim that there is "no significant relationship between the cognitive score and the total number of words spoken" (the correlation is not significant). This seems to indicate that the results obtained later, relating the cognitive scores and the alpha/beta parameters, are free of interference from the length.
But this is not necessarily true (leaving aside the methodological problem that I just commented on in the preceding paragraph). In order to prove that the length (word production) is taken out of the equation (that is, that length does not bias the analyses), one should repeat the analysis on fixed-length samples and compare the results with those presented in the paper. If the results were similar, then it could be concluded that text length does not bias the results. Assuming that this effect may be "cancelled" by the fact that there is no relation between two of the variables is not necessarily correct.
The authors also state that "the longer the data length (e.g., > 10,000), the higher the correlation coefficient (e.g., r close to 0.5) between the Heaps’ exponent and the cognitive score". That is, the longer the conversation, the more correlated the Heaps exponent is with the cognitive score, which implies that the length is relevant to this analysis; in other words, the length *biases* the analysis. Said otherwise: it could be the length of the text that produces a higher value of the parameter. This seems inconsistent with the above paragraph. Moreover, this would not mean that "it is not necessary to analyze datasets with tens of thousands of words for each participant", as the authors point out. A longer length might exhibit different results. It just means that, for the lengths that are in the dataset, it is not relevant.
As for the section "Computational models bridging scaling laws and cognitive functions", I have not been able to understand the purpose of this section in this paper. The material contained here seems completely irrelevant to me, but I may be wrong. I would appreciate an explanation of why this section offers a new insight or adds new ideas. It seems that the authors tested a theoretical model, but I do not see any relationship with the rest of the paper.
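As an illustration of the fixed-length control proposed above, the sketch below truncates every speaker's token sequence to the same length before estimating a Heaps-type exponent. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names and the `transcripts` mapping are hypothetical, and the exponent is estimated with a plain least-squares slope in log-log space as a stand-in for whatever fitting method the manuscript uses.

```python
import math


def heaps_curve(tokens):
    """Return [(n, V(n))]: number of distinct tokens among the first n tokens."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def loglog_slope(points):
    """Least-squares slope of log V against log n (a crude exponent estimate).

    Needs at least two points, otherwise the denominator is zero.
    """
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(v) for _, v in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def fixed_length_exponents(transcripts, length):
    """Estimate the exponent from the first `length` tokens of each speaker.

    Speakers with fewer than `length` tokens are skipped, so every estimate
    is based on exactly the same amount of text.
    """
    return {
        speaker: loglog_slope(heaps_curve(tokens[:length]))
        for speaker, tokens in transcripts.items()
        if len(tokens) >= length
    }
```

If the exponents estimated at a common length relate to the cognitive score in the same way as the full-length estimates do, text length is unlikely to be driving the reported association; if the relationship changes, length is acting as a confound.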
The Section "Source of new words" gives me some deep theoretical doubts. The authors seem to suggest that there is a way to find out which words uttered by the speakers have been acquired during the conversations and which already belonged to the speaker's repertoire. However, it seems to me very unlikely that old people acquire new words at all. Therefore, the assumption in this section seems seriously flawed to me: what is the evidence that this feature (word acquisition) is a productive feature in aged people? It would make perfect sense in the case of a longitudinal study of children who are acquiring words, but in this case it seems tremendously unclear and it has no theoretical support (at least, I see no reference in the text). If a subject utters a word in the conversation, it may be because he heard it from a conversation partner. But from this, one cannot conclude that the subject *acquired* that word, taking into account the age of the individuals analyzed (here I mean *acquire* in the linguistic sense of adding it to his own lexicon).
DISCUSSION. In the Discussion section, the authors claim that "In contrast, MCI or dementia patients might not follow the scaling laws because repeating a certain word due to critical cognitive impairment or memory disorder may result in the collapse of the scaling laws". This is just a speculation that has no empirical evidence in the paper. This section (Discussion) is a mix of different subjects with no meaningful narrative. I strongly suggest rewriting this section, following a coherent line of discussion, without mixing future work (lines 349-360) and speculation with the discussion of the material contained in the paper. I would also avoid sentences like "we revealed the association between cognitive functions and word production patterns" because, as this reviewer states above, this is debatable.
You say that "While we did not find a significant relationship between Zipf’s exponent and the cognitive score (Fig 2A), the exponents of Heaps’ law, that is, the slope of the relationship between the number of words and different words, were significantly associated with cognitive function scores". Leaving aside that (as I previously commented) the methodology that you use to conclude that there is a relationship between the exponents of Heaps' law and the cognitive scores is flawed: if this relation exists, but the relation between the exponents of Zipf's law and the same scores does not, how does that fit with the fact that Zipf's and Heaps' laws are themselves related? (See the bibliographical suggestions at the end of this text.) Is it consistent?
FINAL REMARKS. The authors should discuss the relationship between Zipf's and Heaps' laws. Some references that may be of help: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0014139 https://doi.org/10.1076/jqul.8.3.165.4101 https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.114.238701 In fact, throughout the paper, they seem to ignore that both laws are related, and that some of the results could be inconsistent with this assumption. That should also be discussed. There is a lot to improve in the paper, starting with the presentation. However, I would like to say that the material in this paper seems to be of interest. The study of Zipf's and Heaps' laws in different kinds of people and languages is an interesting topic of research. But there is a lot to be improved: presentation, motivation, relevant methodological problems, a more systematic presentation of the results, stating *clearly* each of the relationships that have been studied and with what methodology, and presenting the *full* results (in the appendix if they are too extensive) with figures that help to understand them. Therefore, I would encourage the authors to improve the quality of their paper and to be more consistent in their analyses.
Reviewer #2: The authors present a quantitative linguistics study of spoken Japanese conversation in healthy elderly subjects. The manuscript lays out several interesting results, but not all claims are properly sustained in my opinion. I believe the manuscript could be published if the following points are taken into consideration in a revised version: 1. I appreciate that the authors share the cognitive test results and fitted Heaps/Zipf’s exponents. However, for the sake of reproducibility, I would suggest to publish also the processed conversational data. While the fitting method used by the authors seemed robust enough to me, fitting of fat-tailed distributions is a delicate issue and many fitting procedures have been proposed in the literature. Therefore, I would be more comfortable if the reader is given access to the raw tokenized dataset, which would allow him/her to (i) verify the results of the manuscript and (ii) perform additional alternative analysis of the dataset. There are simple solutions to publish large datasets such as Zenodo. 2. The R code / necessary scripts to reproduce the results of the paper should also be published (as per PLOS editorial policy). 3. Is there a measurement error associated to the cognitive tests results? For instance, if each subject performed each test more than once, then it would be better to report the variances in addition to the averages. If each test was only taken once then it would be good to know, perhaps from the literature, what are the typical inter-subject variabilities of said tests. 4. I am not convinced the PCA-based cognitive score is a good metric for several reasons. First, it only captures 40% of the variability. My understanding is that the different cognitive tests measure different cognitive functions which could but do not necessarily correlate with each other. If that is the case, then taking the PCA of all measures might obscure more interesting results. 
Second, using the first principal component as a “summary” cognitive score has the undesirable consequence that the scores of different subjects are no longer independent of each other: to compute the score of one subject, we need the test results of all other subjects. I suggest that the authors report, at least, the loadings of the first principal component, so that the reader has an idea of the weight of each test in the final cognitive score, and perform some additional analysis based on each cognitive test separately. 5. Zipf’s law and Heaps’ law are statistical laws tightly dependent on each other, see for instance (Lü, Zhang, and Zhou 2010; van Leijenhorst and van der Weide 2005; Font-Clos and Corral 2015). In this sense, the fact that only Heaps’ exponent, but not Zipf’s exponent, significantly correlates with cognitive scores might just be a technical artifact. Regardless, the number of tokens could certainly be a confounding variable in this case: First, there has already been ample debate in the community regarding the relation between text length and Zipf’s exponent, see (Corral and Font-Clos 2017; Bernhardsson, da Rocha, and Minnhagen 2009) (in summary, and depending on the fitting methodology etc., one tends to obtain larger exponents for longer texts). Second, having a quick look at the supplementary CSV data, it would seem that text length is a very good predictor of Heaps’ exponent, but not of cognitive score. The authors might want to attempt to untangle the role of text length in their analysis more clearly. 6. The reasoning of lines 269-272 is very unclear to me: basically, the Figure shows that in short datasets the association is lost, which would imply the opposite of what is being said? That is, we need to analyze large datasets to make sure we observe a meaningful association. 7. I do not see a clear connection between the computational model and brain cognitive functionality.
Clearly, if Heaps’ exponent correlates with cognitive score, then this will also be captured by the model of Gerlach et al, but that does not bring new information per se. The authors say: “we focused on a parameter related to the decay rate of probability for new word production and interpreted it as a cognitive function.” So, the relation between word production (Heaps’ law) and cognitive function is an assumption of the authors, not a conclusion obtained from the analysis. This is clearly seen in Figure 4A (which would be obtained with any other dataset). 8. The authors claim “the scaling laws are useful to detect the tendency of cognitive decline, even in healthy people.” I do not think this conclusion is well supported by the analysis presented in this manuscript. 9. The authors claim that “our approach requires only limited data from which to detect the relationship between cognitive functions and word patterns, even for healthy participants.”, which is at odds with the results presented in Figure 3.
Bernhardsson, Sebastian, Luis Enrique Correa da Rocha, and Petter Minnhagen. 2009. “The Meta Book and Size-Dependent Properties of Written Language.” New Journal of Physics 11 (12): 123015.
Corral, Álvaro, and Francesc Font-Clos. 2017. “Dependence of Exponents on Text Length versus Finite-Size Scaling for Word-Frequency Distributions.” Physical Review E 96 (2-1): 022318.
Font-Clos, Francesc, and Álvaro Corral. 2015. “Log-Log Convexity of Type-Token Growth in Zipf’s Systems.” Physical Review Letters 114 (23): 238701.
Leijenhorst, D. C. van, and Th. P. van der Weide. 2005. “A Formal Derivation of Heaps’ Law.” Information Sciences 170 (2): 263-72.
Lü, Linyuan, Zi-Ke Zhang, and Tao Zhou. 2010. “Zipf’s Law Leads to Heaps’ Law: Analyzing Their Relation in Finite-Size Systems.” PLoS ONE 5 (12): e14139.
********** 6. PLOS authors have the option to publish the peer review history of their article.
If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Francesc Font-Clos [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
PONE-D-20-21971R1 Scaling laws in natural conversations among elderly people PLOS ONE Dear Dr. Abe, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised by reviewer 1. In particular, the criticism of the statistical analysis indicates that some of the conclusions would need to be re-considered, please address all the points and revise your conclusions accordingly as this is expected to be the last round of review. The reviewer's criticism on PLOS ONE's manuscript format will not be taken into account for the decision to accept the manuscript. Nevertheless, you may want to include a pdf version of your manuscript with figures in place. Please submit your revised manuscript by Jan 22 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Eduardo G. Altmann Academic Editor PLOS ONE [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: (No Response) ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: (No Response) ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #1: Yes Reviewer #2: (No Response) ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: (No Response) ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The paper is now easier to read and somewhat more consistent. However, some serious flaws still persist. I will try to explain them in the following. Before that, I would like to discuss again the format of the paper. In the version that you have submitted, when a link to a graph is clicked, the focus moves to the caption of the graph, but the graph is somewhere else. It is extremely upsetting to read a paper like this. If in the previous version the problem was that you were using Word instead of LaTeX, I do not see why this still happens when you use LaTeX.
As you may know, LaTeX has a way to place the graphs in the place where they are defined in the .tex file ([h!]). Worse still, the graphs are vertical instead of horizontal. That means that when I follow a reference to one of your graphs, I click it and see only a caption; then I need to find (and figure out) where the graph is and which one it is; and after all this, I need to rotate the page 90 degrees in order to read it. We review papers as a service to the community, not only for free but also at the expense of our own effort and time. This is acceptable, and I do not complain about it; I am happy to help. But I do not accept that I need to read the paper in such an uncomfortable and difficult way, especially when the tools to prevent this mess are easily available. This comment is not only for the authors, but for the editors as well. Comments: First of all, at the end of the abstract the authors state that: "We found that word patterns followed these scaling laws irrespective of cognitive function, and that the variations in Heaps’ law were associated with cognitive function. Moreover, variations in Heaps’ law were associated with the ratio of new words taken from the other participants’ speech. These results indicate that scaling laws in language are related to cognitive processes.". This whole paragraph is contradictory. They say that "these scaling laws [are] irrespective of cognitive function", where "these laws" are both Zipf's and Heaps', and then they say: "variations in Heaps’ law were associated with cognitive function". The following sentence adds still more contradiction. In any case, I guess that this is just a writing error. In line 53, the authors say that "Heaps’ law describes how new words are produced along with sentences or during conversations". This is not the meaning of this law, and I think that the mistake in the interpretation of this law implies more serious problems throughout the paper.
Heaps' law states that the number of different words in a text is proportional to the length of the text raised to an exponent. This has nothing to do with the idea of "new" words, in the sense of words that did or did not belong to the speaker's lexicon. This law describes how the variety of different words varies when we write or speak. It may seem that the only problem with the interpretation of this law is that the authors use the word "new" where I use the word "different", but I will discuss it later, to show that, from this reviewer's point of view, the authors have mistaken or misused the meaning of this law. My main concern with the results shown in the paper relates to the Heaps' law results. In line 227 the authors state that: "the relationship between the exponent \\beta of Heaps’ law and cognitive scores and found a significant relationship (p = 0.002)". But there is a value that goes along with this p-value, which is 0.003. What is the meaning of this value? Or, put otherwise, being statistically significant is important, but then we need to see the slope (in the case of a correlation analysis) or an extra metric that describes the nature of this significant p-value. To make it clearer, when you have a significant correlation, you then look at the slope, since a significant correlation with a slope -> 0 is not the same as a significant correlation with a slope -> 1/-1. In this case, apart from the significance, what else can be said about the nature of both relations? This is important to clarify because Zipf's and Heaps' laws are connected; the authors therefore need to be very precise when they state that the expected behavior (assuming transitivity between the relations: cognitive score - Zipf's law, cognitive score - Heaps' law, Heaps' law - Zipf's law) does not hold according to their results.
In lines 231-234 the authors state that: "We confirmed a robust relationship between the exponent \\beta and all original cognitive scores, except for the digit span (Table 3)." (apart from the very liberal use of the word "robust" in this particular case) and then they say: "Thus, these results indicate that the variation in Heaps’ law could be associated with the difference in cognitive functions." Yet, you also state that: 1. there is no relationship between cognitive score and number of uttered words (line 217). 2. in figure 4 you show a relationship between cognitive score and exponent \\beta and length of text. This seems inconsistent, taking into account that transitivity should apply in these cases. I did not find any discussion of this fact in the paper. Finally, I would like to comment on the section "Source of new words". After the response to one of my questions, I firmly think that the definition of "new words" that they apply in this paper has nothing to do with the meaning of Heaps' law. What they do is analyze the relation of different words only if they have been uttered before by another speaker. This is not what the law states, since the law makes no distinction about the "origin" of the words, or whether they were already in the speaker's lexicon or were just learned. This law measures the proportion of different words w.r.t. the number of uttered words. Therefore, by selecting only those words that have been uttered by someone else, the authors are biasing the analysis. I do not see any meaning in analyzing only this subset of words. Moreover, in order to compute the parameter of Heaps' law, it seems clear that the longer the text is, the more accurate the computation of this parameter will be, since this function will have a larger size span to be fitted. And *precisely* because of that, this parameter *needs* to be computed with fixed-length text, if the purpose of the analysis is to see relevant differences between speakers' performance.
It comes as no surprise to me that the longer the text, the higher the value of this parameter is. In fact, I would say that the longer the text is, the more *accurate* the value of this parameter is (but this is just a guess). If I had to see if this parameter had an impact on or a relationship with the cognitive score, I would take some individuals with a significantly low score and some with a significantly high score (w.r.t. the average, for instance), obtain their uttered words, set a prefix length fair for all of them (the minimum is usually taken), compute the Heaps' parameter for all of them, and then apply a method to see if the difference (on average, for instance) is significant. Or you could group them as well (low, medium, high cognitive score), and see the statistical differences between all of them. Using your methodology, you make a mistake (from my point of view): 1. using the words that you define as "new". 2. not setting a prefix length. 3. not using more precise and clearer (and yet simple) statistical tools to find out the relation between individuals. It is not enough to see if there is a relation between the cognitive score and the length, because not taking a prefix to compute Heaps' parameter is biased by your decision of taking a fixed prefix length. In fact, the last graph may mean nothing, since you are not using all the available utterances for the analysis, and graph B on the previous page may just mean that the more words you take, the more precise your computation of the Heaps' parameter is. That means that your statement in lines 298-299 is dubious. Reviewer #2: All comments have been addressed. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: Francesc Font-Clos [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
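The fixed-prefix comparison that Reviewer #1 proposes in the report above can be sketched roughly as follows. This is only an illustrative reading of the reviewer's suggestion, not the authors' analysis: the function names, the `(cognitive_score, tokens)` input layout, the score cut-offs, and the choice of a permutation test as the "method to see if the difference is significant" are all assumptions made for this sketch.

```python
import math
import random
from statistics import mean


def vocabulary_growth(tokens, prefix_len):
    """V(n) for n = 1..prefix_len: distinct words among the first n tokens."""
    seen = set()
    growth = []
    for tok in tokens[:prefix_len]:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def heaps_exponent(tokens, prefix_len):
    """Least-squares slope of log V(n) against log n on a fixed-length prefix.

    Requires prefix_len >= 2 so that log n is not constant.
    """
    growth = vocabulary_growth(tokens, prefix_len)
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def group_exponents(speakers, low_cut, high_cut):
    """speakers: list of (cognitive_score, tokens) pairs (hypothetical layout).

    Every speaker is truncated to the shortest transcript, so all exponents
    are fitted over the same text length.
    """
    prefix_len = min(len(tokens) for _, tokens in speakers)
    low = [heaps_exponent(t, prefix_len) for s, t in speakers if s <= low_cut]
    high = [heaps_exponent(t, prefix_len) for s, t in speakers if s >= high_cut]
    return low, high


def permutation_p_value(a, b, n_iter=10000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_iter + 1)
```

With real transcripts, `group_exponents` would be called once on all speakers and the two returned lists passed to `permutation_p_value`. Truncating to the shortest transcript follows the reviewer's remark that "the minimum is usually taken"; it discards data from longer conversations, which is the trade-off for making the fitted exponents comparable across speakers.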
| Revision 2 |
|
Scaling laws in natural conversations among elderly people PONE-D-20-21971R2 Dear Dr. Abe, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Eduardo G. Altmann Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed ********** 2. 
Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: I apologize for blaming the authors and the editor for the poor presentation, which is solely the responsibility of the journal. I hope that this can be improved in the future, taking into account that, as I previously said, the reviewers are working for free for the journal. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No |
| Formally Accepted |
|
PONE-D-20-21971R2 Scaling laws in natural conversations among elderly people Dear Dr. Abe: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Eduardo G. Altmann Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.