Peer Review History
| Original Submission: February 4, 2023 |
|---|
|
PONE-D-23-03278
Garden-path sentences and the diversity of their (mis)representations
PLOS ONE

Dear Dr. Ceháková, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Both reviewers agree that the study you report has been carried out rigorously and is of considerable interest to the field. The reviewers also note a number of non-trivial issues, however, which you should address in your revisions. Reviewer #1, in particular, provides many detailed, constructive suggestions as to how your manuscript and the statistical analysis can be improved. Some of Reviewer #2's comments overlap with those made by Reviewer #1, specifically regarding your end-of-trial comprehension questions and what these might measure. Looking more deeply into the latter issue may have important consequences for your theoretical conclusions. Please also attend carefully to the reviewers' more minor comments. Please submit your revised manuscript by Apr 27 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Claudia Felser, Ph.D Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 2. Please provide additional details regarding ethical approval in the body of your manuscript. In the Methods section, please ensure that you have specified the name of the IRB/ethics committee that approved your study. 3. Please provide additional details regarding participant consent. In the Methods section, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information. 4. 
We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section. 5. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well. 6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). 
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The manuscript ‘Garden-path sentences and the diversity of their (mis)representations’ investigates the processing of temporarily ambiguous sentences. The authors report the results from 4 web-based reading experiments and a web-based grammaticality-judgement study. The key results are that (a) an incorrect initial analysis which is activated during processing of the sentence remains active even after processing of the sentence is complete, and that (b) readers differ considerably with regard to which final interpretation they adopt. The manuscript is written in a scholarly way. The experiments are based on a solid experimental design, and motivated with reference to existing theoretical accounts of ambiguity-resolution. The key question addressed in the manuscript, i.e. 
the nature of the final interpretation adopted by readers, has received considerable attention in recent publications (see Fujita & Cunnings, 2020, for a detailed discussion of the issue). In this respect, the manuscript is certainly of interest for the scientific community in this field. That said, I have a number of concerns about the theoretical background, hypotheses, and interpretation of the results. In particular, while the authors motivate their study as a way to test the theoretical accounts proposed by Christianson et al. (2001) and Slattery et al. (2013) against each other, the experimental results reported in the manuscript can actually be accounted for by either of the two accounts. Further, some aspects of the experimental design, in particular the inclusion of the animacy manipulation, are not sufficiently motivated in the current version of the manuscript. I also have some concerns about particular aspects of the statistical analyses. Finally, the results sections currently include a substantial amount of technical detail, which makes the manuscript hard to follow for readers. I elaborate on all these concerns in more detail below, along with suggestions for how the manuscript may be fixed. In sum, the study strikes me as in principle publishable, but requires substantial corrections and rewriting. General issues 1. The description of the ambiguity resolution account proposed by Slattery and colleagues strikes me as somewhat inaccurate. The account does not claim that reanalysis is always successful and complete. Slattery and colleagues would certainly acknowledge that the processing of a garden-path sentence (just as the processing of any other kind of sentence with a complex syntactic structure) may occasionally fail. Instead, their key point is that the initial analysis may linger even if reanalysis is successful and complete, because the initially activated, but ultimately incorrect analysis leaves a memory trace.
This issue is relevant for the present manuscript because it affects the hypotheses stated on page 11: As far as I can see, the results from the five experiments are largely consistent with both Christianson et al.’s (2001) and Slattery et al.’s (2013) theoretical accounts. As a side note, with regard to Slattery et al.’s (2013) account, there are also a number of recent studies which are directly relevant, but are currently not mentioned in the manuscript. In particular, I suggest that the authors check Cunnings (2017) and Fujita & Cunnings (2020). There may well be additional relevant work on this issue that I am not aware of. 2. The rationale behind all experiments is based on the key assumption that the answers to post-sentence comprehension questions reflect a participant’s final interpretation of the sentence. However, while this assumption has also been made in a number of previous studies, it is not entirely uncontroversial. Even if reanalysis is successful and complete, the initially-activated incorrect interpretation may leave a memory trace. This memory trace may then influence answers to post-sentence comprehension questions: When a participant gets to a post-sentence comprehension question, they must remember/reconstruct the previously-encountered sentence that the question refers to. This reconstruction process may be affected by the lingering initial interpretation of the sentence. This explanation is obviously directly related to Slattery et al.’s (2013) ambiguity resolution account. I suggest that this issue is taken into account when interpreting effects of ambiguity in post-sentence comprehension questions. In sum, I am not sure whether it is possible to differentiate between the different theoretical accounts of ambiguity resolution on the basis of the current results. 3. It is not entirely clear to me why the manipulation of animacy of the disambiguating noun was included in the experimental design.
Both the motivation of the experimental design on page 8 and the hypotheses on page 11 only refer to a possible main effect of animacy (i.e. a general effect of animacy which occurs irrespective of whether the sentence is temporarily ambiguous or not), which is not particularly relevant for the research questions (quote from p. 8: “...leading to a lower response accuracy in both the garden-path and the control condition.”). In this respect, the short, interference-based justification for the inclusion of the animacy manipulation provided on page 8 is not quite sufficient. My suggestion would be to discuss how animacy may potentially interact with the other independent variables, particularly with the ambiguity manipulation. For instance, it is theoretically possible that verbs such as searched generally occur more often with inanimate objects than with animate ones. If this were the case, this would lead to additional difficulty when the disambiguating noun is animate. 4. I am a bit sceptical about the way in which question type (i.e. whether the comprehension question following the experimental sentence targets (a) the correct analysis of the disambiguating noun, (b) the deactivation of the initial misanalysis, or (c) the correct analysis of the ambiguous noun) is utilized in the statistical analyses. In the model analyses, the authors treat question type as an independent variable. From a purely technical point of view (i.e. if we only look at the abstract properties of the experimental design), question type does indeed constitute an additional independent variable with three levels. At a content level, however, the three types of comprehension questions refer to qualitatively different aspects of the reanalysis process: The three different question types refer to different segments of the sentence, and thus each addresses a somewhat different research question.
As a result, the key issue is not so much whether the three levels of question type differ from each other (which is what the current analyses in the manuscript address), but what the effects of ambiguity look like within each level of question type. My suggestion would be to present an alternative analysis which takes this property of the question type manipulation into account: As a first step, 2x2 analyses which include only ambiguity (gp vs. non-gp) and animacy (animate vs. inanimate) could be presented. This is justified given that all three question types are similar in the sense that they all target the final interpretation. Following these 2x2 analyses, you could split up the data by question type and provide additional, separate analyses of the answers to the comprehension questions for each question type, in which you look at the effect of ambiguity (and animacy) separately for ‘qcor’, ‘qmis’, and ‘qpos’ questions. 5. Experiments 3 and 5 are essentially exact replications of Experiments 2 and 4, the only difference being that they rely on word-by-word self-paced reading rather than whole-sentence presentation. In principle, self-paced reading data is certainly of interest here, because it gives us insight into what happens during processing of the ambiguous sentence. However, the current version of the manuscript does not provide sufficient motivation for re-doing the exact same experiments again with self-paced reading. Also, both the motivation for the experiments and the hypotheses currently focus strongly on the final representation after processing of the sentence is complete, and do not refer to on-line measures. I suggest that short paragraphs are added to the sections for the two experiments, in which the authors explain what insights the following self-paced reading experiments can provide above and beyond what we already know from the previous whole-sentence experiment. 6.
While the number of participants tested in the experimental studies is more or less acceptable for a web-based study of this kind, the experiment is considerably underpowered with regard to the number of items per condition that each participant encountered during testing: Given that each experiment is based on a 2x2x3 Latin-square design and contained 48 experimental items, each participant got to see only 48/12 = 4 items per condition. My advice would be to at least provide a justification for why it was unavoidable to work with such a small number of items per condition. 7. I find the way in which the results are reported not very straightforward to read. As they are, the results sections for the five experiments are far too detailed and contain plenty of rather unimportant or technical information. While such a detailed description is certainly useful for the review process, a lot of content should be removed from the paper and instead be included in the supplementary materials, the repository, or in the R script for the analyses. I provide a few suggestions for how the Results sections can be made shorter and easier to read here, but suggest that the authors also look for additional options to achieve this. First, the model tables at the end occur in-between the references, presumably due to a conversion error. Quite a few effects and coefficients reported in the model tables are also fully repeated in the text. It is enough to report the exact coefficients in the tables only, and to then discuss what the respective significant effects mean in the text. Second, Table 4 refers to a purely technical aspect of the analysis, and might actually confuse readers who are not very experienced with linear mixed-effects models. The table could be included in the supplementary materials or in the R script, but feels a bit misplaced in the manuscript.
Third, with respect to the reading time analyses, while it is certainly justified to conduct the model analyses on transformed scores, inverse-transformed reading times are not very intuitive for the reader. Thus, I suggest that the descriptive Figures 2 and 5 should show either untransformed mean reading times or back-transformed values. Also, given that the first experimental manipulation occurs in Region 4, it does not make much sense to show results for the first three segments (in fact, it would be extremely weird if the conditions already differed at a point where they are still exactly identical). Again, these additional results could be moved to the supplementary materials or the repository. Line-by-line comments P. 13, l. 550ff. The authors state that the random-effects structure of their models included “intercept(s) for participants and items, the by-participants random slope for sentence type and a by-item random slope for sentence type”. However, given that all experimental manipulations included in the design were both within-participants and within-items manipulations, the maximal random-effects structure (which should ideally be used in analyses of this kind, as long as the model reaches convergence; see Barr, Levy, Scheepers & Tily, 2013) would contain a number of additional random slopes which are not included in the analyses reported in the paper, such as random slopes by participants and items for the animacy manipulation, or random slopes for the interaction between ambiguity and animacy. I acknowledge that such a maximal model would probably not reach convergence in this particular case, but then the random-effects structure would have to be gradually simplified through a standardized procedure, rather than just picking a simpler random-effects structure that reaches convergence. P.13, ll. 554 ff. The paragraph should include a justification for why these particular Helmert contrasts were chosen for the question type manipulation.
Especially the second contrast is quite unusual: This is the difference between qmis-questions and hypothetical questions which are somewhere in the middle between qcor and qpos questions. I find it hard to imagine what such a hypothetical question might look like. Also, I think quite a few readers will struggle to understand this paragraph. P.16, ll. 664 ff. The analysis of reading times for the disambiguating region relies on very minimal data trimming, with only reading times above 15000 ms and below 150 ms being considered outliers. If this were a self-paced reading study conducted in a controlled lab environment, this kind of minimal data trimming would perhaps be acceptable, but the fact that this is a web-based study (which makes it likely that the results contain a substantially larger number of extremely high outliers) calls for a more fine-grained check of whether the results are potentially contaminated by the influence of a small number of extremely high values, particularly given that LMERs are quite vulnerable with regard to the effect of such outliers. Also, if it really takes a participant 14999 ms to read a single word, there is definitely something highly unusual going on, so we should conclude that this data point does not reflect natural reading. My advice would be to apply standard cut-off criteria reported in the literature (e.g. exclude all data points which are more than 2 or 2.5 SD away from the overall region mean), and to check whether the key effects remain the same irrespective of which criterion for data trimming is applied. P. 33, Tables S1 and S2, and P. 34, Tables S3 and S4 These tables are mixed with the list of references. This is probably just a minor copy+paste or .pdf conversion error though. Reviewer #2: The paper investigates the nature of representations built when comprehending garden-path (GP) sentences.
This is done in a series of five experiments with varied sentence presentation mode and question types, allowing for quantitative as well as qualitative analysis of response types. The authors come to the conclusion that the resulting representations are mixed: i.e., sometimes they contain an accurate representation along with the initial misanalysis, sometimes they combine multiple locally licensed representations, and in some cases they only encompass the initial misanalysis. In my opinion, this is rigorous work that has many strengths:
- The research goals are worthwhile. The qualitative analysis of GP sentence representations is a valuable contribution that brings GP research closer to understanding ‘real-world’ scenarios of comprehending grammatically complex input.
- The methodology is rigorous. I appreciated the design of the project, cleverly manipulating methodologies across experiments. I particularly liked the use of open-ended questions to probe sentence representations. It is nice to see new language material and an understudied language in GP research. The authors used open science practices: the experiments have been pre-registered, the data are publicly available. Sample sizes in all of the experiments are impressive.
- The paper is well-written, well-structured and coherent.
However, I also have several concerns and suggestions; please see below. Major points 1) For almost all of the results (as well as earlier results by Chromy, 2022), I have been wondering whether they can be due to offline (post-processing) effects. That is, readers may arrive at a coherent representation of the sentence – but then, faced with a comprehension question, they experience a cognitive overload and fail to retrieve correct information from their working memory. In other words, the readers do form a faithful and coherent representation but then, being overloaded with this challenging task followed by the need to also process the question, cannot cope with the cognitive load.
The post-processing interpretation seems particularly likely to me in light of interference effects (across experiments). The authors suggest that encoding is more difficult in the high-interference condition. But perhaps the interference effect emerges at a stage later than encoding – I would consider interference in working memory at the retrieval stage, post sentence processing. I would recommend that the authors consider post-processing effects throughout the paper. 2) I have several concerns about the choice of comprehension questions. I suggest highlighting these as limitations, if my understanding is correct.
- It seems that all comprehension questions targeted the critical (GP) region of the sentence. There were no filler questions targeting other regions of the sentence. That means that when a participant encounters a grammatical GP sentence (without missing or redundant constituents, as in fillers), they can strategize to only focus on the GP region and ignore the rest of the sentence content. From my point of view, this is an important issue that undermines the ecological validity of GP sentence processing in the study.
- Qpos questions (e.g., “Did the storekeeper have a van/a client”) seem rather trivial. I think that these can be answered using common sense and I have difficulty imagining reasons to make an error there. This issue is lightly touched upon in the Discussion but I suggest highlighting it more.
3) Statistical models across experiments differ in what random slopes they include, and the selection is not intuitive to me. For example, in Experiment 1, the models include a by-participants random slope for sentence type and a by-item random slope for animacy. What motivated the choice of specifically this combination? I suggest using a uniform approach to including random slopes across experiments and explaining it in the paper. 4) Error types are only reported in descriptive terms, without any inferential statistics.
Would any inferential statistics on error types across experimental conditions be possible? If generalized linear mixed-effects models or logistic regressions are not feasible due to low numbers of errors, perhaps the authors may consider something like chi-square tests on the ratios of the most critical response types across conditions. Minor points 1) I am not quite clear on how Experiment 1 fits into the global logic of the study. I was particularly confused by the following reasoning: “We can also see that while garden-path sentences (especially in the high-interference condition) are rated as less acceptable, the unambiguous versions score significantly better, suggesting that potential problems with processing should be indeed attributed to the manipulated variables and not to the general unacceptability of given structures.” The paper might benefit from a better explanation of what non-trivial results were expected from Experiment 1. 2) All reasoning in the paper seems to rest on the assumption that the readers always follow the garden path and then either recover or do not. In other words, the paper interprets the comprehenders’ behavior as different patterns of recovery from being garden-pathed (with varied success). Can’t there be cases when the readers do not follow the garden path and parse the sentence correctly in the first place? I suggest considering this possibility throughout the discussion of all findings. Typos - I noticed some “Exp” instead of “Experiment” (e.g., p. 25). - Question abbreviations (qpos, qwhose, etc.) might look better in text if highlighted with italics or capital letters. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
PONE-D-23-03278R1
Garden-path sentences and the diversity of their (mis)representations
PLOS ONE

Dear Dr. Ceháková, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jul 23 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Claudia Felser, Ph.D Academic Editor PLOS ONE Additional Editor Comments: Both reviewers acknowledge that you have addressed their previous comments very well, but in their second reviews both have raised a number of additional points which I am asking you to address in a final round of major revisions. The most serious point (raised by Reviewer #1) concerns the way your data was trimmed and what consequences your choices might have for your statistical results. Reviewer #2 also makes several helpful suggestions for further improving your manuscript. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. 
Reviewer #1: (No Response) Reviewer #2: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. 
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The manuscript has considerably improved with regard to a number of issues raised in the previous review round. In particular, the authors have included a justification for replicating the two experiments based on whole-sentence presentation (Experiments 2 and 4) with self-paced reading (Experiments 3 and 5). The revised version also includes a clearer discussion of how the findings relate to theoretical accounts of lingering interpretations. Finally, the authors have also made a number of useful revisions to the background section which considerably improve readability and clarity of the manuscript. However, a crucial issue which definitely has to be addressed before publication of the manuscript can be considered is how data trimming/outlier exclusion was conducted in the statistical analyses, particularly for the critical regions in the two self-paced reading experiments. The analyses currently reported in the manuscript are based on an extremely liberal criterion for outlier exclusion, with extremely long reading times which clearly do not reflect natural reading still included in the analysis. A brief reanalysis for the reading time data from Experiment 3 that I have conducted with the data set provided on OSF suggests that the results may indeed be influenced by a small number of extreme values which have survived data trimming. I elaborate on this issue in considerable detail below, and have included some suggestions for how this issue can be fixed. Another, relatively minor issue which should still be addressed is that, while readability of the manuscript overall has considerably improved, the Results sections for the five experiments are still difficult to follow and contain a multitude of tables and figures. 
In sum, I have read the revised version with great interest, and I believe that the five experiments reported in the manuscript potentially constitute a valuable contribution to the field. That said, the issue of outlier exclusion strikes me as particularly important: It obviously makes a crucial difference whether the garden-path effects reported in the manuscript really exist, or whether they are caused by a somewhat larger proportion of extreme values in the garden-path than in the control conditions. The issue therefore has to be addressed in a responsible manner. Outlier Exclusion - In the first version of the manuscript, the authors had relied on an extremely liberal method for outlier exclusion, and had excluded only very extreme reading times above 15000ms. Even in the revised version, the cut-offs applied (6500ms for Experiment 3, 5500ms for Experiment 5) are far too liberal for web-based self-paced reading experiments of this kind. Given that the critical disambiguating regions in these experiments consist of a single 6-8 letter word, even a reading time of more than 2 seconds is a clear indication that the measure does not reflect natural reading. For the present study, however, even a reading time of 5000 ms is not excluded from the analysis. The authors justify their approach to outlier exclusion by arguing that they have followed the recommendations from Baayen & Milin’s (2010) tutorial for reaction time analysis. However, applying the approach suggested in this tutorial to the present data set mechanically (i.e. 
without considering what the measures actually represent in this specific study) strikes me as not quite suitable, for at least two reasons: First, Baayen & Milin’s recommendation to only conduct minimal data trimming and to remove only clearly discontinuous reading times refers to the analysis of reaction times in general, and does not take into account what these reaction time measures represent in a concrete study (such as properties of the stimulus for which the reaction times were measured). Thus, their recommended approach is based exclusively on the distribution of data points, and does not discuss any content-based common-sense criteria for what constitutes an outlier. For the reading times measured in the present study, we know that a healthy, fully literate, adult native speaker of a language should definitely be able to read a single 7-letter word in considerably less than 2 seconds. As a result, an extreme reading time of, for instance, 5000ms should definitely be considered an outlier, irrespective of the distribution of data points. Second, Baayen & Milin’s tutorial refers to experimental reaction time studies conducted in a controlled lab environment. Data sets from such studies typically contain only a relatively small number of outliers because the controlled surroundings ensure that a participant only rarely gets distracted from the task. In the current web-based study, in contrast, the experimenter had no control of the participant’s surroundings. As a result, outliers are a potentially much more severe issue in web-based studies. In sum, I thus strongly recommend that the authors revise their analyses and rely on an established approach to outlier exclusion used in previous self-paced reading studies (e.g. first exclude extreme values which clearly do not reflect natural reading, then also exclude reading times which are more than 2 standard deviations above or below the overall mean reading time for the respective segment). 
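As context for this recommendation, the two-step trimming procedure the reviewer describes (first an absolute window of plausible natural-reading times, then a per-segment 2-standard-deviation criterion) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' actual analysis pipeline; the thresholds and reading-time values are invented for demonstration:

```python
import statistics

def trim_reading_times(rts, lower=150, upper=2000, n_sd=2):
    """Two-step outlier exclusion for per-segment reading times (ms).

    Step 1: drop values outside an absolute window of plausible
    natural-reading times (thresholds here are illustrative).
    Step 2: drop values more than n_sd standard deviations above or
    below the mean of the remaining data for the segment.
    """
    step1 = [rt for rt in rts if lower <= rt <= upper]
    if len(step1) < 2:  # stdev needs at least two data points
        return step1
    mean = statistics.mean(step1)
    sd = statistics.stdev(step1)
    return [rt for rt in step1 if abs(rt - mean) <= n_sd * sd]

# An extreme 5000 ms reading time survives a liberal 6500 ms absolute
# cutoff but is removed by the 2-SD criterion (invented example data).
raw = [480, 510, 530, 555, 600, 620, 640, 700, 5000]
liberal = [rt for rt in raw if 150 <= rt <= 6500]
strict = trim_reading_times(raw, lower=150, upper=6500)
print(len(liberal), len(strict))  # the strict list drops the 5000 ms value
```

In a real analysis the 2-SD step would typically be applied separately per segment (and sometimes per condition or participant), but the logic is the same.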
Note that the influence of outliers is a particularly crucial issue here, given that linear mixed-effects models are known to be fairly vulnerable to the influence of a small number of extreme data points. I thus conducted a brief analysis for the critical disambiguating segment in Experiment 3 myself, with the data provided by the authors on OSF. The results from this more conservative analysis suggest that the crucial effect of ambiguity reported for the disambiguating segment in Experiment 3 is at least partly driven by a small number of outliers in the data, and becomes considerably smaller when these outliers are excluded: If I conduct the analysis in the same way as in the manuscript (i.e. exclusion of data points above 6500 ms and below 150 ms), mean reading times by condition for the disambiguating segment (i.e. Segment 7) obviously show the very same data pattern as in Figure 2:
Garden-path, animate: 694 ms (SD: 642)
Non-garden-path, animate: 602 ms (SD: 462)
Garden-path, inanimate: 625 ms (SD: 521)
Non-garden-path, inanimate: 548 ms (SD: 376)
These results show clear numerical trends for a garden-path effect, with a 92 ms difference between garden-path and non-garden-path trials for animate items and 77 ms for inanimate items. (Note, however, that the huge standard deviations already suggest the presence of a small number of extreme data points.)
However, if data points which are more than 2 standard deviations above or below the overall mean reading time for the segment are excluded (an established procedure for outlier exclusion used in a number of previous reading studies), the respective garden-path effects are considerably smaller, with only a 27 ms difference for animate and 25 ms for inanimate items:
Garden-path, animate: 558 ms (SD: 303)
Non-garden-path, animate: 531 ms (SD: 272)
Garden-path, inanimate: 534 ms (SD: 275)
Non-garden-path, inanimate: 509 ms (SD: 234)
In sum, all this calls for a thorough exploration of the potential influence of outliers in the data, for all experiments reported in the manuscript. It is crucial to determine whether the key effects reported in the manuscript really reflect longer average reading times for garden-path than for non-garden-path sentences, or whether the effect is instead driven by a slightly higher number of outliers in the garden-path conditions.
Additional minor issues:
- In their response letter, the authors say that, based on reviewer suggestions in the previous review round, they have revised Figures 2 and 5 so that they show only segments after the experimental manipulation, and also display untransformed reading times instead of inverse-transformed values. However, these updated graphs are not included in the revised version of the manuscript. I suspect this is just due to some sort of conversion error.
- The analyses are still considerably too detailed; the revised manuscript contains far too many tables and graphs. Some of the tables should either be merged, included in the supplementary materials, moved to OSF, or deleted entirely (with the key results instead reported in the text).
- As suggested in the previous review round, the authors have added an additional motivation for re-doing Experiments 2 and 4 with self-paced reading (i.e. Experiments 3 and 5) on page 19 (ll. 752-765).
While it strikes me as reasonable to motivate these studies with reference to the possibility to look at garden-path effects separately for each segment, arguing that self-paced reading is interesting because it rules out the possibility to go back to a previous segment of the sentence feels a bit weird to me. This essentially characterizes self-paced reading as an artificial research method which does not allow participants to read the sentences naturally.
- In the paragraph discussing issues of statistical power (particularly the number of items per condition each participant encounters during the test session) in the Discussion section on page 39, it strikes me as unnecessary to characterize this as a crucial limitation. My suggestion would be to instead include a brief explanation of why it was not possible, or was problematic, to construct a larger set of materials, for instance due to properties of the language.
Reviewer #2: The authors have carefully revised the manuscript and addressed my concerns very well. I think that the manuscript has much improved, particularly in the interpretation of the evidence. I only have remaining minor comments and suggestions.
Minor points
1) The manuscript has become quite lengthy. Of course, it is up to the editors to decide if that is fine, but my advice is to shorten it to improve readability. These are some examples of places that may be shortened:
- Detailed descriptions of previous studies in the Introduction;
- Motivation of design of Experiment 3 (lines 752-765);
- Perhaps some more of the overlapping information in Methods of Experiments 3-5;
- Discussion of methodological aspects in the Discussion section (lines 1421-1497) – these are important but can be expressed with fewer words.
2) I would like to clarify my minor point #2 from the previous review round.
I meant not only individual differences between readers in whether they are susceptible to garden-path effects but also across-trial differences in whether the reader follows the garden path. In other words, I meant that possibly in some trials, the participants did not follow the garden path at all and parsed the sentence correctly in the first place. Perhaps the authors could consider whether any of the findings could be accounted for by this processing pattern, rather than by any patterns of recovery from being garden-pathed. However, I do not insist on incorporating this into the manuscript.
3) Phrasing of some of the hypotheses has become somewhat vague:
- Hypothesis of Experiment 1 (lines 573-574): “there will not be a big difference any significant differences in response accuracy”;
- Hypothesis of Experiment 4 (line 945): “there should be no (or little) difference in response accuracy”.
I feel that it is quite vague to discuss “big differences” or “little differences” (unless we are discussing effect sizes, which is not the case here), so wording in terms of statistical significance would be more rigorous.
4) I apologize for missing this issue at my first reading but I do not agree with the authors’ description of heuristics with reference to Ferreira & Patson (2007). The authors claim that the heuristic is “the tendency to not inhibit the initial misanalysis of garden-path sentences”. But Ferreira & Patson (2007) use the term ‘heuristics’ to describe the parsing mechanisms per se (for example, the late closure heuristic, the minimal attachment heuristic, reliance on plausibility, etc.) rather than any mechanisms used to deal with conflicting parses. In other words, the initial misanalysis of garden-path sentences is indeed created based on heuristics but the tendency to not inhibit the initial misanalysis is a separate phenomenon.
5) I suggest some editorial changes:
- Lines 1084-1085: “These results are once again in line with previously mentioned accounts of garden-path sentence processing” – Please clarify which accounts are meant.
- Lines 1322-1323: “Slattery and colleagues [21] do not claim that reanalysis of GP sentences does not fail occasionally” – Double negation is difficult to process, please rephrase.
- Lines 1337-1339: “It has already been shown that comprehenders differ tremendously in what (if any) re-reading strategies they use during reanalysis of GP sentences [42,43]” – This sentence looks stranded in this particular spot. I would consider moving this idea to the new paragraph on individual differences (line 1410 and further).
6) Please proofread the paper for punctuation and possibly typos, for example:
- Line 270: “open ended” – A hyphen is missing.
- Lines 452-454: “as well as the experimental items being created specifically for the purposes of our experiments and did not come from actual usage” – Perhaps “were created” is meant.
- Line 1086: “However, we would like to note, that we also detected” – There is an unnecessary comma.
- Lines 1214-1215: “for example while answering the comprehension questions” – A comma is missing.
- Lines 1274-1275: “(they successfully attached the disambiguating noun to the structure and they also reanalyzed the ambiguous noun.)” – The full stop should follow the bracket.
- Lines 1346-1347: Perhaps there should be no paragraph break here.
- Line 1371: “similarity based interference” – A hyphen is missing.
********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 2 |
Garden-path sentences and the diversity of their (mis)representations PONE-D-23-03278R2 Dear Dr. Ceháková, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Claudia Felser, Ph.D Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: |
| Formally Accepted |
PONE-D-23-03278R2 Garden-path sentences and the diversity of their (mis)representations Dear Dr. Ceháková: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Claudia Felser Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.