Peer Review History
| Original Submission November 27, 2020 |
|---|
|
PONE-D-20-37326 A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data PLOS ONE Dear Dr. De Toni, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Both reviewers raised a number of important issues that must be addressed point by point. I want to add that the details of the training and validation of the machine learning methods are lacking. For instance, no details whatsoever are given as to how cross-validation was performed. Also, in choosing a linear regression, why do you assume that "the correlation between the influenza incidence and the page views of Wikipedia was linear"? Do you have supporting evidence for that assumption? At least provide some intuition for that choice. One may think that the correlation is not linear, for instance that there would be more of a threshold behavior, whereby page views would correlate only after a certain flu incidence is reached, no? Please submit your revised manuscript by Mar 08 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols We look forward to receiving your revised manuscript. Kind regards, Luis M. Rocha, Ph.D. Academic Editor PLOS ONE Journal Requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The present manuscript presents a methodology to nowcast influenza based on Wikipedia page visits. The method is interesting, the subject timely, and has the potential to be broadly applied to different languages. Using human curation to pre-select features is also a potentially relevant approach. However, the manuscript as presented has some concerning issues, discussed below. Moreover, the authors' bold claim in the discussion that they "have shown the feasibility of using automatic methods to extract relevant predictors ..." has no support. 
None of the feature-selection methods are new and the model performance is poor. Below is a point by point discussion of the paper. Major points 1. Nowcasting: The model is intended to perform nowcasting, which typically implies training and testing models on past data to monitor ongoing outbreaks. However, the authors are testing the model against whole seasons where the data is standardized (at least the Wikipedia dataset is - line 82). How could this be achieved without having the full data of the season? How well would the model(s) perform in a scenario of "true" nowcasting (i.e., predict the next week(s) based on the last few weeks)? 2. Quality of the models: The prediction of the peak week is more interesting from the point of view of nowcasting (although concerns about standardization remain). However, judging it in relation to the ground truth is only one of the relevant metrics. Nowadays medical systems make predictions about peaks based on epidemiological data and models. How much better is your model in relation to these more traditional approaches? Unfortunately the models behave very poorly against the ground truth (table 4). This raises the question of how useful they are. Also, the Categories model misses a peak altogether in Germany (2nd panel of fig.1) and this should be discussed. 3. Differences between tools: The interpretation of the differences between the two datasets PV and PC+PV is not straightforward. The authors discuss it in terms of having more or less data, concluding that, for cycle rank and categories, more data is better while for pagerank it is worse. There is not enough evidence to say this because PC+PV not only has more data but it has different data (e.g., PC includes bot visits). Also, there is no obviously better method or dataset, with the methods working differently in different countries. This raises concerns regarding generalizability. 4.
Comparison to previous works: there have been many other efforts of nowcasting using Google searches, Wikipedia visits, on-call phone systems, self-reported symptom apps, etc. The authors should discuss their findings and model in comparison to previously published work and how their approach improves current thinking. Minor points 1. Page 2, starting at line 66 - It would be good to include some of Wikipedia's methodology to arrive at these numbers since provenance of visits is important for the results of the paper. 2. Why was performance measured only with mean Pearson correlation? I would like to see both R^2 and Mean squared error as well. 3. In section 6, feature analysis. It is unclear why the authors focus only on pages positively correlated with cases. Negative correlations can be just as useful (albeit more difficult to interpret). The paper has multiple typos. Here is a non-exhaustive list: Legend of table 1 should be "... German and Dutch translations..."; line 121 "...in order to avoid making unnecessary..."; line 129 till the end of that paragraph: replace Wikipedia's pages with Wikipedia pages; line 160/161 "...circular walks that start and end..."; line 348 "...require conducting..." Finally, the quality of the figures is very poor but we assume it will be improved before publication. Reviewer #2: This paper describes a work aimed at applying a machine learning model on Wikipedia’s page views of a selected group of articles to obtain accurate estimates of influenza-like illnesses incidence in four European countries: Italy, Germany, Belgium, and the Netherlands. This effort falls in a decade-long research line aimed at using non-traditional data sources generated by digital platforms to track and now-cast the Influenza-like Illness circulation among the general population. The goal is to extend the already existing studies in the USA context to a broader Europe-focussed one. It is a very interesting attempt that, on the other hand, still has some issues.
Some comments are reported in the following: - in the introduction, only a little of the decade-long efforts and scientific studies to nowcast and forecast influenza through digital data (not necessarily with machine learning) are mentioned. - how do they avoid overfitting and using features that are highly correlated? Feature selection is performed by the machine learning approach but still for each model tens of features are used. Did the authors check for collinearity? - can this approach be used for other countries/languages? Wikipedia provides the country from which the pageviews are generated, so even for pages in English or French, it is possible to disentangle the provenance of the clicks. - what is the use in public health, given the not-so-accurate estimate of the peak? Would a different regression model help on this? - would the integration of these data with other digital data or traditional surveillance data help in making a more accurate estimate of the peak? See paper by Brownstein https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004513 and Perrotta: https://dl.acm.org/doi/10.1145/3038912.3052670 ********** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] 
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
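Reviewer #1's minor point 2 above asks for R^2 and mean squared error alongside Pearson correlation. The following is a minimal Python sketch of why the three metrics can disagree. It is not the authors' evaluation code: the helper names (`pearson_r`, `mean_squared_error`, `r_squared`) and the incidence values are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up weekly incidence values (assumption, not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
# Predictions that track the shape perfectly but are biased: 2 * observed + 1.
predicted = [3.0, 5.0, 7.0, 9.0]

print("Pearson r:", pearson_r(observed, predicted))
print("MSE:", mean_squared_error(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

In this toy case the correlation is perfect while the MSE is large and R^2 is negative, which is one reason a reviewer might want all three reported: correlation captures whether the predicted curve follows the shape of the epidemic, not whether the predicted magnitudes are close.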
| Revision 1 |
|
PONE-D-20-37326R1 A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data PLOS ONE Dear Dr. De Toni, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Thank you for thoroughly addressing the reviewer comments. The paper is much improved by addressing all the concerns. In the next version, please address the few major reviewer points still left. I will then confirm those were addressed and will not need to send the paper out for additional review before publication. Specifically: A 1.2 A mention of whether using Wikipedia data is better than traditional compartment models. A 1.3 In the discussion of PV versus PV+PC, a discussion of whether using more data improves the models. A 1.4 Specify what is meant by "Google searches not being available to researchers". Please submit your revised manuscript by Jun 26 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Luis M. Rocha, Ph.D. Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. 
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. 
Reviewer #1: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors did a thorough job in addressing the concerns and the manuscript is significantly improved as a result. In particular, the addition of the GLM model helps justify the bolder claims. Major points: A 1.1 Lines 106 to 121 now address this concern. The mean and standard deviation used for the data transformation now come from the training dataset and not the test set. A 1.2 I am afraid we were not clear in our previous comments. The medical authorities in each country typically use compartmental models based on SIR that rely on information from previous weeks (and/or seasons). The question is whether using Wikipedia data to predict influenza is better than these traditional models. We understand this may be beyond the scope of the paper but would like to see it mentioned in the discussion. The GLM model visibly improves predictions and was a good addition to the paper. A 1.3 The discussion of PV versus PV+PC is now more complete but did not fully address our concerns. The fact that PV and PC data are different limits the discussion of “more vs. better”. For example, what would the results look like if only PC data were used? This is no longer a key claim of the paper, but we would still argue that the authors have not consistently shown that using more data improves the models. A 1.4 The discussion of previous work is now more complete. As a small note, we are not sure what the authors mean by Google searches not being available to researchers. 
It is true that the underlying method/algorithm is not known, but normalized data is: https://trends.google.com/trends/ Minor points: A 1.5 Solved A 1.6 Solved A 1.7 This could be circumvented and used to improve the model but it’s a minor point, just for the authors’ consideration A 1.8 We noticed some more typos (nothing like fresh eyes) and list them below hoping they are useful: line 45, should be "Center for Disease Control and Prevention (CDC)", singular. line 85, the word “only” is repeated "The Wikimedia Foundation, which runs Wikipedia's servers, only provides information about countries only at an aggregated level" line 413 Italian and German should be capitalized line 414 same for Belgian and Dutch A 1.9 Solved ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No |
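Point A 1.1 in this round notes that the mean and standard deviation used for the data transformation now come from the training dataset and not the test set. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`fit_standardizer`, `standardize`), the use of the population standard deviation, and the page-view numbers are all hypothetical.

```python
import statistics


def fit_standardizer(train_values):
    """Estimate mean and standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    # Population standard deviation is an assumption made for this sketch.
    std = statistics.pstdev(train_values)
    if std == 0:
        raise ValueError("training data has zero variance")
    return mean, std


def standardize(values, mean, std):
    """Apply a previously fitted transformation to any set of values."""
    return [(v - mean) / std for v in values]


# Made-up weekly page views (assumption, not data from the paper).
train_views = [10.0, 20.0, 30.0]  # past seasons, available at training time
test_views = [40.0, 50.0]         # current season, seen only at prediction time

# Fit on the training seasons, then reuse the same parameters on the test season.
mean, std = fit_standardizer(train_views)
train_z = standardize(train_views, mean, std)
test_z = standardize(test_views, mean, std)

print("train mean, std:", mean, std)
print("standardized test values:", test_z)
```

Because the parameters are fitted once on past seasons, each new week of the current season can be transformed as it arrives, which is what a nowcasting setting requires. Standardizing with statistics computed over the whole test season would use information that is not available until the season ends.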
| Revision 2 |
|
PONE-D-20-37326R2 A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data PLOS ONE Dear Dr. De Toni, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Thank you for addressing the final reviewer comments. However, your response to R.1.3 (Google trends data) did not make it in any form to the text of the article, and it should. Can you summarize your points and add them to discussion in paper? On that note, I understand your second and third point (normalized data and studying Wikipedia data separately), but your first point (that Google may stop making google trends data available) is too much of a stretch... Google trends has been available for a long time. While, of course, Google can change that in future, its availability has been as stable as Wikipedia for several years and indeed scientists include GT data in many working models. So the paper is best served by acknowledging that GT data could be included, but you chose not to due to it being normalized and wanting to study Wikipedia independently. Another issue is that the paper would really profit from a final careful proofreading, ideally by a professional editor. Using the latter is your choice, but please do a careful editing for improving English readability. For instance, a passage such as: "The models trained with the PC dataset result to be worse than the models trained with either PV or PV+PC. Since the PC dataset showed to be inferior to the other two datasets, we did not perform additional analysis on the models trained with it." 
Could be simplified to: "Since models trained with the PC dataset perform worse than models trained with either PV or PV+PC, we did not perform additional analysis of the former." So I encourage a careful reading/editing to increase readability. Please submit your revised manuscript by Jul 17 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
We look forward to receiving your revised manuscript. Kind regards, Luis M. Rocha, Ph.D. Academic Editor PLOS ONE |
| Revision 3 |
|
A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data PONE-D-20-37326R3 Dear Dr. De Toni, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Thank you for addressing the final issues (especially the GT discussion). I apologize for some delay accepting this latest version, but I was on a family vacation for the last 2 weeks. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Luis M. Rocha, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: |
| Formally Accepted |
|
PONE-D-20-37326R3 A general method for estimating the prevalence of Influenza-Like-Symptoms with Wikipedia data Dear Dr. De Toni: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Luis M. Rocha Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.