Peer Review History

Original Submission: June 15, 2020
Decision Letter - Jacopo Soldani, Editor

PONE-D-20-18285

Detection of FLOSS version release events from Stack Overflow message data

PLOS ONE

Dear Dr. Sokolovsky,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Oct 17 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Jacopo Soldani

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please clarify whether all terms and conditions of the websites, software and datasets used were complied with in the data collection and data sharing processes.

3. We note that figures in your submission contain copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (a) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (b) remove the figures from your submission:

a) You may seek permission from the original copyright holder of Figure(s) [#] to publish the content specifically under the CC BY 4.0 license.

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. 

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

b) If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: No

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This article proposes logistic regression based models for detecting micro-events from textual data, using messages from the SO Q&A platform to detect FLOSS version release events. The LR models utilise a feature space composed of a selected number of LDA topics and sentiment analysis features. The article is fairly well-written, particularly the Introduction and Background material, which provide an ample literature review of related work. The article also provides sufficient descriptions of the models’ statistics, evaluation and analysis, which seem to have been performed to a good technical standard. Overall, it is obvious to me that the reported study involves significant and technically sound work, and deserves publication. However, the manuscript also needs a fair amount of revision (which in my opinion is more than a minor revision) to address the following observations:

• The article requires improvement in terms of structure, organisation and flow of information. As it is, the manuscript falls short of providing a clear and easy-to-follow structure of the study and the proposed models/processes, and the reader has to read the article multiple times to obtain a full picture of the whole reported work and how each process leads to the next. The descriptions seem to go back and forth in places and, hence, the authors need to make sure that the article provides, with the help of better illustrations, a simple description of the various phases of the study, their sequence and associated processes, and how each leads to the next.

• The article, in my opinion, needs a clearer focus in its aim: initially, and as the title indicates, the focus seems to be on developing models for a successful micro-events detector/estimator. However, later on, once the performance of the estimators was reported to be insignificant, the focus seems to shift predominantly to ‘better understand the data’ and to generating synthetic data to find the detectability threshold. Again, I would like to emphasise the quality and standard of the presented work here; however, the authors should also declare a clear aim/focus for this article one way or another right from the beginning, reflect that properly in the title, and stick to this aim.

• The developed synthetic data generator is one of the main contributions of this work. However, I am not sure how such a finely tuned and controlled synthetic dataset is going to help the development of a detector that generalises to a non-synthetic domain without fine-tuning on any real world data.

• Relating to the above, I would have liked to see the authors focus more on improving the performance of the proposed detector pipeline via, for example, attempting and comparing other topic modelling techniques such as LSA, pLSA and Deep LDA, etc., and adjusting the NLP tools of the sentiment analysis to the targeted domain (issues the authors have already discussed as limitations of the study). In this respect, the authors should also try to deviate a bit from their goal of limiting the number of variables in the feature space (which seems to me to be based solely on the claim of the ‘curse of dimensionality’, which is known to be dataset and algorithm dependent and has well-established remedies).

• The manuscript contains a few typos, and I suggest the authors proofread it again and correct them (a quick example is on p.8 in the Model Performance Section: ‘… we compute a mean of PR-AUC - event as a positive label and no-event is a positive label.’).

• I have noted that this exact and full manuscript currently appears on: 1) the arXiv archive by Cornell University (at https://arxiv.org/abs/2003.14257), and 2) researchgate.net (at https://www.researchgate.net/publication/340331989_Detection_of_FLOSS_version_release_events_from_Stack_Overflow_message_data), and would like to advise the authors to check PLOS ONE's publication policy and terms and conditions.

Reviewer #2: The paper provides a methodology to detect FLOSS version releases through an event detection process. Overall, the process of data collection, pre-processing steps, feature construction (i.e. sentiments, topics) and the ML algorithms used to evaluate the event detection process are presented clearly. Additionally, the event detection time window formation is understandable. However, some points need to be improved:

1. The introduction reads rather like a report and contains disparate pieces of information. I suggest the authors improve the flow of the text and connect the different parts with logical cohesion.

2. Also, although it is not mandatory, most research approaches pose more than one RQ. You could revise the structure and add more research questions, e.g. evaluating which ML algorithm is best for performing event detection in SO posts.

3. As with the introduction section, the “Background material and related work” section also needs better connections between paragraphs.

4. Additionally, for the related work, it could be helpful for readers to add a comparison between different algorithms used to identify events from different online sources such as Twitter (e.g. https://dl.acm.org/doi/10.1145/1978942.1978975). Identify which is the target source, which is the data analysis method, which event detection approach the authors used, etc., and compare with your own research.

5. Additionally, for the bibliography part, there are many works which use LDA on Stack Overflow posts (e.g., https://link.springer.com/article/10.1007/s10664-012-9231-y or https://link.springer.com/article/10.1007/s11390-016-1672-0). I would expect to see some discussion of these in the bibliography part.

6. Moreover, there is work which identifies technology releases from Stack Overflow posts, e.g. http://das.encs.concordia.ca/uploads/abdellatif_msr2020.pdf (fig 3). So, I would expect to see a reference to this as well.

7. In the “Pre-processing” subsection, for the sentiment analysis calculation, please explain why you chose the VADER method rather than a sentiment analysis approach constructed for developers’ posts, such as https://link.springer.com/article/10.1007/s10664-017-9546-9

8. Additionally, in the same subsection you could add a graph which depicts the results of the coherence measure used to select the appropriate number of topics (e.g. https://link.springer.com/article/10.1007/s10664-020-09819-6, see Fig. 12).

9. In the “Analysis pipelines” subsection, the analysis for the construction of Logistic Regression and Random Forest is good. However, why did you select these ML methods? How about an SVM approach?

10. I suggest adding a section after the “Limitation” section with implications for practitioners, for example, use case scenarios showing how someone can use your research approach to find a new FLOSS version.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

We are responding to the reviewers' and editor's comments in a separate document, which was uploaded earlier.

Attachments
Attachment
Submitted filename: response_to_reviewers.pdf
Decision Letter - Jacopo Soldani, Editor

Is it feasible to detect FLOSS version release events from textual messages? A case study on Stack Overflow

PONE-D-20-18285R1

Dear Dr. Sokolovsky,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jacopo Soldani

Academic Editor

PLOS ONE

- - - - - - - - - - - - - - - - - 

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for addressing my comments and, yes, your paper has now been improved both technically and in terms of readability.

Reviewer #2: I checked that the authors addressed all my comments, even the one about the topics evolution. So I believe that their work is acceptable now.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Professor Abdulhussain E. Mahdi (known as Hussain Mahdi)

Reviewer #2: No

Formally Accepted
Acceptance Letter - Jacopo Soldani, Editor

PONE-D-20-18285R1

Is it feasible to detect FLOSS version release events from textual messages? A case study on Stack Overflow 

Dear Dr. Sokolovsky:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jacopo Soldani

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.