Peer Review History

Original Submission: November 22, 2019
Decision Letter - Jianjiong Gao, Editor

PONE-D-19-32466

Advancing clinical cohort selection with genomics analysis on a distributed platform

PLOS ONE

Dear Ms Smith,

Thank you for submitting your manuscript to PLOS ONE. Both reviewers found that your work represents an important technical advance, but they also raised some concerns. In particular, they suggested that example workflows and demos should be included and/or made public. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by January 25, 2020, 11:59 PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as a separate file and labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as a separate file and labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as a separate file and labeled 'Manuscript'.

Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Jianjiong Gao

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

1. Thank you for including your competing interests statement; "J Smith, M Lathara, H Wright, B Hill, N Ganapati, and G Srinivasa are sponsored by Omics Data Automation, Inc. Competing interests have been fully disclosed and arranged through contract and licensing agreements with UCLA David Geffen School of Medicine. The authors declare that they have no other competing interests."

We note that you received funding from a commercial source: Omics Data Automation, Inc.

Please provide an amended Competing Interests Statement that explicitly states this commercial funder, along with any other relevant declarations relating to employment, consultancy, patents, products in development, marketed products, etc.

Within this Competing Interests Statement, please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: “This does not alter our adherence to PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your amended Competing Interests Statement within your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper provides a nice technical write-up of extending GenomicsDB to handle distributed file systems, with benchmarking on a real set of genomic data associated with clinical data. Notably, this includes cloud-native work to use S3, which provides resiliency and the potential for the underlying data to be reused by other applications. Profiling was performed for both the ingest of the data, which is often overlooked, and the query times. This gave a good sense of how the system would perform in practice with microarray and whole exome data. Finally, an example of a potential application tied to clinical data querying in i2b2 is described.

Fairly minor adjustments that may benefit the paper:

1) The authors note that the microarray data has "log-scale potential" while the exome data is linear. Further insight into why this might be would be useful for the audience, especially as whole genome data becomes increasingly available; it would be interesting to understand whether the expectation of linear time would still hold.

2) It would be nice to include the specific queries run as well as the descriptions. The main use case here is for regions of genomes - were other forms of queries considered?

3) While the extension itself is in Github as part of GenomicsDB, the GDBSpark API, i2b2 and Zeppelin demo are all available "upon request". I would highly encourage the authors to make this available on Github in some form as well, even if the licensing only covers academic use.

Reviewer #2: This paper by Smith et al. represents an important advance in interactive analytic methods that allow phenotype and genotype data to be queried together, providing the basis for nearly real-time query capability. The unification of i2b2 and GenomicsDB means that a powerful phenotyping database and a powerful genotyping database can operate in concert, enabling researchers and clinicians to actively explore the observational data flowing from clinical care.

The architecture of the application takes advantage of the strengths of both the i2b2 and GenomicsDB platforms by connecting the two using an i2b2 plug-in. This allows the solution to be maintainable, as backward compatibility of the i2b2 platform is guaranteed for the web services that interconnect the plug-ins.

Overall, the paper does an excellent job of conveying the technical features of how GenomicsDB provides a scalable and interoperable platform for genome analysis with i2b2; however, the paper would be more readable if it included some example workflows for specific phenotype/genotype query lines. That is, example queries and analyses could be presented to show more clearly how the tools could be used in the discovery of new associations between genomic variants and phenotypes, and in the analysis of GWAS and PheWAS experiments. Especially useful would be some screenshots along the way showing how the analysis evolves, to give the reader a sense of how the queries and output would look to a researcher using the tools.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

Revision 1

All responses to the reviewers can be found in the Response to Reviewers letter and are provided below:

Thank you for facilitating the prompt review of our manuscript entitled “Advancing Clinical Cohort Selection with Genomics Analysis on a Distributed Platform” by Smith et al., which was submitted for consideration in the digital health call for papers. We were very glad that you and the reviewers felt our work represented an important technical advance. Using reviewer feedback as a guide, we have made further improvements to the manuscript, itemized as follows:

Reviewer 1. Overall comments were very positive. Reviewer 1 did include a short list of suggested “minor adjustments that may benefit the paper”. We respond to them in order as follows:

1) Suggested clarification regarding the log-scale potential of microarray data and how these results could expand into whole genome sequencing datasets.

This is an important point, and we have extended the discussion section (page 24) to reflect this. In summary, the results suggest a strong correlation between import and query times and the amount of data involved. Smaller datasets suffer more from start-up costs without much gain, whereas larger datasets are able to take more advantage of the distributed processing environment. Given that the microarray data starts to show linear behavior for the largest query region, and that the much heavier whole exome data continues to show linear behavior for this query, this suggests that the trend will continue for heavier whole genome data.

2) Suggested including the specific queries run with the i2b2 plugin module.

We have extended the introduction (pages 5 and 6) with i2b2 workflow examples that serve as motivating examples. In addition, the experiments section now describes further clinical filters that were explored (page 21) and explains the types of genomic regions considered (page 19).

3) Making the code for the i2b2 plugin and the GDBSparkAPI available for academic use was highly encouraged.

We very much agree with this sentiment. Facilitating the implementation and use of these tools in a generalizable way in academic settings has been a goal of this project from its inception. With this in mind, we have made a version of the i2b2 plugin available under the MIT license, and the GDBSparkAPI and gdb-mapping database available for academic use. These components can be found at https://github.com/OmicsDataAutomation/i2b2-oda-framework. All GenomicsDB components are available for free academic use under the MIT license at https://github.com/GenomicsDB/GenomicsDB. Components related to i2b2 patient querying are available at https://www.I2B2.org/software/index.html.

Reviewer 2. Overall comments were again very positive. This reviewer also requested example queries demonstrating the potential utility of these tools to researchers. We address this in item 2) above.

Once again, thank you for the prompt and thorough reviews. We very much hope that you now find our revised manuscript to be suitable for publication in PLOS ONE. We look forward to hearing from you at your earliest convenience.

Attachments
Submitted filename: Response To Reviewers.docx
Decision Letter - Jianjiong Gao, Editor

Advancing clinical cohort selection with genomics analysis on a distributed platform

PONE-D-19-32466R1

Dear Dr. Smith,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,

Jianjiong Gao

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Formally Accepted
Acceptance Letter - Jianjiong Gao, Editor

PONE-D-19-32466R1

Advancing clinical cohort selection with genomics analysis on a distributed platform

Dear Dr. Smith:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jianjiong Gao

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.