Peer Review History
Original Submission: November 11, 2024
Dear Dr. Pate,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Based on the reviewers’ assessments, the manuscript presents a valuable contribution to the field of EHR analysis by introducing the rcprd R package. However, some revisions are necessary before acceptance.

Required Revisions:

Performance and Scalability – The authors should include a comparison of runtime and RAM usage when extracting data from an SQLite database versus looping through raw .txt files. A practical case study with real CPRD data (if feasible) would further strengthen the manuscript.

Comparison with Existing Tools – A direct comparison with other relevant packages, such as rEHR and aurumpipeline, should be included to clarify the advantages of rcprd.

Data Quality and Extraction Functions – The authors should clarify how unrealistic date values and general diabetes codes are handled in the extraction functions. Additionally, confirming whether SQLite supports date formats would be useful.

Security Considerations – Explicitly state the importance of using a secure device when adding raw CPRD data to an SQLite database.

Please submit your revised manuscript by Apr 26 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,
B. Sivakumar
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work.
Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure:

This research was funded by The National Institute for Health Research (NIHR) School for Primary Care Research (SPCR) (reference: NIHR SPCR-2021-2026, grant number 648) and Endeavour Health Charitable Trust. The views expressed are those of the authors and not necessarily those of the NIHR, the Department of Health and Social Care, or Endeavour Health.

Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

Dr. Brian McMillan was heavily involved with the acquisition of funding which supports the work done in this manuscript.

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement.
Currently, your Funding Statement reads as follows:

This research was funded by The National Institute for Health Research (NIHR) School for Primary Care Research (SPCR) (reference: NIHR SPCR-2021-2026, grant number 648) and Endeavour Health Charitable Trust. The views expressed are those of the authors and not necessarily those of the NIHR, the Department of Health and Social Care, or Endeavour Health.

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. In the online submission form, you indicated that your data will be submitted to a repository upon acceptance. We strongly recommend all authors deposit their data before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire minimal dataset will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: N/A

3. Have the authors made all data underlying the findings in their manuscript fully available?

Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes
Reviewer #2: Yes

Reviewer #1:

**Summary**

The authors introduce a new R package that simplifies data extraction and data cleaning for CPRD Aurum. Building on older but no longer maintained R packages, the authors present a software package that stores the data as an SQLite database. This is a valuable tool for researchers using large electronic health records, given how time-consuming data processing is. It provides pre-built functions to extract variables and includes some algorithms for data cleaning. Overall, the paper and software package are very promising and useful. They could be improved by clarifying some of the functionalities of the package and by comparing the runtime to existing extraction approaches.

**Comments to the authors**

General comments

All worked examples work well. As a researcher who regularly uses CPRD, the main advantage of a package such as the one presented by the authors would be faster runtime and lower local RAM use when cleaning big patient datasets. Could the authors present some comparisons of RAM use and runtime when extracting data from the SQLite database versus looping through the CPRD Aurum raw data .txt files? As someone using datasets with up to 20 million patients, this would be the selling point of the new R package.

Introduction

As this paper is targeted at an audience that has not used SQLite before, I’d suggest adding a half-sentence to explain how an SQLite database on a fixed storage device can align with the data storage requirements of CPRD usage (e.g., it can be stored on a secure server).

2.2 Recommended process for extraction

Cohort specification is often done with type 1 linkage of the data, i.e., using additional data from HES or the ONS mortality dataset. Could the authors clarify whether it is possible to use linked data for the cohort definition in Step 2.2?
3.2.1 Add individual files to SQLite database using add_to_database

When creating an SQLite database, it would be good to remind the reader that this should be done on a secure device, because the following functions add raw CPRD data to the database.

3.2.1 Add individual files to SQLite database using add_to_database

Is it possible to store dates in a date format within the SQLite database? It can sometimes be very helpful to inspect the dataset for unrealistic date values, and this is more difficult if the dates are stored as numbers.

3.3.1 Functions for extracting common variable types

Could the authors clarify some of the functionalities around the extraction functions:
• Is the index date of each function applied to the observation date or the enter date of a medical record?
• Do the functions check for unrealistic dates, e.g., events recorded in 1899 or before the patient was born? These are common occurrences in CPRD data and should be removed from the final dataset.

Code lists often include different categories of codes (e.g., a diabetes code list could contain codes for diagnoses, but also referrals, patient registries, examination results, etc.). These different categories are often used in sensitivity analyses. They are also required if variables are defined based on algorithms, e.g., BMI, vaccinations, eGFR, etc. I think data extraction would be easier if the authors allowed the user to enter a data frame for the code list instead of a vector alone. The function could extract based on the medcodeid and then merge the other columns of the code list data frame into the final output. I know that the same can be achieved by applying the current functions iteratively, but I think this would improve the workflow of data extraction.

Algorithms for data extraction

BMI

I would like to flag that CPRD contains entries in units other than cm/m and kg/stone (e.g., inches, feet, pounds, …).
I would recommend that the authors double-check that their algorithm includes all possible measurement units of height and weight used in CPRD.

Diabetes status

How does the algorithm handle general diabetes codes, e.g., “diabetes mellitus”, “diabetic foot”, etc.? The current version only allows specific codes for either type 1 or type 2 diabetes to be entered. However, as the generic codes are commonly used, I would recommend including them.

Reviewer #2:

This manuscript introduces rcprd, an R package designed to streamline the extraction, processing, and querying of CPRD data. The package tackles key challenges inherent in working with large-scale EHR data. It creates an SQLite database from raw .txt files and offers a suite of functions to extract both common variable types (e.g., history of a condition, time-to-event variables, test results) and specific variables (e.g., BMI, cholesterol/HDL ratio). The manuscript provides a detailed description of each function in the package. The work is timely and promises to significantly reduce redundant efforts in processing CPRD Aurum data, potentially benefiting a wide range of health data researchers. Overall, the manuscript is well-written, well-organized, and represents a significant contribution to the field of EHR analysis.

Major Comments:

Comparison to other packages: Does rEHR or any other package remain a viable alternative for preparing analysis-ready data from CPRD Aurum data? The authors discussed the differences in approach, rather than in performance, among rEHR, aurumpipeline, and rcprd. Would it be possible to provide a comparison with existing tools? A clear comparison would help highlight the distinct improvements offered by rcprd.

Scalability and practical implementation: Although the demonstration using a simulated dataset effectively showcases the functionality, the example dataset is relatively small (containing only 12 patients and 8 .txt files).
Could the authors expand on practical applications by including performance metrics or a case study using real CPRD data (if feasible) to further validate the package’s utility in real-world settings?

Minor Points:

L165: The authors mention the naming convention for the EHR files. Is user input required for this, or do the CPRD Aurum data files already adhere to the naming convention described in Section 2.1?

L170: Instead of stating “(note, expecting CRAN submission in the next month),” could the authors provide the CRAN link if available? If not, would a specific expected availability date (e.g., year/month) be preferable?

L171: Since reference 12 is the GitHub link for the R package, might it be more straightforward to directly include the GitHub URL rather than citing it as a reference?

L432: When adding all relevant files at once, is there a checking step for duplicate records? Could this potentially pose an issue under certain circumstances?

L581: There appears to be a typo: “ust” -> “must”?

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements.
To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.
Revision 1
rcprd: An R package to simplify the extraction and processing of Clinical Practice Research Datalink (CPRD) data, and create analysis-ready datasets

PONE-D-24-51682R1

Dear Dr. Pate,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Sreeram V. Ramagopalan
Academic Editor
PLOS ONE
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.