Peer Review History

Original Submission: June 29, 2025
Decision Letter - Frederick Wangai, Editor

PONE-D-25-29570

Comparison of the performance of four clinical prediction rules for mortality in patients with COVID-19

PLOS ONE

Dear Dr. Azañero-Haro,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 10 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Frederick K Wangai, MBChB, Mmed (Int Med), FCP (ECSA), DHP

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that your Data Availability Statement is currently as follows: “All relevant data are within the manuscript and its Supporting Information files.”

Please confirm at this time whether or not your submission contains all raw data required to replicate the results of your study. Authors must share the “minimal data set” for their submission. PLOS defines the minimal data set to consist of the data required to replicate all study findings reported in the article, as well as related metadata and methods (https://journals.plos.org/plosone/s/data-availability#loc-minimal-data-set-definition).

For example, authors should submit the following data:

- The values behind the means, standard deviations and other measures reported;

- The values used to build graphs;

- The points extracted from images for analysis.

Authors do not need to submit their entire data set if only a portion of the data was used in the reported study.

If your submission does not contain these data, please either upload them as Supporting Information files or deposit them to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories.

If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent. If data are owned by a third party, please indicate how others may request data access.

3. When completing the data availability statement of the submission form, you indicated that you will make your data available on acceptance. We strongly recommend all authors decide on a data sharing plan before acceptance, as the process can be lengthy and hold up publication timelines. Please note that, though access restrictions are acceptable now, your entire data will need to be made freely accessible if your manuscript is accepted for publication. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If you are unable to adhere to our open data policy, please kindly revise your statement to explain your reasoning and we will seek the editor's input on an exemption. Please be assured that, once you have provided your new statement, the assessment of your exemption will not hold up the peer review process.

4. We notice that your supplementary figures are uploaded with the file type 'Figure'. Please amend the file type to 'Supporting Information'. Please ensure that each Supporting Information file has a legend listed in the manuscript after the references list.

5. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

6. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is an excellent manuscript, and though the statistics at first glance appear complicated, they are simply and clearly explained. Well done.

Just to be sure,

Line 84: should it read referral or reference?

Reviewer #2: Overview

This manuscript presents a retrospective cohort study comparing the predictive performance of four well-known COVID‑19 mortality prediction rules—q‑CSI, ISARIC‑4C, SEIMC, and CALL—in a Peruvian hospital cohort. The study design is clear and ethically sound, deriving a dataset from a larger previously published study. The authors focus on unvaccinated adults hospitalized with COVID‑19 pneumonia between March and December 2020 at Hospital Nacional Hipólito Unanue in Lima, Peru. By comparing the discriminatory and calibration performance of these scores, the study aims to address whether simpler models like q‑CSI can perform as well as, or better than, more complex scoring systems (e.g., ISARIC‑4C).

The main finding is that q‑CSI demonstrated the highest discriminatory ability (AUROC 0.85) with a favorable balance of sensitivity (86.3%) and specificity (70.9%) at an optimal cutoff identified via Youden index. ISARIC‑4C followed closely (AUROC 0.81; sensitivity 78.7%, specificity 69.1%) and showed the only acceptable calibration (Hosmer–Lemeshow p = 0.45). SEIMC and CALL had lower AUROCs and calibration performance, though each provided specific strengths—SEIMC displayed the highest specificity (76%), and CALL retained fair negative predictive value. The authors emphasize the need for simple risk stratification tools applicable in low-resource settings and discuss how their findings might inform clinical decision-making.

Strengths

Relevance and Timeliness: Assessing mortality prediction rules remains relevant because effective triage tools can optimize resource allocation during pandemic surges, especially in low- and middle-income countries. Although most global populations are now vaccinated, knowledge of score performance in unvaccinated contexts can inform responses where vaccination remains suboptimal. The focus on a Latin American cohort addresses a geographical gap in COVID‑19 prognostic research and provides data from a region heavily affected by the pandemic.

Clear Design and Ethics: The study’s retrospective design is well described and ethical approval is documented. The authors provide inclusion/exclusion criteria and sample size calculations (assuming 10% differences in sensitivity/specificity, 95% confidence level, and 80% power), then include all cases with complete data. Anonymization processes are noted. This transparency supports replicability and ethical compliance.

Comprehensive Statistical Approach: The statistical analyses are thorough. Variables were summarized appropriately based on distribution (means with SD or medians with IQR) and compared using parametric or non-parametric tests. The authors evaluate performance metrics (sensitivity, specificity, positive and negative predictive values, likelihood ratios) and discriminative ability (AUROC) with confidence intervals. They derive optimal cutoffs using the Youden index and assess calibration using decile-based plots, Spearman correlation, and the Hosmer–Lemeshow test. Pairwise comparisons of AUROCs are conducted to highlight differences relative to ISARIC‑4C.
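As a rough illustration of the cutoff derivation summarized above, the sketch below picks the threshold that maximizes the Youden index (J = sensitivity + specificity - 1) and computes an AUROC from pairwise comparisons. It is not the authors' code: the function names and the toy scores and outcomes are invented for illustration, and "high risk" is assumed to mean a score at or above the cutoff.

```python
from typing import Sequence, Tuple


def youden_cutoff(scores: Sequence[float], died: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, J) for the cutoff maximizing J.

    A patient is classified as high risk when score >= cutoff.
    Ties in J are resolved in favour of the lowest cutoff.
    """
    positives = sum(died)
    negatives = len(died) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one death and one survivor")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, died) if d == 1 and s >= cutoff)
        tn = sum(1 for s, d in zip(scores, died) if d == 0 and s < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cutoff, sens, spec, j)
    return best


def auroc(scores: Sequence[float], died: Sequence[int]) -> float:
    """Probability that a random non-survivor scores higher than a random survivor."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not taken from the study).
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
toy_died = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, sens, spec, j = youden_cutoff(toy_scores, toy_died)
print(cutoff, sens, spec, j, auroc(toy_scores, toy_died))
```

Because the Youden index weights sensitivity and specificity equally, a cutoff chosen this way may not suit settings where missing a death is costlier than over-triage, which is the concern raised under "Clinical Utility and Cutoffs" below.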

Interpretation and Contextualization: The Discussion section contextualizes the findings by comparing them with prior validation studies across different populations. The authors note variability in AUROCs across regions and attribute differences to baseline cohort characteristics, therapeutic protocols, SARS-CoV‑2 variants, and the inherent tendency of models to perform better in their derivation cohorts. They rightly caution that calibration limitations can reduce clinical applicability even when discrimination is acceptable.

Limitations and Concerns

1. Selection Bias and Missing Data Handling: Only 1,074 of 3,074 patients (≈35%) had complete data for calculation of all four scores. Excluding two-thirds of patients risks selection bias. Missing data may not be random, and patients with incomplete records could differ systematically (e.g., severity, comorbidities, outcomes). The authors should compare baseline characteristics of included versus excluded patients and discuss how these differences might bias estimates. It would also be useful to explain why the sample sizes for each score (n=1844 for q‑CSI, 1408 for ISARIC‑4C, etc.) are larger than the final sample of 1,074—likely because each score had different missing variables. Clarifying whether multiple imputations or other strategies were considered would strengthen validity.

2. Generalizability: The cohort is single-center and exclusively unvaccinated. Given vaccination and new variants significantly change disease presentation and outcomes, the findings may not apply to contemporary COVID‑19 patients. The authors acknowledge this limitation but should expand discussion of how vaccination status, variant virulence, and changing treatment protocols affect predictive scores. Additionally, Peru’s healthcare system and patient demographics may differ from other Latin American and global contexts, limiting extrapolation.

3. Retrospective Data Quality: Retrospective chart reviews risk misclassification and documentation errors. The authors mention manual review of physical charts and data transfer to Excel, but further details about data quality control, inter-rater reliability, and training of abstractors would enhance confidence in the dataset.

4. Calibration and Model Updating: Although discrimination is the focus, calibration is crucial for clinical use. The q‑CSI, SEIMC, and CALL scores demonstrated poor Hosmer–Lemeshow fit, indicating risk predictions deviate from observed probabilities. The authors might explore recalibration or model updating tailored to their cohort. For example, logistic recalibration or refitting intercepts and slopes could improve predictive accuracy (Van Calster et al. 2019). If recalibration is out of scope, at least provide calibration intercepts and slopes or net reclassification improvement measures; these metrics are more informative than Hosmer–Lemeshow, which is sensitive to sample size (Riley et al. 2019).

5. Clinical Utility and Cutoffs: The decision to derive new binary cutoffs using the Youden index may optimize sensitivity and specificity but might oversimplify ordinal risk categories originally proposed for the scores. The authors should justify dichotomizing continuous/ordinal scores when original tools defined multiple risk strata for triage. Also, discuss how these new cutoffs would perform in alternative settings (e.g., outpatient triage) or under varying resource constraints.

6. Confounding and Unmeasured Variables: Mortality risk is influenced by many factors beyond those captured by scoring systems (e.g., time from symptom onset to admission, treatment availability, socioeconomic status, viral variants). Because the dataset originates from an earlier pandemic wave, factors like corticosteroid use or antiviral therapy may differ. The authors might discuss whether treatments or changes in standard of care during the study period confounded associations.

7. Sample Size Calculation and Power: The sample size calculation (582 patients) assumes 10% differences in sensitivity and specificity, but the reasoning could be better clarified. Since final included cases exceeded this threshold (1,074), power to detect smaller differences is likely adequate. However, specifying how missing data patterns affect effective sample size would help.

8. Language and Clarity: While the manuscript is generally well written, there are minor grammatical errors and awkward phrases. Examples include “The outcome primary was 30‑day in‑hospital mortality” (should be “The primary outcome was 30‑day...”), and “determine” is misspelled as “determinate”. The authors should proofread to improve readability. A professional language edit is recommended before publication.

9. Data Availability: The supporting information or link to a repository is not provided in the PDF. For transparency, deposit the dataset and statistical code in a public repository (e.g., Dryad, OSF) and include a DOI. If ethical or legal restrictions apply (due to patient privacy), provide contact details for an institutional data access committee.

Novelty and Contribution: Several studies have compared COVID‑19 prognostic scores; some recent meta-analyses have reviewed dozens of models (Wynants et al. 2020). The novelty here lies in directly comparing q‑CSI, ISARIC‑4C, SEIMC, and CALL in a Peruvian context. Although the unvaccinated single-center cohort limits broad applicability, the study adds local data and highlights that simple, respiratory-based scores may perform well in resource-limited settings.

Suggestions for Improvement

i) Describe excluded patients and missing data: Provide a table comparing demographics and outcomes of included and excluded cases to assess selection bias.

ii) Explain each score’s required variables and missingness: The differing sample sizes for each score (1844 for q‑CSI vs. 1408 for ISARIC‑4C etc.) suggest variable availability issues. Clarify which variables were missing and why.

iii) Consider model updating: Even a simple recalibration of intercept and slope could improve predictive accuracy; reporting these could encourage others to adopt similar updates.

iv) Expand on vaccination and variant context: Provide rationale for why unvaccinated data remain useful and discuss how the models might perform in vaccinated populations (perhaps referencing external validation studies).

v) Detail data quality control: Outline the training of data abstractors, double data entry or cross‑checks, and steps taken to minimize misclassification.

vi) Provide open data: Share anonymized data and code to comply with PLOS data policies and to allow other researchers to replicate or extend the analysis.

vii) Professional editing: Engage a native English editor to polish the manuscript and fix typographical errors.

REFERENCES

Riley, Richard D., Ewout W. Steyerberg, and Douglas G. Altman. 2019. “Better Reporting of Analyses Assessing Model Performance: Calibration Survival.” BMJ 365: l1821. https://doi.org/10.1136/bmj.l1821

Van Calster, Ben, Ewout W. Steyerberg, Maarten van Smeden, Laure Wynants, and Richard D. Riley. 2019. “Calibration: The Achilles Heel of Predictive Analytics.” BMC Medicine 17 (1): 230. https://doi.org/10.1186/s12916-019-1466-7

Wynants, Laure, et al. 2020. “Prediction Models for Diagnosis and Prognosis of Covid‑19: Systematic Review and Critical Appraisal.” BMJ 369: m1328. https://doi.org/10.1136/bmj.m1328

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Shastra Avendra Bhoora

Reviewer #2: Yes: Miquel Angel Rodríguez-Arias

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation.

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

Revision 1

A rebuttal letter

Dear Frederick K Wangai, Academic Editor, and anonymous reviewers,

Thank you sincerely for your time, the thorough reading of our manuscript (PONE-D-25-29570), and the constructive and insightful comments you have provided. We believe that your suggestions and critiques have significantly strengthened the clarity and scientific rigor of our work.

We have carefully reviewed every point raised by the Academic Editor and the Reviewers, and we have made all necessary modifications in the revised manuscript.

In the following "Response to Reviewers Letter," we address each observation point-by-point. To facilitate the review process, we have copied and pasted the original comments in bold and italics and have presented our response immediately afterward, indicating the changes made in the manuscript (including the page or line number where applicable).

We hope that the revisions fulfill the high standards of PLOS ONE, and we thank you in advance for your consideration regarding the publication of our work.

Sincerely,

Reviewer #1

Reviewer's observation #1. This is an excellent manuscript, and though the statistics at first glance appear complicated, they are simply and clearly explained. Well done.

Just to be sure,

Response: We are highly gratified that the statistical methodology, while potentially appearing complex at first glance, was perceived as “simply and clearly explained.” We sincerely thank the reviewer for this positive assessment.

Reviewer's observation #2. Line 84: should it read referral or reference?

Response: We appreciate this important point regarding terminology. We have reviewed Line 84 of the original manuscript and confirm that, in the context of patient transfer or moving a patient between different levels of care, the correct term is “referral”. We have corrected the wording in the manuscript to ensure precision and clarity.

Reviewer #2

1. Selection Bias and Missing Data Handling: Only 1,074 of 3,074 patients (≈35%) had complete data for calculation of all four scores. Excluding two-thirds of patients risks selection bias. Missing data may not be random, and patients with incomplete records could differ systematically (e.g., severity, comorbidities, outcomes). The authors should compare baseline characteristics of included versus excluded patients and discuss how these differences might bias estimates. It would also be useful to explain why the sample sizes for each score (n=1844 for q-CSI, 1408 for ISARIC-4C, etc.) are larger than the final sample of 1,074, likely because each score had different missing variables. Clarifying whether multiple imputations or other strategies were considered would strengthen validity.

Response:

We sincerely appreciate your detailed and highly pertinent observation regarding the potential selection bias arising from the high proportion of missing data in our retrospective cohort.

1. We fully concur that excluding a large portion of the original patient cohort is a limitation and introduces a risk of selection bias. This high rate of incomplete data directly reflects the extreme resource constraints and clinical burden faced by our hospital, and by the Peruvian health system overall, during the first wave of the pandemic. The resulting loss in the quality of clinical documentation underscores the pressing need for improved data capture systems in low-resource settings, a limitation that we feel merits explicit discussion in the manuscript.

2. Although 3,074 patients were hospitalized, 2,377 satisfied our general inclusion criteria. Among them, 1,074 had complete information for calculating all four scores under evaluation (see flowchart). While we recognize the value of your suggestion to compare baseline characteristics (demographics, comorbidities, and mortality outcome) between patients with (n=1,074) and without (n=1,303) complete information, we must regretfully report that this analysis is not feasible. The exclusion process occurred early during the data abstraction phase: patients whose charts were missing the minimum variables required for any of the four prediction rules were immediately filtered out. Consequently, their full baseline clinical characteristics and, critically, their mortality outcomes were not extracted or digitized, precluding any comparison.

3. As the reviewer pointed out, the sample sizes for each individual score are larger than the final sample of 1,074 because each score had different missing variables.

To provide clarity on the distinct sample sizes (n) reported:

• n=1,074 (Primary cohort): This is the core comparative cohort. It is defined strictly by patients who had complete data necessary to calculate ALL four prediction scores concurrently (q-CSI, ISARIC-4C, SEIMC, and CALL). This group ensures a robust head-to-head comparison of all models.

• Larger “n” for Individual Scores: The larger sample sizes cited for individual scores (e.g., n=1,844 for q-CSI) reflect the differential availability of the specific variables required for each model. The q-CSI, requiring fewer and more common clinical and basic laboratory parameters, was applicable to a significantly larger subset of the initial pool compared to the ISARIC-4C, which relies on more complex inflammatory markers often missing from retrospective charts. This demonstrates that simpler scores are inherently more robust to data incompleteness in our setting.

4. Regarding alternative strategies, we also considered methods such as multiple imputation. However, given that the percentage of excluded patients (1,303 out of 2,377) represents 55% of the eligible cohort, we determined that multiple imputation would not yield statistically robust estimates. We therefore opted for the complete-case analysis, explicitly acknowledging and discussing its inherent limitations in the Discussion section.
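The percentages quoted in this response and in the reviewer's comment rest on two different denominators (all hospitalized patients versus eligible patients). The arithmetic, using only the counts stated in this letter, works out as follows:

```latex
\begin{align*}
\text{eligible but incomplete} &= 2377 - 1074 = 1303,\\
\frac{1303}{2377} &\approx 0.548 \quad (\text{about } 55\% \text{ of eligible patients excluded}),\\
\frac{1074}{2377} &\approx 0.452 \quad (\text{share of eligible patients analysed}),\\
\frac{1074}{3074} &\approx 0.349 \quad (\text{share of all hospitalized patients, the reviewer's } \approx 35\%).
\end{align*}
```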

Actions in the Manuscript:

- We will integrate these essential clarifications into the revised methods and discussion sections of the manuscript.

2. The cohort is single-center and exclusively unvaccinated. Given vaccination and new variants significantly change disease presentation and outcomes, the findings may not apply to contemporary COVID-19 patients. The authors acknowledge this limitation but should expand discussion of how vaccination status, variant virulence, and changing treatment protocols affect predictive scores. Additionally, Peru’s healthcare system and patient demographics may differ from other Latin American and global contexts, limiting extrapolation.

Response:

We thank the reviewer for raising this concern. We acknowledge that the generalizability of our findings is a key limitation due to the study's specific temporal and geographical context.

1. We accept that the single-center nature of our cohort, coupled with the fact that the study was conducted exclusively in unvaccinated patients during the initial waves of the pandemic (March to December 2020), inherently restricts the direct applicability of our results to current clinical practice. It is well-established that the development of population immunity, the emergence of new viral variants, and the standardization of therapeutics (such as corticosteroids and antivirals) have substantially altered the natural history, clinical presentation, and mortality risk of COVID-19. This evolution may indeed affect the predictive performance of scores developed or validated in earlier phases. We have expanded upon this critical point in the Discussion section.

2. Nevertheless, we believe the value of this study remains significant for several reasons:

• This validation study focuses on a Latin American (Peruvian) cohort, a region historically underrepresented in prognostic score validation literature. This provides essential data on the performance of these tools within a specific demographic and socioeconomic context.

• Our finding that simpler scores (like q-CSI) maintained high performance is crucial for triage and decision-making in low-resource environments, where access to complex laboratory variables necessary for scores like ISARIC-4C is often a significant logistical bottleneck.

• Data from unvaccinated or immunologically naive populations hold paramount importance for planning and response strategies against future public health emergencies that may involve novel pathogens or populations lacking prior immunity.

3. Finally, we will re-emphasize that the clinical application of these scores must always be understood within the context of hospitalized patient risk stratification. By design, their extrapolation to the general population (ambulatory patients or community triage) is, and remains, limited.

Actions in the Manuscript:

• The Discussion section will be expanded to incorporate a more detailed analysis of the impact of vaccination, variants, and evolving treatments on the predictive capacity of the scores.

• The justification for the study will be reinforced, highlighting the geographical gap and the utility for resource-limited settings.

3. Retrospective Data Quality: Retrospective chart reviews risk misclassification and documentation errors. The authors mention manual review of physical charts and data transfer to Excel, but further details about data quality control, inter-rater reliability, and training of abstractors would enhance confidence in the dataset.

Response:

We agree that data quality is an essential component of methodological rigor and that retrospective review, particularly in a public health emergency context, carries an inherent risk of transcription errors or misclassification.

To ensure the quality and reliability of our dataset, we implemented a comprehensive control protocol:

1. Data abstraction was carried out by the same core team of investigators who were simultaneously working as treating physicians in the hospitalization areas designated for COVID-19 patients. Their extensive clinical experience and direct knowledge of the clinical context were crucial for the correct and consistent interpretation of clinical and laboratory variables (including those obtained from handwritten records).

2. For rigorous transcription quality control, 100% of the abstracted data was subjected to a double data entry process (double digitization). Two team members independently entered the information into separate electronic files. Subsequently, a systematic cross-verification was performed to detect all discrepancies. Any differences identified between the two entries were immediately reviewed by a senior investigator and resolved by referring to the original patient record. This process resulted in a single, final validated dataset.

3. While we acknowledge that a formal measurement of inter-rater reliability was not performed, we consider this limitation to be substantially mitigated. Our confidence in the data quality is based on the unique combination of the deep clinical experience and contextual knowledge held by the physician/investigator abstractors, coupled with the methodological rigor of the systematic double data entry and cross-verification process.
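The double data entry and cross-verification step described above can be illustrated with a minimal sketch. It assumes each abstractor's file has been loaded as a mapping from a patient identifier to a record of field values; the function name, identifiers, and field names are hypothetical and are not taken from the study files.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
Mismatch = Tuple[str, str, Optional[object], Optional[object]]


def find_discrepancies(entry_a: Dict[str, Record],
                       entry_b: Dict[str, Record]) -> List[Mismatch]:
    """Return (patient_id, field, value_a, value_b) for every mismatch
    between two independently entered copies of the same dataset."""
    discrepancies: List[Mismatch] = []
    for patient_id in sorted(set(entry_a) | set(entry_b)):
        rec_a = entry_a.get(patient_id)
        rec_b = entry_b.get(patient_id)
        # A record present in only one file is itself a discrepancy.
        if rec_a is None or rec_b is None:
            discrepancies.append((patient_id, "<record>", rec_a, rec_b))
            continue
        for field in sorted(set(rec_a) | set(rec_b)):
            val_a = rec_a.get(field)
            val_b = rec_b.get(field)
            if val_a != val_b:
                discrepancies.append((patient_id, field, val_a, val_b))
    return discrepancies


# Hypothetical toy data: the two entries disagree on one CRP value.
entry_a = {"P001": {"age": "64", "crp": "120"},
           "P002": {"age": "51", "crp": "35"}}
entry_b = {"P001": {"age": "64", "crp": "210"},
           "P002": {"age": "51", "crp": "35"}}

flagged = find_discrepancies(entry_a, entry_b)
```

Each flagged tuple would then be resolved against the original patient record, as described in point 2.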

Actions in the Manuscript:

• These specific details concerning the composition of the investigation team (treating physicians/investigators), double data entry, and the cross-verification protocol will be incorporated into the "Methods: Data Collection" section to enhance methodological transparency.

4. Calibration and Model Updating: Although discrimination is the focus, calibration is crucial for clinical use. The q-CSI, SEIMC, and CALL scores demonstrated poor Hosmer–Lemeshow fit, indicating risk predictions deviate from observed probabilities. The authors might explore recalibration or model updating tailored to their cohort. For example, logistic recalibration or refitting intercepts and slopes could improve predictive accuracy (Van Calster et al. 2019). If recalibration is out of scope, at least provide calibration intercepts and slopes or net reclassification improvement measures; these metrics are more informative than Hosmer–Lemeshow, which is sensitive to sample size (Riley et al. 2019).

Response:

We thank the Reviewer for highlighting the importance of calibration and for guiding us towards the use of more informative metrics. We agree that the Hosmer-Lemeshow (HL) test is highly sensitive to large sample sizes, potentially masking adequate calibration performance.

We have addressed this point comprehensively by calculating and reporting the Calibration Intercept (alpha) and the Calibration Slope (beta) for all four scores in the complete case cohort (n=1,074).

1. Our analysis using alpha and beta demonstrates that the global calibration of all four models was robust, in contrast with the conclusion of 'poor fit' derived solely from the HL test.

• Calibration Slope (beta): The slope for all four scores was found to be statistically indistinguishable from the ideal value of 1.0. This demonstrates that the risk scale is structurally correct in our cohort.

• Calibration Intercept (alpha): The alpha values for the SEIMC (alpha=-0.04) and CALL (alpha=-0.03) scores were virtually zero. While the q-CSI (alpha=-0.14) and ISARIC (alpha=-0.13) showed a slight negative trend (suggesting minor risk overestimation), this deviation was not statistically significant (p > 0.05 in all cases).

We have expanded the Discussion to emphasize that these robust metrics (Table 2) confirm the models are globally well-calibrated, validating their use for risk stratification despite the negative HL result.

2. We acknowledge the importance of recalibration (e.g., Logistic Recalibration, as suggested by Van Calster et al.) for maximizing local utility.

• Since our primary objective was the direct external validation and comparison of the published models’ original performance, model updating or recalibration falls outside the scope of this initial study. However, we have made this explicit recommendation for future research based on the minor local deviations observed in the calibration plots.

3. Regarding the suggestion to include the references by Van Calster et al. (2019) and Riley et al. (2019):

We believe the Van Calster et al. reference focuses on the methodology of recalibration and model updating, which falls outside the scope of our validation study. We regret that, despite our efforts to locate the full text, we were unable to access the work by Riley et al. (2019) and therefore could not integrate its methodology. We will instead cite the principles of calibration assessment from the available methodological literature (Stevens RJ et al. 2020).

Actions in the Manuscript:

• The global performance table (Table 2) in the Results section now includes the Calibration Intercept (alpha) and Calibration Slope (beta) with their 95% CIs for all four scores based on the complete case analysis (N=1,074).

• The Discussion has been revised to argue that the alpha approx. 0 and beta approx. 1 findings indicate a robust global calibration, thus supporting the clinical application of these scores.
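For readers unfamiliar with the calibration intercept (alpha) and slope (beta) discussed in this response, the following is a minimal sketch of one common way to estimate them. The response does not state how the authors computed these values, so this is an assumption: the slope comes from a logistic regression of the outcome on the logit of the predicted risk, and the intercept from the same model with the slope fixed at 1. The function names and toy data are hypothetical, and predicted risks are assumed to be available for each patient.

```python
import math
from typing import List, Tuple


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def calibration_slope(pred: List[float], y: List[int],
                      iters: int = 50) -> Tuple[float, float]:
    """Fit logit(P(y=1)) = a + b * logit(pred) by Newton-Raphson.
    Returns (a, b); b is the calibration slope (ideal value 1)."""
    lp = [logit(p) for p in pred]
    a, b = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yi in zip(lp, y):
            mu = expit(a + b * x)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def calibration_intercept(pred: List[float], y: List[int],
                          iters: int = 50) -> float:
    """Fit logit(P(y=1)) = alpha + logit(pred) with the slope fixed at 1.
    A negative alpha means predicted risks are too high on average."""
    lp = [logit(p) for p in pred]
    alpha = 0.0
    for _ in range(iters):
        g = h = 0.0
        for x, yi in zip(lp, y):
            mu = expit(alpha + x)
            g += yi - mu
            h += mu * (1.0 - mu)
        step = g / h
        alpha += step
        if abs(step) < 1e-10:
            break
    return alpha


# Hypothetical toy data in which observed event rates equal predicted risks,
# so the estimates land on the ideal values (alpha near 0, beta near 1).
pred = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

_, beta = calibration_slope(pred, y)
alpha = calibration_intercept(pred, y)
```

In practice a statistics package would also supply the 95% CIs reported in Table 2. The hand-written Newton-Raphson loop is used here only to keep the sketch free of dependencies.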

5. Clinical Utility and Cutoffs: The decision to derive new binary cutoffs using the Youden index may optimize sensitivity and specificity but might oversimplify ordinal risk categories originally proposed for the scores. The authors should justify dichotomizing continuous/ordinal scores when original tools defined multiple risk strata for triage. Also, discuss how these new cutoffs would perform in alternative settings (e.g., outpatient triage) or under varying resource constraints.

Response:

We appreciate this observation, as it addresses the crucial transition from statistical findings to practical clinical utility.

1. We recognize that the original scores define multiple risk strata, but our primary objective was to perform a direct (head-to-head) and standardized comparison of the predictive capacity of the four tools for the primary binary outcome (mortality). The use of the Youden index allowed us to establish an optimal and uniform cutoff point for each score, maximizing discrimination between "high risk of mortality" and "low risk of mortality" specifically within our cohort.

2

Attachments
Attachment
Submitted filename: Response to Reviewers.doc
Decision Letter - Frederick Wangai, Editor


PONE-D-25-29570R1

Comparison of the performance of four clinical prediction rules for mortality in patients with COVID-19

PLOS One

Dear Dr. Azañero-Haro,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Essential revisions

  1. Missing data and selection bias (transparency)

  • Please state explicitly in the manuscript what proportion of the source cohort was excluded due to incomplete documentation and clarify the implications for potential selection bias.
  • If excluded records were not digitized/extracted and therefore included vs excluded comparisons could not be performed, please state this plainly so readers understand that selection bias cannot be quantified.
  • Provide a brief missingness summary (preferably a supplementary table): proportion missing for key predictors used by each score and the resulting analytic sample size per score; include a simple participant flow (figure or table).

  2. Cohort definitions and “head-to-head” framing

  • Several performance metrics are presented on different denominators across models, while AUROC/calibration are reported in the complete-case cohort. This is methodologically defensible but risks reader confusion.
  • Please clearly distinguish: (a) “maximal available subcohorts per score” (not head-to-head) versus (b) the complete-case cohort used for direct head-to-head comparison, and relabel tables/Results text accordingly.

  3. Correct interpretation of sensitivity/specificity

  • Please review and correct the Discussion text to ensure sensitivity and specificity are interpreted consistently with the outcome coding (e.g., if mortality is the positive outcome, sensitivity relates to correctly identifying deaths). Confirm explicitly which class is treated as the positive outcome.

  4. Endorsement claim

  • The manuscript states that a score is supported/recommended by the World Health Organization. This is a strong claim and must be supported by a precise WHO source, or rephrased to a defensible non-endorsement statement (e.g., “widely validated and commonly used”).

Editorial/production requirements

  • Please remove all residual drafting artifacts (duplicated words, broken phrases, typographical errors) and correct any malformed in-text citation numbering. The next version should be a clean, publication-ready file.

Data availability statement

  • Please ensure the Data Availability Statement is internally consistent with what is actually provided (e.g., if a minimal dataset and code are included as Supporting Information, state this clearly and remove conflicting “upon acceptance” language unless required by journal policy).

Once these points are addressed, I anticipate the manuscript can be accepted without further external review.

==============================

Please submit your revised manuscript by May 01 2026 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Frederick K Wangai, MBChB, Mmed (Int Med), FCP (ECSA), DHP

Academic Editor

PLOS One

Journal Requirements:

1. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.

2. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.


Revision 2

Dear Frederick K. Wangai, Academic Editor, PLOS ONE

Thank you for the opportunity to submit this minor revision. We appreciate your positive feedback regarding the strengthened data quality and calibration reporting. We have addressed each of the essential and editorial requirements point-by-point as follows:

1. Essential revisions

• Missing data and selection bias (transparency):

o Response: We have explicitly stated in the Results and Methods that out of a source cohort of 1,963 patients, 1,074 were included in the head-to-head analysis.

o Action: To address potential selection bias, we have added S2 Table, which compares the baseline characteristics of included (n=1,074) vs. excluded (n=889) patients.

o Clarification on Bias: The analysis in S2 Table revealed some statistically significant differences: the included cohort was slightly older (58.2 vs 56.4 years, p=0.011) and had a higher prevalence of diabetes (27.9% vs 22.7%, p=0.008). We have acknowledged these differences in the Limitations section, noting that our analytic sample represents a slightly higher-risk group. However, we emphasize that since all four scores were evaluated on the exact same n=1,074 patients, the relative performance and ranking of the scores (head-to-head comparison) remain internally valid and robust for this population.

o Action: We provided a missingness summary in S1 Table, detailing the proportion missing for key predictors (e.g., CRP for ISARIC-4C score and LDH for CALL score) across the scores, which justifies the final analytic sample size and underscores the feasibility of simpler tools like q-CSI.

• Cohort definitions and “head-to-head” framing:

o Response: We have clarified the distinction between the "maximal available subcohorts" (used for feasibility) and the "complete-case cohort" (used for head-to-head comparison).

o Action: Tables and Results text have been relabeled to ensure clarity. We now clearly distinguish when referring to the total available data for a single score (e.g., n=1,844 for q-CSI) versus the core comparative cohort (n=1,074) used for AUROC and calibration metrics.

• Correct interpretation of sensitivity/specificity:

o Response: We have explicitly stated in the Methods section and in the footnotes of the results text that "30-day mortality" is treated as the positive outcome.

o Action: The Discussion (Interpretation of cutoffs) has been revised to ensure sensitivity is correctly interpreted as the ability to identify patients who will die (screening/rule-out), and specificity as the ability to identify those who will survive (confirmatory/rule-in).

• Endorsement claim:

o Response: We have removed the claim that any score is "recommended by the WHO" to avoid making an unsupported endorsement statement.

o Action: In the Abstract and Discussion, we rephrased this to: "a widely validated and internationally recognized reference standard often utilized in global clinical guidelines."

2. Editorial/production requirements

• Drafting artifacts and typos:

o Action: A complete manual review of the manuscript was performed. We corrected typographical errors such as “wich” to “which” and ensured all in-text citation numbering is sequential and correctly formed. The revised file is now a clean, publication-ready version.

3. Data availability statement

• Consistency:

o Action: We have updated the Data Availability Statement. S1 File now contains the complete dataset of 1,963 patients, including those with missing values, to allow for full reproducibility of our flow-chart and missingness analysis.
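The outcome coding described under "Correct interpretation of sensitivity/specificity" above can be made concrete with a short sketch. It treats death as the positive class, as stated in the response, and assumes a patient is flagged as high risk when the score is at or above the cutoff. That threshold convention, the function name, and the toy data are illustrative assumptions, not values from the study.

```python
from typing import List, Tuple


def sens_spec(scores: List[float], died: List[int],
              cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity with death (1) as the positive class.
    A patient is flagged high risk when score >= cutoff (assumed convention)."""
    tp = fn = tn = fp = 0
    for score, outcome in zip(scores, died):
        flagged = score >= cutoff
        if outcome == 1:
            if flagged:
                tp += 1  # death correctly flagged as high risk
            else:
                fn += 1  # death missed by the score
        else:
            if flagged:
                fp += 1  # survivor flagged as high risk
            else:
                tn += 1  # survivor correctly classed as low risk
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: eight patients, four deaths.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
died = [0, 0, 0, 1, 0, 1, 1, 1]

sensitivity, specificity = sens_spec(scores, died, cutoff=4)
youden_j = sensitivity + specificity - 1  # Youden index at this cutoff
```

With this coding, sensitivity is the share of deaths the score flags, and specificity is the share of survivors it classes as low risk. Repeating the calculation over candidate cutoffs and keeping the one with the largest Youden index gives a cohort-specific cutoff of the kind the earlier response describes.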

Attachments
Attachment
Submitted filename: Response_to_Reviewers_auresp_2.doc
Decision Letter - Frederick Wangai, Editor

Comparison of the performance of four clinical prediction rules for mortality in patients with COVID-19

PONE-D-25-29570R2

Dear Dr. Azañero-Haro,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Frederick K Wangai, MBChB, Mmed (Int Med), FCP (ECSA), FRCP Edinburgh

Academic Editor

PLOS One

Formally Accepted
Acceptance Letter - Frederick Wangai, Editor

PONE-D-25-29570R2

PLOS One

Dear Dr. Azañero-Haro,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Frederick K Wangai

Academic Editor

PLOS One

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.