calibmsm: An R package for calibration plots of the transition probabilities in a multistate model

Alexander Pate; Matthew Sperrin; Richard D. Riley; Ben Van Calster; Glen P. Martin

doi:10.1371/journal.pone.0320504

Peer Review History

Original Submission
July 29, 2024
29 Jul 2024 Author Response https://doi.org/10.1371/journal.pone.0320504.r001
29 Sep 2024 Decision Letter - Md Rahaman Khan, Editor

PONE-D-24-31812
calibmsm: An R package for calibration plots of the transition probabilities in a multistate model
PLOS ONE

Dear Dr. Pate,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Authors need to address all points raised by the reviewers as attached. Additionally, the following points need to be addressed.

Major Points:

Clarity and Justification of Calibration Methods: The methods section is comprehensive but could benefit from additional clarity regarding the rationale behind choosing the three calibration methods (BLR-IPCW, MLR-IPCW, and pseudo-values). A more detailed comparison or discussion of their relative merits, particularly for different types of datasets (e.g., varying amounts of censoring), would add value.

Handling of Censoring: Censoring is a significant challenge in survival analysis. The manuscript describes using inverse probability of censoring weights (IPCW) to handle informative censoring, which is a valid approach. However, further discussion on how the choice of weights impacts the calibration curves and the results should be provided. Some sensitivity analysis regarding the impact of weight selection would enhance the robustness of the findings.

Sample Size and Calibration: The manuscript notes that calibration results could be improved with a larger sample size, but it would be helpful to provide more concrete guidance or examples of how sample size affects calibration estimates. Including a brief power calculation or similar quantitative justification for sample size adequacy would enhance the practical use of the software.

Practical Interpretation: While the results section demonstrates the use of the calibmsm package well, a deeper interpretation of the results (e.g., how clinicians or statisticians should act based on poor calibration) would be helpful. This would provide the necessary bridge between technical calibration results and clinical decision-making.

Minor Points:

Software Documentation: It would be helpful to include more detailed instructions or a user manual within the manuscript or supplementary materials, especially for users less familiar with R.

Notations and Definitions: Ensure consistent use of notation, particularly in sections where different methods are discussed. Some readers may find the jump between calibration techniques disorienting without a clear distinction between them.

Figures and Plots: The calibration curves and scatter plots (such as those generated using BLR-IPCW and MLR-IPCW) should be labeled more clearly in the figures. This will help non-expert readers follow the results more easily.

==============================

Please submit your revised manuscript by Nov 13 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,
Md Hasinur Rahaman Khan, Ph.D.
Academic Editor
PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following financial disclosure: “This work was supported by funding from the MRC-NIHR Methodology Research Programme [grant number: MR/T025085/1].” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed.
Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

Additional Editor Comments:

Authors need to address the points as raised by the two reviewers and given in the attachments. Additionally, I have the following comments that also need to be addressed.

Major Points:

1. Clarity and Justification of Calibration Methods: The methods section is comprehensive but could benefit from additional clarity regarding the rationale behind choosing the three calibration methods (BLR-IPCW, MLR-IPCW, and pseudo-values). A more detailed comparison or discussion of their relative merits, particularly for different types of datasets (e.g., varying amounts of censoring), would add value.

2. Handling of Censoring: Censoring is a significant challenge in survival analysis. The manuscript describes using inverse probability of censoring weights (IPCW) to handle informative censoring, which is a valid approach. However, further discussion on how the choice of weights impacts the calibration curves and the results should be provided. Some sensitivity analysis regarding the impact of weight selection would enhance the robustness of the findings.

3. Sample Size and Calibration: The manuscript notes that calibration results could be improved with a larger sample size, but it would be helpful to provide more concrete guidance or examples of how sample size affects calibration estimates. Including a brief power calculation or similar quantitative justification for sample size adequacy would enhance the practical use of the software.

4. Practical Interpretation: While the results section demonstrates the use of the calibmsm package well, a deeper interpretation of the results (e.g., how clinicians or statisticians should act based on poor calibration) would be helpful. This would provide the necessary bridge between technical calibration results and clinical decision-making.

Minor Points:

1. Software Documentation: It would be helpful to include more detailed instructions or a user manual within the manuscript or supplementary materials, especially for users less familiar with R.

2. Notations and Definitions: Ensure consistent use of notation, particularly in sections where different methods are discussed. Some readers may find the jump between calibration techniques disorienting without a clear distinction between them.

3. Figures and Plots: The calibration curves and scatter plots (such as those generated using BLR-IPCW and MLR-IPCW) should be labeled more clearly in the figures. This will help non-expert readers follow the results more easily.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

********

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

********

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

********

5. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The paper presents an R package, calibmsm, for producing calibration plots for the estimated transition probabilities in multistate regression models. For the most part, the underlying theory relating to the plots has already been published elsewhere, although the current paper extends the approach to any starting time and any starting state (rather than the initial state at time 0) using landmarking. The current paper is structured as a tutorial paper which uses a dataset on leukemia patients following bone marrow transplantation as an illustrative example. The calibration plots are a very useful diagnostic tool for assessing goodness-of-fit in multistate models with right-censoring. The package works closely with mstate, which is the principal package for fitting such models in R. Generally the paper, and the associated R package, are well written and fairly easy to follow.

Main comments

1. For usability, it might be helpful for the paper to give a bit more detail on how the data frames containing the transition probability estimates are created and the requirements for the data.raw dataset.
In the paper, the relevant data frames are just directly used. However, in practice researchers would need to construct these themselves starting from a fitted mstate model. For the data.raw dataset it is unclear whether the wide form transition times are necessary, or whether it is only the covariates and event time/status for the censoring model.

2. The vignette on IPCW suggests that the default method of calculating the weights is inappropriate. In particular, the "year" variable clearly sets an upper limit on the administrative censoring (and hence overall censoring) time. It would be helpful to see what impact there is if a more appropriate set of weights is used.

Minor comments (page numbers refer to the pages in the generated pdf)

P9 Title page: "Surname" is included in the list of authors.
P17 l194: cox -> Cox
P18 l225: "All multistate models must have an absorbing state": I don't think that is necessarily true. While most applications would include death as an absorbing state, some models (e.g. models of STIs) may have two or more recurrent states and assume that death is negligible or may be treated as non-informative censoring.
P32 l448: Unfinished sentence.
P51 Figure 4: The plots do not appear to have any estimated calibration curves included.

Reviewer #2: The authors present the R package calibmsm for calibration plots of the transition probabilities in a multistate model. This is a quite interesting tool for many researchers that facilitates the application of existing methods. I appreciate the reference list provided and the overall explanation of the package. However, I have some minor remarks that might help to improve the quality of the manuscript.
- The author name of Ben Van Calster seems to have been entered incorrectly, as “Surname” is written in the author list.
- Line 30: There is a typo: psuedo -> pseudo
- Line 36: “… the calibration of a multistate model developed…” I think you can say “… any multistate model” in order to underline the flexibility of this package.
- Line 67-68: I think there is a grammatical error in this sentence (“which”?). Please check.
- Line 102-103: I would be interested in the transition probabilities into any state. This is actually what you are doing. However, in this sentence it sounds like you are only addressing the transitions out of the starting state.
- Line 228: There is a typo: This is issue…-> delete is
- The numbers referencing formulas (like (1) and (2)) look the same as the numbers for the references. This is irritating; please use different styles.
- Line 265 – 273: You nicely explain step by step how to estimate confidence intervals. For me as a potential user it would be nice if you could add the information on where I can find some example code (maybe as a supplement, or in the practical part that follows).
- Section 3 and 4:
  o For me it was not that easy to grasp the overview provided in section 3 (description of package functions and interface).
  o It might help if you separate it a little bit. Maybe first state everything that is needed, and explain afterwards what it should look like. Maybe consider a step-by-step approach. In the end there should be a clearer structure in section 3.
  o In general, for a user it is easier to directly see what is going on (what input for what purpose etc.) by including snippets of examples. You could think of combining sections 3 and 4. But if you want to keep them separate you could already refer to the next section.
- Figure 4: There seems to have been an error when uploading the figure. When viewing the submission there are only empty graphs (with diagonals).
- Line 548: The authors mention a range of other models that can be addressed via calibmsm, not only multistate models, e.g. competing risks models. A competing risk model is actually a simple multistate model with 3 states.

********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

https://doi.org/10.1371/journal.pone.0320504.r002
Revision 1
8 Nov 2024 Author Response

Hello, we thank the reviewers and editor for taking the time to read the paper carefully and for giving insightful comments. We have provided a point-by-point response to all comments, with our responses in blue text and changes to the manuscript given in green text, to help the reviewers navigate the document as easily as possible. Line numbers refer to the tracked changes version of the manuscript.

Also, in order for the submission to meet PLOS ONE submission guidelines we have had to change the document format. Previously, it was generated from within R, as a vignette; however, it is required to submit a TEX or .docx file. We have therefore converted the previously submitted pdf to a .docx file, and have applied tracked changes to this. During the conversion it has proved difficult to retain the Sweave of manuscript text, R code and R output. We apologise for any inconsistencies in the formatting (i.e. some of the code + code output looks a bit ugly).

Finally, we have made some changes to the package over the last couple of months. For example, there are new vignettes describing some of the sensitivity analyses we have done as a result of this peer review. The submitted manuscript here reflects the newest version of calibmsm on GitHub, not the version currently on CRAN. We will update the version on CRAN, if/when this manuscript is accepted, to avoid repeatedly updating and pushing to CRAN.

Many thanks,

Editor comments:

Major Points:

Clarity and Justification of Calibration Methods: The methods section is comprehensive but could benefit from additional clarity regarding the rationale behind choosing the three calibration methods (BLR-IPCW, MLR-IPCW, and pseudo-values). A more detailed comparison or discussion of their relative merits, particularly for different types of datasets (e.g., varying amounts of censoring), would add value.
We have added the following text to highlight why these three methods were chosen:

Line 120: These approaches were previously proposed and evaluated in a simulation study [24], but were restricted to assessing calibration out of the starting state at time s=0. The theory is summarised and revised to allow assessment of calibration out of any state j at any time s in sections 2.2 – 2.6.

In our previous work we found that both the BLR-IPCW and pseudo-value methods had similar levels of bias and variance under the different censoring mechanisms considered in our simulation [1]; therefore they do not have relative merits in this sense. We do know that both methods work when their assumptions are met (conditional independence given some set of baseline covariates Z). We have therefore added the following text to: 1) give detail on exactly when these assumptions hold; 2) highlight the benefit of doing MLR-IPCW as well as either BLR-IPCW or pseudo-value (strength of calibration); 3) highlight the importance of doing both BLR-IPCW and pseudo-value:

Line 794: All three methods (BLR-IPCW, MLR-IPCW and pseudo-value) have been shown to give an unbiased assessment of calibration under random censoring mechanisms, and a predominately unbiased assessment of calibration when there is a strong association between the outcome and censoring mechanisms that can be explained by baseline covariates [1].

Line 813: For now, we reiterate the importance of implementing these methods in settings where the observation process/censoring mechanism does not change depending on the outcome state an individual is in. It has previously been suggested to evaluate calibration using MLR-IPCW and one of the BLR-IPCW or pseudo-value approaches because MLR-IPCW provides a stronger assessment of calibration [24]. However, we now suggest evaluating calibration using all three methods, and a comparison between the BLR-IPCW and pseudo-value approaches can be used as a proxy for assessing whether the above assumptions of either method may be violated.

We also direct you towards some existing text around the computational burdens when estimating confidence intervals for each approach:

Line 834: The BLR-IPCW, MLR-IPCW and pseudo-value approaches have different computational burdens. A calibration curve can be obtained reasonably quickly using the BLR-IPCW or MLR-IPCW approaches; however, estimation of confidence intervals for BLR-IPCW using bootstrapping (the recommended method in section 2.6) will result in a high computational time in large validation datasets. On the contrary, obtaining the calibration curve itself using the pseudo-value approach has a high computational burden due to estimation of the pseudo-values. Once these have been calculated, a calibration curve and confidence interval can be estimated quickly using parametric techniques, meaning estimation of the confidence interval adds minimal computational burden.

Handling of Censoring: Censoring is a significant challenge in survival analysis. The manuscript describes using inverse probability of censoring weights (IPCW) to handle informative censoring, which is a valid approach. However, further discussion on how the choice of weights impacts the calibration curves and the results should be provided. Some sensitivity analysis regarding the impact of weight selection would enhance the robustness of the findings.

This links to a comment by reviewer 1. As a result, we have undertaken a comprehensive investigation into this and added a large vignette (Sensitivity-analysis-for-IPCWs) detailing our findings.
In summary, firstly, the main analyses were repeated but censoring individuals at 5 years before fitting the model to estimate the weights (i.e. a stopped Cox [2]). This removes the differential cap across the year groups. There is functionality within the package to do this (the w_max_follow argument), and this is something we should have done in the first place. This has been detailed in section 4:

Line 438: The w_max_follow=t_eval argument censors individuals at t_eval before fitting the model used to estimate the weights, i.e. a "stopped Cox" approach [41]. This decision was made to help meet the proportional hazards assumption as there is differential follow up for individuals in different year groups (see vignette Sensitivity-analysis-for-IPCWs [27] for more details).

A sensitivity analysis was then done estimating the weights using a flexible parametric model, as opposed to a Cox proportional hazards model, and led to similar calibration curves. Further investigation then identified the censoring mechanism, and a non-random observation process, to be the probable driving factor behind the difference in the calibration plots for state 1 and state 3. As well as the new vignette, we have detailed this in the methods section:

Line 184: Note that if the censoring mechanism is not conditionally independent from the outcome process X(t) given Z, i.e. the rate of censoring changes depending on outcome state occupancy, then this approach will be invalid. Instead, the outcome history up until time t must be conditioned on when estimating the weights, as specified in equation (3).

We have summarised this in the results section:

Line 612: We explored this theory in more detail (see vignette Sensitivity-analysis-for-IPCWs), but found little change when estimating the weights using a flexible parametric survival model. Instead, we identified that this may be caused by a difference in the censoring mechanism for individuals in the adverse event state, as it appeared these individuals were less likely to be censored. This will bias the results from the BLR-IPCW and MLR-IPCW methods unless the weights are conditional on the amount of time spent in each outcome state, something which calibmsm is not currently set up to do. Although it’s not possible to be certain that the individuals in the adverse event state were less likely to be censored purely from looking at the data, we concluded it was a strong possibility, and that the BLR-IPCW calibration curves may be biased in this particular clinical example.

We have also added the following text to the discussion:

Line 800: This is an indicator that the assumptions underpinning either one of the methods could be violated. This was explored in detail (see vignette Sensitivity-analysis-for-IPCWs [27]) and led to the conclusion that the BLR-IPCW and MLR-IPCW plots are likely unreliable, in particular for the adverse event state. We hypothesised this was driven by a differential censoring mechanism/observation process for individuals in the adverse event state. Simulation studies are required to 1) quantify this type of bias, and 2) explore whether this can be accounted for by estimating the inverse probability of censoring weights using approaches which are conditional on the time spent in each outcome state (for example a latent-class model). If such a study could be undertaken this would be highly valuable [45,46]. For now, we reiterate the importance of implementing these methods in settings where the observation process/censoring mechanism does not change depending on the outcome state an individual is in.
It has previously been suggested to evaluate calibration using MLR-IPCW and one of the BLR-IPCW or pseudo-value approaches because MLR-IPCW provides a stronger assessment of calibration [24]. However, we now suggest evaluating calibration using all three methods, and a comparison between the BLR-IPCW and pseudo-value approaches can be used as a proxy for assessing whether the above assumptions of either method may be violated.

Note, we have also removed the BLR-IPCW calibration curves for predictions made at time s = 100, and focus on pseudo-value calibration curves in section 4.3.

Sample Size and Calibration: The manuscript notes that calibration results could be improved with a larger sample size, but it would be helpful to provide more concrete guidance or examples of how sample size affects calibration estimates. Including a brief power calculation or similar quantitative justification for sample size adequacy would enhance the practical use of the software.

This is a pertinent point. Regrettably, there is currently no established way of deriving minimum sample sizes for multistate models. This is an active research topic which a PhD student in our group is working on. We fear that doing a simplistic/brief sample size calculation (i.e. such as 10 events per predictor variable, or some other similar approach) can inadvertently lead to bad practice, if other researchers then read this as being a viable solution in the future. For example, Fisher's throwaway comment about a p-value of 0.05 became standard statistical practice for 50 years! We have therefore avoided an explicit sample size calculation, which we believe would be incorrect, and instead added a deeper discussion about how a sample size calculation could work, and have incorporated this into the future work section:

Line 861: Despite sample size formulae being available for clinical prediction models predicting continuous [5], binary [6,7], time-to-event [6] and multinomial outcomes [8], sample size formulae do not currently exist for developing a multistate clinical prediction model. Given the combinatorial issues with multistate models, overfitting is of particular concern as the number of individuals passing through some transitions may be small. Future work in this area is therefore paramount. A multistate model, at its core, is a network of cause-specific hazards models [9], which are no different to a normal time-to-event model. We hypothesise that existing sample size formulae could be applied to each model in isolation in order to get a minimum sample size per transition, which could then be divided by the proportion of individuals expected to reach the starting state for that transition in order to derive the total number of individuals required to satisfy that transition's target sample size. The maximum across all transitions would then be taken. For clock-forward models, this may be complicated by the fact that each cause-specific model is an interval censored model, and it is currently unclear how to apply existing sample size formulae [6] to interval censored data.

Practical Interpretation: While the results section demonstrates the use of the calibmsm package well, a deeper interpretation of the results (e.g., how clinicians or statisticians should act based on poor calibration) would be helpful. This would provide the necessary bridge between technical calibration results and clinical decision-making.

We thank the editor for highlighting this important point.
We are pleased that the clinical example shows how to use the package well; as such, we have kept the text around this unchanged, since this is one of the main intended contributions of this paper. However, we fully agree with the reviewer on the need to bridge technical results and clinical decision-making. As such, we have added a detailed discussion of this to a new Box (Box 1) at the end of section 4. This box is intended to outline how researchers should interpret the calibration results across all three approaches, and to give some indication of how researchers should act upon finding that the model is miscalibrated. We emphasize that miscalibration means the model should not be used to inform clinical decision-making, and that revisions to the model would be needed to allow this. We suggest some ways to achieve this (e.g., reducing model complexity to reduce overfitting), but we also note that this itself requires further methodological development. Specifically, our new Box 1 reads as follows:

Line 776: Box 1: Interpretation of the calibration results.

Assessing the calibration of multi-state clinical prediction models requires consideration of each of the states of the model, with a requirement for there to be good calibration across all states before the model could be used in clinical practice. We have provided three methods to assess calibration (see Section 2), and we recommend assessing calibration using each, so that the results can be compared.

The calibration curves shown in Figure 3, Figure 4 and Figure 5, which consider predictions out of the recovery state at time 0, show that there is good agreement between the observed and predicted risks for some – but not all – of the states. The results tell us that the transition probabilities of remaining in state 1 (transplant) are predominantly over-predicting. Specifically, the model is over-estimating the predicted risk of someone not recovering, having an adverse event, experiencing relapse or death following transplant. The transition probabilities of being in state 2 (recovery) or state 5 (relapse) are either under- or over-predicting depending on the predicted risk value. A key clinical outcome in this clinical setting is the risk of relapse (state 5), with these results showing that the model should not be used to inform risk estimation for this state, since the calibration of state 5 is extremely poor. On the contrary, the transition probabilities for state 4 (adverse event + recovery) and state 6 (death) are reasonably well calibrated.

Checking for consistency in conclusions across the three calibration methods is always recommended as it may reveal important insights from the analysis. Indeed, we found differences in the calibration results of state 3 across the three methods (as discussed in the main paper). This led to further investigation, which concluded that the calibration plots from the BLR-IPCW and MLR-IPCW approaches may be biased in this setting, in particular for state 3 (adverse event), leading us to focus on the pseudo-value calibration plots, which indicate the transition probabilities into the adverse event state were well calibrated.

The pseudo-value calibration curves in Figure 6 and Figure 7, which consider predictions out of the recovery and adverse event states at time 100, show very poor agreement between the observed and predicted risks.

In our opinion, finding that there are some states with miscalibrated transition probabilities informs us that the predicted risks from the model should not be used to inform clinical decision-making. For example, it is clear that the model should not be used to aid clinical decision-making around relapse risk following transplant, especially when making predictions 100 days post-transplant.
On the contrary, one ma

Attachment Submitted filename: Reviewer Response 20241101.docx https://doi.org/10.1371/journal.pone.0320504.r003
20 Feb 2025 Decision Letter - Md Rahaman Khan, Editor calibmsm: An R package for calibration plots of the transition probabilities in a multistate model PONE-D-24-31812R1 Dear Dr. Pate, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Md Hasinur Rahaman Khan, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Please check carefully for typos and for conformance to the PLOS ONE manuscript structure when providing the final document for potential publication. Reviewers' comments: https://doi.org/10.1371/journal.pone.0320504.r004
Formally Accepted
Acceptance Letter - Md Rahaman Khan, Editor PONE-D-24-31812R1 PLOS ONE Dear Dr. Pate, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Md Hasinur Rahaman Khan Academic Editor PLOS ONE https://doi.org/10.1371/journal.pone.0320504.r005
