Preventing postoperative pulmonary complications by establishing a machine-learning assisted approach (PEPPERMINT): Study protocol for the creation of a risk prediction model

  • Britta Trautwein,

    Roles Conceptualization, Visualization, Writing – original draft

    britta.trautwein@uniklinik-ulm.de

    Affiliation University Hospital Ulm, Ulm, Germany

  • Meinrad Beer,

    Roles Conceptualization, Writing – review & editing

    Affiliation University Hospital Ulm, Ulm, Germany

  • Manfred Blobner,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliations University Hospital Ulm, Ulm, Germany, Technical University of Munich, Munich, Germany

  • Bettina Jungwirth,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation University Hospital Ulm, Ulm, Germany

  • Simone Maria Kagerbauer,

    Contributed equally to this work with: Simone Maria Kagerbauer, Michael Götz

    Roles Conceptualization, Data curation, Funding acquisition, Methodology, Writing – review & editing

    ☯ These authors contributed equally to this work.

    Affiliations University Hospital Ulm, Ulm, Germany, Technical University of Munich, Munich, Germany

  • Michael Götz

    Contributed equally to this work with: Simone Maria Kagerbauer, Michael Götz

    Roles Conceptualization, Data curation, Funding acquisition, Methodology, Writing – review & editing

    ☯ These authors contributed equally to this work.

    Affiliation University Hospital Ulm, Ulm, Germany

Abstract

Background

Postoperative pulmonary complications (POPC) are common after general anaesthesia and are a major cause of increased morbidity and mortality in surgical patients. However, prevention and treatment methods for POPC that are considered effective tie up human and technical resources. Therefore, the planned research project aims to create a prediction model that enables the reliable identification of high-risk patients immediately after surgery based on a tailored machine learning algorithm.

Methods

This clinical cohort study will follow the TRIPOD statement for multivariable prediction model development. Development of the prognostic model will require 512 patients undergoing elective surgery under general anaesthesia. Besides the collection of perioperative routine data, standardised lung sonography will be performed postoperatively in the recovery room on each patient. During the postoperative course, patients will be examined in a structured manner on postoperative days 1, 3 and 7 to detect POPC. The endpoints determined in this way, together with the clinical and imaging data collected, will then be used to train a machine learning model based on neural networks and ensemble methods to predict POPC in the early postoperative phase.

Discussion

In the perioperative setting, detecting POPC before they become clinically manifest is desirable. This would ensure optimal patient care and resource allocation and help initiate adequate patient treatment after being transferred from the recovery room to the ward. A reliable prediction algorithm based on machine learning holds great potential to improve postoperative outcomes.

Trial registration

ClinicalTrials.gov ID: NCT05789953 (29th of March 2023)

Introduction

The incidence of postoperative pulmonary complications (POPC) ranges from 9% to 40%, depending on the surgical procedure and the definition used [1]. Therefore, the introduction of a standardised definition of the outcome “postoperative pulmonary complications”, developed by the Standardised Endpoints for Perioperative Medicine (StEP) collaboration in 2018, represents a prerequisite for future studies in this field [1]. Even supposedly minor complications have the potential to significantly increase the length of hospital stay [2]. Various preoperative risk factors are known but usually cannot be modified. As POPC are the main cause of postoperative morbidity and mortality and the reduction of perioperative mortality is dependent on early recognition and treatment [3], accurate prediction is of paramount interest. There are only a few clinical scoring systems [4]; the currently best-evaluated preoperative score for predicting postoperative pulmonary complications (ARISCAT: Assess Respiratory Risk in Surgical Patients in Catalonia, Table 1) has sufficient sensitivity but lacks specificity [5]. A first retrospective study using machine-learning (ML) methods for determining risk for pneumonia and pulmonary embolism using pre- and intraoperative routine data has shown good accuracy [6]. However, the study shows high specificity but low sensitivity, which could result in overtreatment in clinical practice. In our study, we aim to create a high-precision decision-support tool for perioperative physicians to identify high-risk patients at an early stage.

However, ML algorithms based only on routine clinical data depend highly on data quality and comprehensiveness. Algorithms based on standardised imaging data are easier to transfer to other facilities and implement in routine clinical practice. Therefore, combining clinical data with image analysis might be beneficial, especially as sonography is becoming increasingly important as a non-invasive examination method that can be performed at the bedside. Various sonographic scores and models have been developed to predict pulmonary complications [7]. Image processing methods and machine learning, particularly deep learning, are also increasingly used in ultrasound diagnostics [8,9]. Augmented algorithms using pre- and intraoperative clinical information in addition to ultrasound imaging data may provide better predictive accuracy than the respective individual methods. However, to our knowledge, combining routine clinical data and ultrasound imaging data to develop a predictive machine-learning algorithm has not yet been tested. In addition, prospective clinical evaluation of machine-learning algorithm-based prediction models, which is planned herein, is lacking to date.

Measures for preventing POPC, such as postoperative non-invasive ventilation and physiotherapy, are known and considered effective [10,11] but are probably not consistently applied in clinical routine due to the increased demand, especially for human resources.

This study aims to combine pre- and intraoperative data with lung ultrasound imaging in the recovery room to develop an ML-based risk score for POPC. A precise score that reliably identifies patients at risk in the early postoperative phase and simultaneously avoids overtreatment can ensure adequate personalised treatment of postoperative patients.

Materials and Methods

Trial registration

Name of the registry: ClinicalTrials.gov

Registration ID: NCT05789953

Approval date: 29/03/2023

Ethics approval

Name: Ethics committee University of Ulm

Approval Number: 369/22

Approval date: 22/12/2022

Head of committee: Prof. Dr. Florian Steger, Oberberghof 7, 89081 Ulm, Germany

Mail: ethik-kommission@uni-ulm.de

Homepage: https://www.uni-ulm.de/einrichtungen/ethikkommission-der-universitaet-ulm/

Written informed consent to participate will be obtained from all participants.

Objectives

The main hypothesis of the PEPPERMINT study is that a patient’s risk of POPC can be reliably predicted using a machine-learning algorithm and that the predictive accuracy of the algorithm outperforms common clinical scoring systems.

The primary objective is to develop a machine-learning algorithm based on lung ultrasound imaging data obtained immediately postoperatively from adult patients undergoing surgery under general anaesthesia to predict the risk of postoperative pulmonary complications. This model is intended to provide better predictive ability than the currently best-established clinical score, the ARISCAT [5], or a machine learning model based solely on clinical routine data.

The secondary objective is to investigate whether improving model performance by adding clinical routine parameters to the imaging data is possible.

Furthermore, the optimal risk threshold for an intervention will be determined in case of a clinical application of the model.

Further objectives include identifying patient-specific risk factors for POPC through analysis of the collected routine clinical data and modification of the models created to predict the secondary endpoints of hospital length of stay, in-hospital mortality and postoperative quality of recovery.

Trial design

The PEPPERMINT study is a prospective, single-center clinical cohort study designed to develop and evaluate a risk prediction model for POPC. The study follows the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines for multivariable model development and validation [12].

A total of 512 patients will be enrolled based on sample size calculation. For each patient, both clinical data and lung ultrasound images will be prospectively collected. The primary endpoint is the occurrence of any postoperative pulmonary complication, as defined by standardised criteria.

Three predictive models will be developed: first, a deep learning model based on lung ultrasound images; second, a model based solely on clinical variables, built with common frameworks such as automated machine learning (AutoML); and third, a combined model integrating both clinical and imaging features.

The dataset will be split using stratified, patient-wise sampling into non-overlapping training and hold-out test sets. Internal model performance will be assessed using k-fold cross-validation within the training set, and final evaluation will be conducted on the independent test set. Full details of the modelling and validation procedures are provided in the statistical methods.
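
The splitting scheme described above can be illustrated with a minimal sketch. It assumes one binary POPC label per patient and uses only the Python standard library; the function names, the 20% test fraction, and the toy cohort are hypothetical and are not taken from the protocol.

```python
import random
from collections import defaultdict

def stratified_patient_split(labels, test_fraction=0.2, seed=0):
    """Split patient IDs into train/test sets, preserving the outcome ratio.

    labels: dict mapping patient_id -> 0/1 outcome. Splitting by patient ID
    keeps all images of one patient on the same side of the split.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, y in labels.items():
        by_class[y].append(pid)
    train, test = [], []
    for pids in by_class.values():
        pids = sorted(pids)              # deterministic order before shuffling
        rng.shuffle(pids)
        n_test = round(len(pids) * test_fraction)
        test.extend(pids[:n_test])
        train.extend(pids[n_test:])
    return train, test

def stratified_folds(patient_ids, labels, k=5, seed=0):
    """Assign training patients to k folds with similar outcome ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid in patient_ids:
        by_class[labels[pid]].append(pid)
    folds = [[] for _ in range(k)]
    for pids in by_class.values():
        rng.shuffle(pids)
        for i, pid in enumerate(pids):
            folds[i % k].append(pid)     # deal patients round-robin per class
    return folds

# Toy cohort: 100 hypothetical patients, 20 of them with the outcome.
labels = {f"P{i:03d}": int(i % 5 == 0) for i in range(100)}
train_ids, test_ids = stratified_patient_split(labels)
folds = stratified_folds(train_ids, labels, k=5)
```

Cross-validation then trains on four folds and validates on the fifth in turn, while the test patients stay untouched until the final evaluation.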

Study setting

The PEPPERMINT study will be conducted as a single-centre study at the University Hospital Ulm in Germany. The hospital is a tertiary care and academic hospital where about 30,000 anaesthesia procedures are performed annually, including a broad spectrum of surgical disciplines and interventions. To establish a data processing pipeline for imaging data and to integrate artificial intelligence (AI) algorithms into the study setting, an interface between the ultrasound devices and the hospital's internal Picture Archiving and Communication System (PACS) has already been established. To bundle the expertise in image processing, collection, and processing of big data, the departments of anaesthesiology and radiology cooperate for this study.

The members of the study group are predominantly physicians and specialists in the fields of anesthesiology and radiology. The leader of the research group “Experimental Radiology”, who holds a formal background in engineering, is part of the study team. His expertise lies in machine learning, deep learning, and computer vision, with a focus on medical imaging applications. Another group member holds a master’s degree in medical informatics and has substantial experience in machine learning for perioperative risk prediction. The study team is completed by Master’s and PhD students from the Department of Radiology with a background in computer science and engineering.

In addition to specialised personnel, high computing capacity for fast processing and reliable storage is necessary to develop the model and will be provided by the involved departments.

As the study is designed to build decision-support systems, on-site evaluation is planned as a subsequent step, which will be covered by a separate study. The data is evaluated offline without direct interaction between medical staff and the AI algorithm. Consequently, no feedback loop is planned in this phase.

Eligibility criteria

Adult patients (≥18 years) of all sexes are eligible for the study.

Patients must meet the following inclusion criteria: Scheduling for elective surgical procedures under general anaesthesia with a planned overnight hospital stay, and written informed consent by patients.

If they meet any of the following exclusion criteria, they will not be included in the study: Younger than 18 years of age, outpatient surgery, planned postoperative admission to intensive care unit (ICU), need for intensive care treatment before surgery and emergency surgery.

Secondary exclusion criteria are: Unplanned hospital discharge/transfer on the day of surgery, which does not allow examination of the primary outcome; cancellation/postponement of index surgery; and immediate unplanned postoperative admission to the ICU due to an intraoperative complication.

Furthermore, the inclusion criteria for input data are as follows: at least two adjacent ribs and the pleura or corresponding pathologies (e.g., pneumothorax, pleural effusion) must be visible on the ultrasound image, and the ultrasound examination must cover all 12 defined areas (details in chapter “Study measures”).

The exclusion criteria for input data are: Ultrasound images on which the leading structures ribs and pleura or alternative pathologies cannot be depicted, or incomplete visualisation of the 12 previously defined examination areas in the thoracic region, e.g., in the case of immobile patients or inaccessibility due to a bandage or drains. Importantly, decisions regarding potential exclusion due to poor image quality will not be made by the examining physician, but rather by an independent, blinded radiologist, thereby minimising the risk of subjective selection bias.

Clinical data is collected from patients throughout their hospitalisation. An employee who is not involved in data collection will carry out a plausibility check and random comparison with the medical records at regular intervals. Implausible data is removed from the data record.

Patients who cannot be visited postoperatively and for whom no endpoint could be defined are excluded from the study; however, their data may be used for secondary analyses. Incomplete pre-operative routine data in the electronic patient record are not a reason for exclusion if the patient can be visited postoperatively.

Recruitment and informed consent

The University Hospital Ulm is a tertiary care hospital where about 30,000 anaesthesia procedures are performed annually. The PEPPERMINT study evaluates a general surgical population; therefore, eligibility criteria were set as low as possible. Approximately 50 patients are seen in the pre-anaesthesia outpatient clinic for preoperative evaluation each working day, of which about 80% will fulfil the inclusion criteria. The three risk groups defined by the ARISCAT score (see “Sample size”) will be recruited equally: once about 60 patients have been enrolled in one risk group, recruitment for that group will be paused, and the other risk groups will be prioritized until the same number is present in all risk groups. The equipment and staffing of the recovery rooms and the existing expertise in lung sonography allow the inclusion of 3–5 patients daily, so patient recruitment should be feasible within one year.

Informed consent from trial participants or authorised surrogates will be obtained by a physician from the Department of Anaesthesiology and Intensive Care Medicine of the University Hospital of Ulm. The informed consent discussion will be part of the scheduled informed consent discussion for general anaesthesia. According to the usual routine preoperative procedure, the optimal anaesthesia for the patient is planned based on previous diseases, previous anaesthetic and surgical procedures, and personal preference. In case of doubt, the senior physician will be consulted. If the inclusion criteria are met and the patient gives consent, an informed consent discussion about the PEPPERMINT study will take place. Written informed consent will be obtained by the physicians, who will explain the hospital’s policy on data collection and storage, the general process, and the goals of the study. Additional consent for the collection of participant data, which is routinely obtained during standard pre-, intra- and postoperative anesthesiologic care, will be obtained as well.

The participant schedule is shown in Fig 1.

Fig 1. SPIRIT schedule PEPPERMINT trial.

Schematic diagram presenting the schedule for participants, based on SPIRIT schedule; LUS = Lung ultrasound; POPC = postoperative pulmonary complications.

https://doi.org/10.1371/journal.pone.0329076.g001

Study measures

Study-related measures to acquire the imaging data include performing a standardised lung ultrasound on each included patient immediately postoperatively in the recovery room or post-anaesthesia care unit (PACU). Sonographic examination of the lungs is a common, noninvasive bedside procedure. It can be performed in the supine position without additional positioning and takes approximately 5 minutes. The examination is performed using a standardised, previously published method [13]. A convex probe (5 MHz) and a predefined preset of the ultrasound device (the default “lung” preset of the device) will be used. Each hemithorax will be divided into 6 areas: the anterior and posterior axillary lines separate an anterior, a lateral, and a posterior zone, each of which is split into a superior and an inferior area. For each patient, 12 pictures and 12 videos, one in each area, will be captured. This 12-zone protocol showed the highest intra-class correlation coefficient compared to other protocols [14]. A graphical representation of the areas can be found in Fig 2.

Fig 2. Graphic representation of the lung ultrasound areas.

Lung ultrasound will be performed in 12 areas, 6 in each hemithorax. Areas are separated by the anterior and posterior axillary line into an anterior, lateral and posterior zone and a superior and inferior area.

https://doi.org/10.1371/journal.pone.0329076.g002

The criteria for including imaging data are described above (“Eligibility Criteria”). If these conditions are not met, the interpretability of the ultrasound is considered insufficient.

To maximize inclusiveness and minimize selection bias, operators are encouraged to adjust technical parameters (e.g., depth and gain) during image acquisition in order to meet these quality criteria. A convex transducer was selected to allow visualization of deeper structures across a wide range of patient anatomies.

The performing physicians are experienced in perioperative medicine and critical care and will be trained in ultrasound methodology before the start of the study.

After transmission and storage in the PACS of the Department of Radiology, image pre-processing and artifact correction are performed before the data serves as input for the machine learning model. Images are not labeled by human experts; only the pre-defined endpoint POPC serves as labels for the imaging data.

The clinical data obtained for the study correspond to parameters routinely collected during anesthesiologic preoperative evaluation, the course of anaesthesia during surgery, and postoperatively in the recovery room. Plausibility checks are carried out on the numerical data; if these deviate from the valid value range, they are removed. Missing data may be imputed by different methods as described below.
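
The range-based plausibility check can be sketched as follows. This is only an illustration: the field names and limits are hypothetical, since the protocol does not list the actual valid value ranges.

```python
# Hypothetical valid ranges; the protocol does not state the actual limits.
VALID_RANGES = {
    "heart_rate_bpm": (20, 250),
    "spo2_percent": (50, 100),
    "temperature_celsius": (30.0, 43.0),
}

def remove_implausible(record, valid_ranges=VALID_RANGES):
    """Return a copy of the record with out-of-range values set to None.

    Values set to None are treated as missing and can be imputed later.
    """
    cleaned = dict(record)
    for field, (low, high) in valid_ranges.items():
        value = cleaned.get(field)
        if value is not None and not (low <= value <= high):
            cleaned[field] = None
    return cleaned

record = {"heart_rate_bpm": 720, "spo2_percent": 97}
print(remove_implausible(record))  # {'heart_rate_bpm': None, 'spo2_percent': 97}
```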

Based on the collected data and the outcome parameters (description in chapter “Outcome”), machine learning algorithms will be trained to predict POPC by outputting an individual patient’s percentage risk of suffering a postoperative pulmonary complication.

Modifications, adherence and concomitant care

The following scenarios allow a change in the study protocol: (1) If a patient withdraws consent, the study protocol will be stopped immediately, and the patient will be excluded from the study. (2) If it is not possible to perform an ultrasound examination that meets the abovementioned quality requirements, the patient will be excluded from the study. (3) Postoperative lung ultrasound will take place in the recovery room; dates might be rescheduled depending on the postponement of the surgery. (4) Postoperative visits on the normal ward will take place on postoperative days 1, 3 and 7. If the patient is discharged from the hospital or transferred to another hospital within the first 7 days, the study ends on the day of discharge.

To improve workflow, the recruiting anaesthetist in the pre-anaesthesia outpatient clinic receives a simple checklist with eligibility criteria and the ARISCAT score (S1 File), and study information will be handed to patients in the waiting area. To improve adherence to the lung ultrasound protocol in the recovery room, a group of selected anaesthetists will undergo personal training in ultrasound methodology and receive detailed instructions in written form, which are also attached to the ultrasound device. Additionally, pocket cards with brief instructions will be distributed among the responsible physicians.

Postoperative ward visits will be performed by qualified study nurses and trained medical students with a tablet to simplify the visits and enhance data management.

During the trial, patients will receive routine perioperative care according to the local standard. Concomitant care and interventions are therefore provided at the discretion of the treating physicians, and no concomitant care or interventions are prohibited during the trial. Relevant information regarding POPC that results from postoperative diagnostics or interventions will be recorded. If a suspected complication that has not yet been treated is noticed during a postoperative visit, the study staff will inform the responsible ward physician so that any necessary therapy can be initiated.

Sample size

Predicting POPC for the clinician is difficult on a case-by-case basis. Therefore, several scoring systems have been developed in the past. The most common of these, the ARISCAT score, has an AUROC of 0.83 [15]. It should be noted that the current literature does not provide any further metrics such as area under the precision-recall curve (AUPRC) or F1 score on the ARISCAT [15,16]. We therefore used the AUROC as the reference metric. The model we aim to create should be significantly better than the ARISCAT score and thus have at least an AUROC of 0.93. Achieving this seems realistic since, in the preliminary work of our research group, prediction models for various postoperative complications have already been created, whose predictive accuracy is in this range [17,18]. With a significance level of 0.05 and a power of 80%, 512 patients would be needed to create the database based on the method described by Hanley and McNeil for comparing ROC curves [19].
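
As a rough illustration of the quantities behind such a calculation, the sketch below implements the Hanley and McNeil standard error of an AUROC and a z statistic for the difference between two AUROCs. It is not the authors' calculation: the protocol does not state the assumed POPC prevalence or the correlation between the two scores, so the case counts and the parameter `r` are hypothetical, and the sketch is not expected to reproduce the figure of 512.

```python
import math

def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUROC according to Hanley and McNeil."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    variance = (auc * (1 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(variance)

def z_for_auc_difference(auc_a, auc_b, n_pos, n_neg, r=0.0):
    """z statistic for auc_b - auc_a when both are estimated on one cohort.

    r is the assumed correlation between the two AUROC estimates
    (r = 0 treats them as independent).
    """
    se_a = hanley_mcneil_se(auc_a, n_pos, n_neg)
    se_b = hanley_mcneil_se(auc_b, n_pos, n_neg)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2 - 2 * r * se_a * se_b)
    return (auc_b - auc_a) / se_diff

# Hypothetical cohort: 100 patients with POPC, 412 without.
z = z_for_auc_difference(0.83, 0.93, n_pos=100, n_neg=412)
```

A sample size search would increase the number of patients until the z statistic meets the chosen significance level and power.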

Patients are stratified according to the risk criteria determined by ARISCAT so that approximately equal numbers of patients are included in each of the low-risk (ARISCAT < 26 points), intermediate-risk (ARISCAT 26–44 points), and high-risk (ARISCAT ≥ 45 points) groups.
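
The risk grouping and the balanced recruitment rule can be sketched as follows. The score thresholds come from the text above; the pausing logic is one possible reading of the "about 60 patients" rule in the recruitment section, and the function names are hypothetical.

```python
def ariscat_risk_group(score):
    """Map an ARISCAT score to the risk groups used for stratification."""
    if score < 26:
        return "low"
    if score <= 44:
        return "intermediate"
    return "high"

def groups_open_for_recruitment(counts, block_size=60):
    """Return the risk groups that may currently enrol patients.

    counts: dict mapping risk group -> number of patients enrolled so far.
    A group is paused once it reaches the next multiple of block_size above
    the smallest group's count, and reopens when the others have caught up.
    """
    target = (min(counts.values()) // block_size + 1) * block_size
    return {group for group, n in counts.items() if n < target}

counts = {"low": 60, "intermediate": 40, "high": 10}
print(ariscat_risk_group(30))                       # intermediate
print(sorted(groups_open_for_recruitment(counts)))  # ['high', 'intermediate']
```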

Outcomes

The primary outcome of the PEPPERMINT study to be predicted by the machine learning model is the risk of developing postoperative pulmonary complications after surgery under general anaesthesia between postoperative days 1 and 7. POPC will be defined and graded by severity according to the standardised criteria of the StEP collaboration [1]. Complications not further described by the StEP collaboration will be defined according to the EPCO (European Perioperative Clinical Outcome) task force [20], as listed in Table 2. POPC, as a composite outcome, summarises atelectasis, pneumonia, acute respiratory distress syndrome (ARDS), pulmonary aspiration, pulmonary embolism, pleural effusion, pneumothorax, and bronchospasm [1]. POPC is assumed as soon as at least one of the listed events occurs and will be detected during the postoperative visit or chart review after the patient is discharged.

Table 2. Definition of postoperative pulmonary complications.

https://doi.org/10.1371/journal.pone.0329076.t002

Outcome assessment will be performed by qualified study staff consisting of three study nurses and two doctoral students on postoperative visits on days 1, 3 and 7. To detect complications, visits will include a questionnaire (pulmonary symptoms and mental state), a clinical examination (pulmonary auscultation), the collection of vital parameters (heart rate, blood pressure, oxygen saturation, breathing rate, temperature) as well as a chart review (oxygen supply, medication, signs of aspiration, admission to ICU). If available during the postoperative course, the following measures will be included: laboratory values (C-reactive protein (CRP), procalcitonin (PCT), leukocytes, partial pressure of arterial oxygen (paO2), and partial pressure of arterial carbon dioxide (paCO2)), thoracic imaging (chest radiography or computed tomography) and respiratory support (CPAP, non-invasive or invasive ventilation). All potentially collected parameters are described in Table 3. To check postoperative mental status, the Mini-Cog test will be used. This test consists of a 3-word recall task and the clock drawing test [21].

Additionally, after hospital discharge, relevant pulmonary imaging, diagnoses or complications are extracted from the discharge letter.

Postoperative recovery and patient satisfaction, as secondary outcome parameters, will be evaluated with the Quality of Recovery-9 (QoR-9) questionnaire (Table 4) on days 1, 3 and 7 [22]. The score ranges from 0 to 18, with a higher score indicating a better subjective recovery.

Other secondary outcome parameters, hospital length of stay and in-hospital mortality, will be determined by a chart review after hospital discharge.

Data management

A FileMaker™ database is used to record postoperative outcomes. Database constraints avoid duplicate entries and values outside the valid range of clinical routine data. The ARISCAT score will be determined pre-operatively with the help of a paper-based checklist. A detailed description of the score is provided in “Background and rationale”. The score has been validated in several European populations [15,23]. The primary endpoint, POPC, is defined according to the consensus definition of the StEP collaboration [1] and complemented by the definition of the EPCO task force [20]. The definition is described in detail in the chapter “Outcome”. As mental status is part of the StEP criteria, a brief cognitive screening test (Mini-Cog test) will be performed regularly during the post-operative visits. The sensitivity of this test is 0.76–0.99, and the specificity is 0.83–0.93 [24]. As a secondary endpoint, the Quality-of-Recovery 9 questionnaire will be applied to assess subjective recovery after surgery. The test is highly sensitive (0.92 ± 0.01) with a high negative predictive value (0.93 ± 0.01) in a German study collective [22]. The tests and the questionnaire are detailed in the chapter “Outcome”.

To promote participant retention, all outcome data will be assessed while the patient is still hospitalised. If a patient drops out, for example, because of withdrawal of consent or unexpected surgery rescheduling and therefore missed ultrasound, the study protocol will be stopped immediately, no further data will be collected, and already gathered data will be deleted. In case of patient discharge from the hospital before performing postoperative visits on days 3 and/or 7, the study protocol will continue, and the available data will be processed if the postoperative visit on day one is documented.

The data are collected and stored in different formats. The inclusion criteria and the ARISCAT score will be recorded on paper in the pre-anaesthesia outpatient clinic. Lung ultrasound is performed with the SonoSite PX (Fujifilm SonoSite Inc., Bothell, Washington, USA). The imaging data is transferred to the hospital’s internal PACS and stored and processed in Digital Imaging and Communications in Medicine (DICOM) format within the Department of Radiology. Routinely collected perioperative data are stored in the hospital’s internal patient data management system (PDMS) in an Oracle8i database. The data is accessible via Structured Query Language (SQL) and can be exported in CSV format. Postoperative visits are conducted in the form of structured questionnaires using tablets. Data collection and processing are done with FileMaker Pro software (Claris, version 19.6.3.302). In the electronic data entry form for the postoperative visits, no user input is possible outside the valid value ranges for numerical data; dichotomous questions (yes/no) are documented with the help of radio buttons. The clock is drawn by the patient on the tablet and also stored in the database as a drawing. The score of the Mini-Cog test is calculated from the clock drawing together with the performance in the word-recall task. To ensure that the drawn clock is evaluated correctly, the score is cross-checked by a physician before it is finally transferred to the database. After patient discharge, discharge notes are searched for relevant diagnoses and complications. All collected data will be merged into a comprehensive FileMaker database. Data are stored in pseudonymized and de-identified form.

No laboratory evaluations or biological specimens outside the clinical routine will be obtained or stored as part of this study. For the detection of postoperative pulmonary complications, diagnostic tests and relevant results performed by the respective departments will be collected during a chart review and directly transferred to the pseudonymized data collection.

Any documents with identifiable information will be collected in paper-based form and stored in a locked cabinet at the study centre, where only authorised personnel will have access to them. This includes original informed consent, a checklist with eligibility criteria, the ARISCAT score, and the patient identification list. The documents will be kept there until completion of the study. The identification list will be stored separately, and only authorised study personnel will have access to it. After completion of the study, all paper records will be stored in a central archive for at least ten years according to the clinic’s specifications and legal requirements. Information collected during the postoperative visits will be saved without any identifiable information on password-protected study tablets. After transferring the data to a hospital-internal computer, they are deleted from the tablets.

Pseudonymised data received from PDMS, chart review, and postoperative visits will be securely stored in the hospital’s internal server infrastructure according to GDPR requirements. Imaging data is exclusively stored within the clinic’s radiology information system.

Statistical methods

The primary objective of the PEPPERMINT study is to develop and evaluate prediction models for postoperative pulmonary complications (POPC) using prospectively collected clinical and imaging data. Model development and validation will be conducted in accordance with the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines. The statistical analysis plan is provided in S3 File.

Overview of modelling approaches

Three predictive modelling strategies will be pursued. First, an imaging model will be created using deep learning techniques applied to lung ultrasound images. This model will be developed in Python using the PyTorch framework. Transfer learning will be employed, utilizing pretrained convolutional neural network (CNN) architectures such as ResNet or DenseNet. These models will be fine-tuned on the study-specific dataset. We will also assess the performance of medical foundation models and contrastive learning-based pretraining to optimize feature extraction from the ultrasound images.

Second, a clinical model will be developed using only structured patient data. This model will be trained using frameworks like H2O AutoML within R/RStudio as well as built-in R functions, which allows for the systematic evaluation and tuning of various machine learning algorithms, including gradient boosting machines, random forests, and neural networks.

Third, a combined model will integrate both clinical and imaging data. Various fusion strategies will be explored, including early and late fusion, to determine the most effective method for combining heterogeneous data types. The final architecture for this combined model will be selected based on performance observed in a hold-out test dataset.
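As an illustration of late fusion, the following minimal sketch combines the risk probabilities of two separately trained models by a weighted average in log-odds space. It is an assumption-laden example, not the study's fusion architecture: the function names and the weight `w_imaging` are hypothetical, and the weight would have to be tuned on data separate from the training set.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def late_fusion(p_clinical: float, p_imaging: float, w_imaging: float = 0.5) -> float:
    """Fuse two predicted risks by a weighted mean of their log-odds.

    w_imaging is a hypothetical tuning weight in [0, 1]:
    0 returns the clinical prediction, 1 the imaging prediction.
    """
    if not 0.0 <= w_imaging <= 1.0:
        raise ValueError("w_imaging must lie in [0, 1]")
    z = (1.0 - w_imaging) * logit(p_clinical) + w_imaging * logit(p_imaging)
    return sigmoid(z)
```

Early fusion, by contrast, would concatenate image-derived features with the clinical variables before a single model is trained; it is not shown here.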

Model training and evaluation

The primary model development will be based on a dataset of 512 patients. Within this cohort, we will implement stratified k-fold cross-validation (typically 5- or 10-fold, depending on the final class distribution) where possible to assess model robustness and mitigate overfitting. Because image-based deep learning models require extensive computational resources, cross-validation will be applied more selectively to them: for imaging data, we will use a fixed training/validation/test split with non-overlapping patient groups. Early model experiments will use hold-out validation within the training data to optimize architecture and hyperparameters. Final model evaluation will be performed on an independent hold-out test set.
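The requirement of stratified folds with non-overlapping patient groups can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the patient ID format, and the label coding (0 = no POPC, 1 = POPC) are hypothetical. Folds are assigned per patient, so all images of one patient end up in the same fold.

```python
import random
from collections import defaultdict


def stratified_patient_folds(labels, k=5, seed=0):
    """Assign each patient ID to one of k folds, preserving class ratios.

    labels: dict mapping patient ID -> outcome (0 = no POPC, 1 = POPC).
    Returns a dict mapping patient ID -> fold index in range(k).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pid, y in labels.items():
        by_class[y].append(pid)

    fold_of = {}
    for y in sorted(by_class):
        pids = sorted(by_class[y])  # sort first so the shuffle is reproducible
        rng.shuffle(pids)
        # Deal patients round-robin so that the per-fold counts of this
        # class differ by at most one.
        for i, pid in enumerate(pids):
            fold_of[pid] = i % k
    return fold_of
```

For fold `f`, the validation set consists of all patients with `fold_of[pid] == f`, and the remaining patients form the training set.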

This hold-out test set will be compiled prospectively during the course of the study after model development, using newly enrolled patients not included in the training dataset. This temporal separation will ensure unbiased performance evaluation of the final model.

Model performance will be evaluated on the independent hold-out test set using several complementary performance metrics. These will include the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), overall accuracy, sensitivity, specificity, F1-score, as well as positive and negative predictive values. Calibration will be assessed using appropriate measures, such as the Brier score, the Hosmer-Lemeshow test, and calibration curves. Clinical utility for decision-making may additionally be evaluated using decision curve analysis (DCA).
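Two of these metrics can be written down in a few lines. The sketch below is illustrative only and uses hypothetical function names; it computes the AUROC in its rank-based (Mann-Whitney) form and the Brier score as the mean squared difference between predicted risk and observed outcome. In practice, an established statistics library would likely be used instead.

```python
def auroc(y_true, y_prob):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher predicted risk than a randomly chosen negative case.
    Ties count one half. Runs in O(n_pos * n_neg)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error of the predicted probabilities (lower is better)."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```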

Importantly, our models are designed to output individualized risk probabilities, not dichotomous predictions. As such, the selection of a cut-off point for potential clinical decision-making will be a post-modelling step, and risk thresholds for high-risk patient classification will not be predefined. Instead, various thresholding strategies will be explored post hoc, including Youden’s index (for balanced sensitivity and specificity), cut-offs optimized for specific clinical priorities (e.g., maximizing sensitivity), and thresholds guided by DCA.
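Two of the thresholding strategies mentioned above can be sketched as follows. The example is a minimal illustration with hypothetical function names. Youden's index is J = sensitivity + specificity - 1, and the net benefit used in DCA at a risk threshold pt is TP/n - (FP/n) * pt / (1 - pt). A case is classified as high-risk when its predicted probability is at or above the threshold, which is an assumption of this sketch.

```python
def _counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when predicting positive for p >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted_positive = p >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    Candidate thresholds are the observed probabilities; on ties the
    lowest threshold is kept. Both classes must be present.
    """
    best_t, best_j = None, float("-inf")
    for t in sorted(set(y_prob)):
        tp, fp, tn, fn = _counts(y_true, y_prob, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating all cases with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    tp, fp, _, _ = _counts(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the "treat all" and "treat none" strategies.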

Addressing overfitting and class imbalance

Overfitting will be addressed through the implementation of several regularization techniques commonly used in deep learning. These include dropout layers within the network architecture, L2 weight regularization to penalize large weights, and data augmentation techniques applied to ultrasound images (e.g., rotation, flipping, and zooming). Model complexity will also be restricted as needed.

To better estimate the prevalence and clinical relevance of POPC, we conducted a preliminary single-center observational study involving 259 patients undergoing surgery under general anesthesia. The cohort included 106 female patients (41%) and 62 current smokers (24%) with a median age of 66 years. Overall, 111 patients (43%) experienced at least one POPC, indicating that POPC is a relatively frequent complication in the targeted patient population [25].

Based on this preliminary observational data, only moderate class imbalance is to be expected. Nevertheless, additional strategies will be employed to ensure balanced learning. These include maintaining class ratios across cross-validation folds, monitoring class-wise performance metrics, and using class weighting techniques if imbalances are observed in specific subgroups.
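To illustrate how mild the expected imbalance is, the following sketch computes inverse-frequency class weights for the class distribution reported in the preliminary cohort (111 of 259 patients with POPC). The weighting formula n / (number of classes × class count) is one common convention and is an assumption here, not the study's prescribed method; the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Class distribution of the preliminary cohort: 111 of 259 patients with POPC.
labels = [1] * 111 + [0] * 148
weights = balanced_class_weights(labels)
# weights[1] is 259/222 (about 1.17) and weights[0] is 259/296 (0.875),
# so each class contributes the same total weight to a weighted loss.
```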

Missing values and data imputation

With regard to imaging data, only complete data sets will be accepted for analysis.

Missing values in the clinical dataset will initially be handled using the default preprocessing pipeline if H2O AutoML is used. This pipeline applies median or mean imputation for numerical variables and mode imputation for categorical variables (https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-munging/imputing-data.html). In addition to this standard approach, we will evaluate alternative imputation strategies, particularly for approximately normally distributed numerical variables. These include, e.g., k-nearest neighbors (kNN) imputation, which leverages similarities across observations in the multivariate feature space, and multiple imputation using chained equations (MICE), which incorporates multivariable relationships and reflects the uncertainty associated with imputed values.

Given the prospective nature of the study, we anticipate that most missing data will not be missing at random (NMAR), but rather occur systematically, for instance, due to early patient discharge or an inability to perform assessments as a result of clinical deterioration. In such cases, the absence of data may itself be informative with respect to the patient’s risk for postoperative complications. Therefore, in addition to the imputation techniques mentioned above, we will explore modelling strategies that explicitly account for informative missingness. These include the incorporation of missingness indicators (binary flags denoting whether a variable was observed or missing) and sensitivity analyses to assess the robustness of model predictions under different assumptions about the missing data mechanism.
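A missingness indicator combined with simple imputation can be sketched as follows. This is a minimal example under stated assumptions: records are represented as dictionaries, missing values as `None`, and the variable name `spo2` in the usage example is purely hypothetical. In a real pipeline the imputation value would be estimated on the training data only.

```python
from statistics import median


def add_missingness_indicator(rows, column):
    """Median-impute `column` and add a binary `<column>_missing` flag.

    rows: list of dicts; a missing value is represented by None.
    Returns new dicts; the input rows are left unchanged.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    if not observed:
        raise ValueError(f"no observed values for {column!r}")
    fill = median(observed)
    out = []
    for r in rows:
        new = dict(r)
        # The flag preserves the information that the value was not observed.
        new[f"{column}_missing"] = int(r[column] is None)
        if r[column] is None:
            new[column] = fill
        out.append(new)
    return out


# Hypothetical variable name, for illustration only.
rows = [{"spo2": 96}, {"spo2": None}, {"spo2": 92}, {"spo2": 98}]
completed = add_missingness_indicator(rows, "spo2")
```

The model can then learn from both the imputed value and the flag, so that systematically absent measurements remain visible as a potential risk signal.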

The final approach to handling missing data will be selected based on a combination of model performance, diagnostic checks, and clinical plausibility.

Secondary models and endpoints

In addition to the prediction models, we aim to extract specific predictive markers, which might be evaluated in an additional study. Once established and evaluated in additional studies, both the AI models and the markers might help to identify high-risk patients, allowing treatment to be adapted at an early point of care.

In addition to the primary binary classification model for the presence or absence of POPC, the study will explore the development of additional models that aim to distinguish between different types and severities of complications. These extended models will incorporate more granular outcome definitions, including the timing of complication onset and the presence of multiple complications in a single patient.

Moreover, secondary predictive models will be constructed to estimate outcomes such as quality of postoperative recovery, length of hospital stay, and in-hospital mortality. Important features will be identified based on variable importance metrics. Comparisons between patients with and without complications will be conducted using appropriate statistical tests, including t-tests or Mann-Whitney U tests for continuous variables, and chi-squared or Fisher’s exact tests for categorical variables.

External and subgroup validation

While model development and internal validation will be performed using data from a single clinical centre, further validation is planned at a second site of our hospital that predominantly treats gynaecological and ENT patients. This cohort will serve as a distinct population for assessing model generalisability across surgical disciplines.

Additionally, the model will be evaluated in a separate high-risk subgroup comprising patients undergoing urgent or emergency procedures, such as elderly individuals with hip fractures. These patients were intentionally excluded from the initial development cohort and will offer insight into the model’s performance in more acute care settings.

Interim analysis

An interim analysis will be conducted after the enrollment of approximately 50 patients (around 10% of the planned sample size). This analysis will assess the technical feasibility, data quality, and operational workflow. As this is a non-interventional study, early termination is not planned regardless of interim results.

Oversight and monitoring

The coordinating investigators are responsible for study design, funding, and creation of the study protocol. They coordinate communication between the two departments involved, the Department of Anaesthesiology and Intensive Care Medicine and the Department of Radiology, and the persons involved in the study. Furthermore, they are part of the trial steering committee.

The trial steering committee is responsible for ensuring adherence to the study protocol, conducting the planned patient enrollment, and compiling the patient identification list. It will monitor the study’s progress and, if necessary, agree on changes to the procedure and study protocol. The committee meets once every 2 months, either in person or online.

The lead investigator is responsible for eligibility, consent, and enrollment of patients as well as imaging and data collection, and therefore supports the practising physicians on a day-to-day basis.

The present study does not fall under the German Medicinal Products Act (AMG) or the Medical Devices Regulation (MDR). The study is monocentric, and no intervention will take place. No risks for the patients are to be expected. For these reasons, no external data monitoring committee will be set up.

However, the coordinating investigators will be responsible for the creation of the database, supporting data entry, data verification, and quality management. Data monitoring and outcome reports will take place every 8 weeks.

Adverse events and harms

Patients will receive routine perioperative care according to local standards during this trial; the attending physicians and departments remain responsible for patient care. Additionally, patients will receive (1) a lung ultrasound, considered non-invasive and without side effects, and (2) a clinical examination and interview at up to three time points postoperatively. All other procedures are part of general anaesthesia or usual perioperative management and would be performed even without study participation. Therefore, we do not expect any complications or harm from trial participation.

Nevertheless, patients can report adverse events or other unintended effects of the trial to the study hotline or email address. The trial steering committee will process the reports.

Due to the relatively small size of the data set, the prediction model will be developed using the whole dataset. Cross-validation will be used to evaluate the performance of the model. Furthermore, validation is planned on a temporally independent data set obtained during the period of model training. Therefore, the study will be classified as Type 1b according to the TRIPOD Statement [12].

Error analysis will be carried out with the help of a confusion matrix after threshold determination. The cases of incorrect predictions will be analyzed in more detail, in particular, to determine whether certain characteristics correlate with the errors. External validation will take place in future studies.
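The error analysis described above can be sketched in a few lines. The example is illustrative and uses hypothetical function names: the first function tabulates the confusion matrix after thresholding, and the second reports the misclassification rate within each level of a patient characteristic, which gives a first impression of whether errors cluster in particular subgroups.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Confusion matrix cells for binary labels (1 = POPC, 0 = no POPC)."""
    c = Counter(zip(y_true, y_pred))
    return {"tp": c[(1, 1)], "fn": c[(1, 0)], "fp": c[(0, 1)], "tn": c[(0, 0)]}


def error_rate_by_group(y_true, y_pred, groups):
    """Fraction of misclassified cases within each level of a characteristic."""
    total, wrong = Counter(), Counter()
    for y, y_hat, g in zip(y_true, y_pred, groups):
        total[g] += 1
        wrong[g] += int(y != y_hat)
    return {g: wrong[g] / total[g] for g in total}
```

A markedly higher error rate in one group would be a starting point for a closer review of those cases, not proof of a systematic effect.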

In the current study, we are not focusing on the identification of possible confounders for model performance. This will be done in a follow-up study, investigating the impact of possible confounders such as different raters, imaging devices, bad image quality, resolutions, etc., in a possibly multi-centric study.

Discussion

A major target of clinical research in the perioperative field is to reduce the occurrence of postoperative complications. In times of skills and resource shortages, personalised medicine, which provides the treatment a patient requires while avoiding overtreatment, is becoming more important. Machine learning algorithms might improve risk prediction as a prerequisite for personalised medicine. POPC represent a large proportion of overall postoperative complications and occur about twice as frequently as cardiac complications. POPC are not only common, but they are also responsible for increased morbidity and mortality. Furthermore, they contribute to increased hospital length of stay and a higher frequency of hospital readmissions. Therefore, they occupy more healthcare resources and cause higher healthcare costs [26–29].

Despite these facts, only a few scores exist for evaluating pulmonary risk, and none has become standard in clinical routine, even though pulmonary complications could be controlled or avoided by specific, albeit personnel-intensive, measures.

Lung ultrasound is a non-invasive, bedside diagnostic screening measure that has recently become increasingly popular, not least due to the COVID-19 pandemic. With standardised protocols and guidelines, lung ultrasound is gaining importance in clinical medicine [30]. Machine-learning models based on ultrasound examinations are increasingly being developed that deliver high diagnostic accuracy [31] and already exceed the currently best evaluated conventional score, the ARISCAT Score [32].

The PEPPERMINT study aims to develop a tailored machine-learning model to reliably predict the risk for POPC, based on lung ultrasound imaging performed in the recovery room and perioperatively assessed clinical data. We hypothesize that the accuracy of the prediction model will exceed that of common scoring systems. Early identification of patients at risk helps to target scarce resources and apply adequate therapy in the sense of personalised medicine.

Limitations

We wanted to set a specific time frame for the postoperative visits. Therefore, in-person visits occur exclusively during the first 7 postoperative days. After that time point, the collection of findings is limited to a chart review. However, since the majority of POPC occur within the first week [26], this pragmatic approach is justified.

Secondly, risk identification by the model will take place only after the surgery. Therefore, preoperative assessment and optimization are not the subject of our study. However, if one considers the criteria that are relevant in preoperative risk assessment scores [5], for example, age, respiratory infection, or expected surgery duration and incision, most of the criteria are related to the underlying disease or planned surgical procedure, and are therefore not amenable to preoperative modifications. Consequently, in this study we would like to develop a tool that reliably predicts complications in the early postoperative phase in order to be able to provide the patient with adequate postoperative treatment and monitoring.

Strengths

The PEPPERMINT study will be the first study to combine ultrasound imaging data with clinical data in an artificial intelligence prediction model. We, therefore, hope to achieve a highly accurate risk prediction that can be applied in clinical practice. Besides POPC, we also investigate secondary endpoints that are of interest to the healthcare system, like the length of in-hospital stay and endpoints that are relevant to the subjective feelings of patients, like the quality of recovery.

In the future, the suitability of the algorithm will be tested in a clinical intervention study. For this purpose, a larger number of patients will be screened with the created model. High-risk patients will receive a multimodal training and therapy program postoperatively to reduce the rate of POPC. This includes, among other things, non-invasive ventilation in the recovery room, physiotherapy, respiratory training, a nutrition plan to prevent malnutrition, fluid balancing to prevent overhydration, and special oral hygiene. All included patients will again be visited on the ward on days 1, 3, and 7 and examined for signs of pulmonary complications. The aim is a reduction of pulmonary complications with measurable clinical benefit. Clinically measurable success parameters are a shorter hospital stay, a lower rate of unplanned intensive care admissions, and a higher quality of life.

Precise risk assessment using a machine-learning algorithm combined with targeted preventive and therapeutic measures for identified high-risk patients, therefore, has great potential to improve patient outcomes and could also help to reduce health care costs.

Dissemination plans

Trial results will be communicated via publication in international, peer-reviewed journals and at international congresses in the fields of anaesthesia and radiology. Positive as well as negative results will be published.

Protocol amendments

All important protocol modifications will be communicated to the necessary parties through the trial steering committee via direct contact or online meeting. Necessary changes to trial registry entries and ethics committee submissions will be made as soon as possible.

Trial sponsor

Prof. Dr. Bettina Jungwirth

University Hospital Ulm, Albert-Einstein-Allee 23, 89081 Ulm

mail: ains@uniklinik-ulm.de

Prof. Dr. Meinrad Beer

University Hospital Ulm, Albert-Einstein-Allee 23, 89081 Ulm

mail: sekretariat.radiologie1@uniklinik-ulm.de

This is an investigator-initiated trial. The funding source had no role in the design of this study and will not have any role during its execution, analyses, interpretation of the data, or decision to submit results.

Trial status

  • Protocol version: 2
  • Issue Date: 21/07/2025
  • Protocol Amendment Number: 0
  • First day of recruitment: 25/04/2023
  • Included patients: 150
  • Approximate date of completed recruitment: 12/2025

Supporting information

S2 File. SPIRIT-AI checklist.

Recommended items to address in a protocol and related documents for clinical trials evaluating AI interventions.

https://doi.org/10.1371/journal.pone.0329076.s002

(PDF)

Acknowledgments

We acknowledge F. Scheffenbichler, B. Ulm and A. Podtschaske for their support and advice in the implementation of the study. We acknowledge K. Lukas-Jazwinski, S. Hoheisen, F. Branz, G. Frömmichen, P. Leibinger and P. S. Sam for data acquisition and H. Hillenhagen and T. Bader for the preliminary work on the machine learning model. We also want to thank the patients for their willingness to participate in this study.

References

  1. Abbott TEF, Fowler AJ, Pelosi P, Gama AM, Moller AM, Canet J, et al. A systematic review and consensus definitions for standardised end-points in perioperative medicine: pulmonary complications. Br J Anaesth. 2018;120(4):705–11. pmid:29576111
  2. Fernandez-Bustamante A, Frendl G, Sprung J, Kor DJ, Subramaniam B, Martinez Ruiz R, et al. Postoperative Pulmonary Complications, Early Mortality, and Hospital Stay Following Noncardiothoracic Surgery: A Multicenter Study by the Perioperative Research Network Investigators. JAMA Surg. 2017;152(2):157–66. pmid:27829093
  3. Ghaferi AA, Birkmeyer JD, Dimick JB. Variation in hospital mortality associated with inpatient surgery. N Engl J Med. 2009;361(14):1368–75. pmid:19797283
  4. Ball L, Pelosi P. Predictive scores for postoperative pulmonary complications: time to move towards clinical practice. Minerva Anestesiol. 2016;82(3):265–7. pmid:26344668
  5. Nithiuthai J, Siriussawakul A, Junkai R, Horugsa N, Jarungjitaree S, Triyasunant N. Do ARISCAT scores help to predict the incidence of postoperative pulmonary complications in elderly patients after upper abdominal surgery? An observational study at a single university hospital. Perioper Med (Lond). 2021;10(1):43. pmid:34876228
  6. Xue B, Li D, Lu C, King CR, Wildes T, Avidan MS, et al. Use of Machine Learning to Develop and Evaluate Models Using Preoperative and Intraoperative Data to Identify Risks of Postoperative Complications. JAMA Netw Open. 2021;4(3):e212240. pmid:33783520
  7. Szabó M, Bozó A, Darvas K, Soós S, Őzse M, Iványi ZD. The role of ultrasonographic lung aeration score in the prediction of postoperative pulmonary complications: an observational study. BMC Anesthesiol. 2021;21(1):19. pmid:33446103
  8. van Sloun RJG, Demi L. Localizing B-Lines in Lung Ultrasonography by Weakly Supervised Deep Learning, In-Vivo Results. IEEE J Biomed Health Inform. 2020;24(4):957–64. pmid:31425126
  9. Brusasco C, Santori G, Tavazzi G, Via G, Robba C, Gargani L, et al. Second-order grey-scale texture analysis of pleural ultrasound images to differentiate acute respiratory distress syndrome and cardiogenic pulmonary edema. J Clin Monit Comput. 2022;36(1):131–40. pmid:33313979
  10. Miskovic A, Lumb AB. Postoperative pulmonary complications. Br J Anaesth. 2017;118(3):317–34. https://doi.org/10.1093/bja/aex002 pmid:28186222
  11. Ferreyra GP, Baussano I, Squadrone V, Richiardi L, Marchiaro G, Del Sorbo L, et al. Continuous positive airway pressure for treatment of respiratory complications after abdominal surgery: a systematic review and meta-analysis. Ann Surg. 2008;247(4):617–26. pmid:18362624
  12. Collins GS, Reitsma JB, Altman DG, Moons KGM. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55–63. pmid:25560714
  13. Bouhemad B, Mongodi S, Via G, Rouquette I. Ultrasound for “lung monitoring” of ventilated patients. Anesthesiology. 2015;122(2):437–47. pmid:25501898
  14. Tung-Chen Y, Ossaba-Vélez S, Acosta Velásquez KS, Parra-Gordo ML, Díez-Tascón A, Villén-Villegas T, et al. The Impact of Different Lung Ultrasound Protocols in the Assessment of Lung Lesions in COVID-19 Patients: Is There an Ideal Lung Ultrasound Protocol?. J Ultrasound. 2022;25(3):483–91. pmid:34855187
  15. Kokotovic D, Degett TH, Ekeloef S, Burcharth J. The ARISCAT score is a promising model to predict postoperative pulmonary complications after major emergency abdominal surgery: an external validation in a Danish cohort. Eur J Trauma Emerg Surg. 2022;48(5):3863–7. pmid:35050387
  16. Kiyatkin ME, Aasman B, Fazzari MJ, Rudolph MI, Vidal Melo MF, Eikermann M, et al. Development of an automated, general-purpose prediction tool for postoperative respiratory failure using machine learning: A retrospective cohort study. J Clin Anesth. 2023;90:111194. pmid:37422982
  17. Andonov DI, Ulm B, Graessner M, Podtschaske A, Blobner M, Jungwirth B, et al. Impact of the Covid-19 pandemic on the performance of machine learning algorithms for predicting perioperative mortality. BMC Med Inform Decis Mak. 2023;23(1):67. pmid:37046259
  18. Graeßner M, Jungwirth B, Frank E, Schaller SJ, Kochs E, Ulm K, et al. Enabling personalized perioperative risk prediction by using a machine-learning model based on preoperative data. Sci Rep. 2023;13(1):7128. pmid:37130884
  19. Hanley JA, McNeil BJ. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology. 1983;148(3):839–43. pmid:6878708
  20. Jammer I, Wickboldt N, Sander M, Smith A, Schultz MJ, Pelosi P, et al. Standards for definitions and use of outcome measures for clinical effectiveness research in perioperative medicine: European Perioperative Clinical Outcome (EPCO) definitions: a statement from the ESA-ESICM joint taskforce on perioperative outcome measures. Eur J Anaesthesiol. 2015;32(2):88–105. pmid:25058504
  21. Borson S, Scanlan J, Brush M, Vitaliano P, Dokmak A. The mini-cog: a cognitive “vital signs” measure for dementia screening in multi-lingual elderly. Int J Geriatr Psychiatry. 2000;15(11):1021–7. pmid:11113982
  22. Anetsberger A, Blobner M, Krautheim V, Umgelter K, Schmid S, Jungwirth B. Self-Reported, Structured Measures of Recovery to Detect Postoperative Morbidity. PLoS One. 2015;10(7):e0133871. pmid:26207620
  23. Mazo V, Sabaté S, Canet J, Gallart L, de Abreu MG, Belda J, et al. Prospective external validation of a predictive score for postoperative pulmonary complications. Anesthesiology. 2014;121(2):219–31. pmid:24901240
  24. Fage BA, Chan CC, Gill SS, Noel-Storr AH, Herrmann N, Smailagic N, et al. Mini-Cog for the detection of dementia within a community setting. Cochrane Database Syst Rev. 2021;7(7):CD010860. pmid:34259337
  25. Trautwein B. Preventing postoperative pulmonary complications after general anaesthesia in adult surgical patients – an interim analysis. Euroanaesthesia Congress 2025.
  26. Shander A, Fleisher LA, Barie PS, Bigatello LM, Sladen RN, Watson CB. Clinical and economic burden of postoperative pulmonary complications: patient safety summit on definition, risk-reducing interventions, and preventive strategies. Crit Care Med. 2011;39(9):2163–72. pmid:21572323
  27. Lawrence VA, Hilsenbeck SG, Mulrow CD, Dhanda R, Sapp J, Page CP. Incidence and hospital stay for cardiac and pulmonary complications after abdominal surgery. J Gen Intern Med. 1995;10(12):671–8. pmid:8770719
  28. Lawrence VA, Hilsenbeck SG, Noveck H, Poses RM, Carson JL. Medical complications and outcomes after hip fracture repair. Arch Intern Med. 2002;162(18):2053–7. pmid:12374513
  29. McAlister FA, Bertsch K, Man J, Bradley J, Jacka M. Incidence of and risk factors for pulmonary complications after nonthoracic surgery. Am J Respir Crit Care Med. 2005;171(5):514–7. pmid:15563632
  30. Demi L, Mento F, Di Sabatino A, Fiengo A, Sabatini U, Macioce VN. Lung Ultrasound in COVID-19 and Post-COVID-19 Patients, an Evidence-Based Approach. J Ultrasound Med. 2022;41(9):2203–15. https://doi.org/10.1002/jum.15902 pmid:34859905
  31. Dave C, Wu D, Tschirhart J, Smith D, VanBerlo B, Deglint J, et al. Prospective Real-Time Validation of a Lung Ultrasound Deep Learning Model in the ICU. Crit Care Med. 2023;51(2):301–9. pmid:36661454
  32. Li P, Gao S, Wang Y, Zhou R, Chen G, Li W, et al. Utilising intraoperative respiratory dynamic features for developing and validating an explainable machine learning model for postoperative pulmonary complications. Br J Anaesth. 2024;132(6):1315–26. pmid:38637267