
The Safety INdEx of Prehospital On Scene Triage (SINEPOST) study: The development and validation of a risk prediction model to support ambulance clinical transport decisions on-scene

  • Jamie Miles ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Centre for Urgent and Emergency Care, School of Health and Related Research, The University of Sheffield, Sheffield, United Kingdom

  • Richard Jacques,

    Roles Formal analysis, Methodology, Supervision, Writing – review & editing

    Affiliation Design, Trials and Statistics, School of Health and Related Research, The University of Sheffield, Sheffield, United Kingdom

  • Richard Campbell,

    Roles Data curation, Project administration, Writing – review & editing

    Affiliation Centre for Urgent and Emergency Care, School of Health and Related Research, The University of Sheffield, Sheffield, United Kingdom

  • Janette Turner,

    Roles Conceptualization, Supervision, Writing – review & editing

    Affiliation Centre for Urgent and Emergency Care, School of Health and Related Research, The University of Sheffield, Sheffield, United Kingdom

  • Suzanne Mason

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    Affiliation Centre for Urgent and Emergency Care, School of Health and Related Research, The University of Sheffield, Sheffield, United Kingdom


One of the main problems currently facing the delivery of safe and effective emergency care is excess demand, which causes congestion at different points in a patient's journey. The modern case-mix of prehospital patients is broad and complex, diverging from the traditional 'time-critical accident and emergency' patient: it now includes many low-acuity patients and those with social care and mental health needs. In the ambulance service, transport decisions are the hardest to make, and paramedics take more patients to the ED than would gain a clinical benefit. This study therefore asked the following research questions: In adult patients attending the ED by ambulance, can prehospital information predict an avoidable attendance? What is the simulated transportability of the model derived from the primary outcome? The sample was a linked dataset of 101,522 ambulance incidents from the whole of Yorkshire between 1st July 2019 and 29th February 2020, each joined to its respective ED care record. A machine learning method, XGBoost, was applied to the data within an Internal-External Cross-Validation (IECV) framework to build the model. The model showed strong discrimination, with a C-statistic of 0.81 (95% CI 0.79–0.83), and excellent calibration, with an O:E ratio of 0.995 (95% CI 0.97–1.03). The most important variables were a patient's mobility, their physiological observations, and their clinical impression, with psychiatric problems, allergic reactions, cardiac chest pain, head injury, non-traumatic back pain, and minor cuts and bruising contributing most.
This study has successfully developed a decision-support model, known as the SINEPOST model, that can be transformed into a tool to help paramedics make better transport decisions on scene. It is accurate, and spatially validated across multiple geographies including rural, urban, and coastal areas. It is a fair algorithm that does not discriminate against new patients on the basis of age, gender, ethnicity, or decile of deprivation. It can be embedded into an electronic Patient Care Record system and automatically calculate the probability that a patient would have an avoidable attendance at the ED if they were transported. This manuscript complies with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement (Moons KGM, 2015).


In the emergency care system, pressure is rising amidst the growing number of patients accessing front-door services such as the ambulance service, Emergency Department (ED) and General Practice (GP). This demand is rising at around 5% per annum [2, 3]. For the ambulance service, this means that patients who are transported to hospital may be held in a queue of other ambulances waiting to hand their patients over. In 2019/2020 in England alone there were 137,009 delays in ambulance handover of between 30 and 60 minutes [4]. When these delays occur and ambulances are queueing, there is the potential for harm to those in the queue. A recent report from the Association of Ambulance Chief Executives (AACE) found that 80% of ambulance patients who queued for more than an hour experienced some level of harm [5]. Studies have identified more specific harms associated with particular conditions: delayed handover in patients with non-traumatic chest pain is associated with a greater risk of 30-day mortality [6]. There are also potential consequences for prehospital patients still waiting to be assessed in the community.

The case-mix of these patients does not always comprise life-threatening emergencies. Previous reports have demonstrated that the majority of prehospital patients have no immediate life-threatening care need and that their actual need could be managed in the community [7, 8]. However, some of these patients are still transported to the ED, and this can lead to an avoidable ED attendance.

When paramedics make decisions on-scene to transport a patient to hospital, it is often the most complex decision they make [9]. As such, the decision is not always accurate. Studies have found that between 9% and 32% of ambulance transports to the ED could have been avoided [7, 10–12]. It is recognised that in some systems transport decisions are not clinician-made and patient-centred, but financially driven through payment policies [13, 14]. However, these policies are beginning to adapt to the modern case-mix of patients, and as such the adoption of a transport decision support tool would be of high benefit and importance.

Existing transport decision support tools in practice have all been designed not to miss a higher-acuity patient, which has led to significant over-triage of patient acuity. They have also failed to demonstrate significant benefit over clinician decision making. A vignette-based survey by Miles et al. found that conveyance decisions had a sensitivity of 0.89 (95% CI 0.86–0.92) and a specificity of 0.51 (95% CI 0.46–0.56) [15]. This is comparable to existing decision support tools such as the paramedic pathfinder [16, 17]. A systematic review into whether machine learning computerised decision support could improve triage found that methods such as decision trees, neural networks and logistic regression were all able to discriminate accurately between acuity levels; a limitation of the included studies was that they were often predicting high acuity [18].

If current clinical judgement is already sensitive in identifying high-acuity patients, the benefit of a decision support tool lies in triaging mid- and low-acuity patients. If accuracy is improved at this level of triage, the benefit would be a reduction in the avoidable transportation of patients to an ED.


Primary research question.

In adult patients attending the ED by ambulance, can prehospital information predict an avoidable attendance?

Primary objectives.

  1. Extract prehospital variables from ambulance service electronic patient care records
  2. Link the data with ED electronic patient care records
  3. Identify low-acuity patients in the dataset using the ED information
  4. Build a predictive model using prehospital variables
  5. Measure the success of the model in predicting an avoidable attendance using prehospital variables.

Secondary research questions.

What is the simulated transportability of the model derived from the primary outcome?

Secondary objectives.

  6. Test spatial validation
  7. Test model discrimination of protected characteristics


Source of data

This retrospective cohort study analysed a sample of ambulance service patients transported to the ED between the 1st July 2019 and the 29th February 2020. Each episode had an ambulance electronic Patient Care Record (ePCR) created which contained all demographic and clinical information. The Yorkshire Ambulance Service (YAS) provided this data. The outcome was generated using two ED-based data products from NHS Digital, which were then subsequently linked to the ambulance data. The two products were the Hospital Episode Statistics Accident and Emergency (HES A&E) and the Emergency Care Data Set (ECDS).


In this study, all patients aged over 18 who had a face-to-face paramedic contact from Yorkshire Ambulance Service (YAS) with a completed ePCR were eligible for inclusion. For the development of the prediction model, each transported instance was linked to its respective ED record. A total of 17 EDs were included in the study and a full list can be found in S2 Appendix. Patients were not selected by any specific demographic or disease, to ensure the model could be applied to all patients. Children were excluded from the model because their conveyance is confounded by ambulance service policy.


The outcome is an avoidable attendance at the ED following ambulance conveyance, an experience-based definition initially described by O'Keeffe et al. as "first attendance with some recorded treatments or investigations all of which may have reasonably been provided in a non-emergency care setting, followed by discharge home or to GP care" [12]. This was operationalised into a data-driven definition, which can be found in the protocol publication [18].


All candidate variables were recorded prospectively, whilst the ambulance crew was with the patient. Data were retrieved after the data collection period, and no ambulance crew was aware of the study during data collection. Variables can be broadly categorised as demographic, clinical, social, and interventional. S7 Appendix displays all candidate variables, example values, justification for inclusion, and the parameters assigned within each variable.

The only demographic variable included was incident location, a categorical variable entered by the ambulance crew according to whether the patient was at a domestic address, public place, care home, work, or other location. Age was initially included; however, after initial model building it was found to introduce a bias and was removed.

Clinical variables formed most of the candidate variables. When a paramedic arrives on scene, they first undertake a primary survey, recording whether the patient has a catastrophic haemorrhage, whether their airway is clear, whether they are breathing normally, and whether there are any obvious circulation issues; these are all categorical variables. The patient then has physiological observations recorded to assess how serious their complaint may be. Pulse rate, the frequency at which the heart beats, is measured in beats per minute (bpm); traditionally this is measured by palpation, although it can also be measured using medical equipment. Respiratory rate is a manual count of the number of breaths the patient takes in one minute, recorded as respirations per minute (rpm). Temperature is a continuous variable measured in °C using a tympanic thermometer. Peripheral capillary oxygen saturation (SpO2) is measured as a percentage using medical equipment. Blood sugar is recorded in mmol per litre using a device that takes a small blood sample. Blood pressure is recorded in millimetres of mercury (mmHg), with two measurements taken: systolic and diastolic. The level of consciousness is recorded using a four-point scale (AVPU) in the primary survey and the Glasgow Coma Scale (GCS) in the physiological observations; GCS is a composite score of labelled scales, with a minimum of three and a maximum of fifteen [19]. Baseline and current oxygen demands are recorded as binary variables.

All the physiological variables are combined to calculate a National Early Warning Score 2 (NEWS2) [20, 21], which was included as a categorical candidate predictor. NEWS2 assigns between 0 and 3 points to each physiological variable depending on how deranged the value is; the minimum total score is 0 and the maximum is 20 [21]. Clinical variables also include pain scores out of ten, subsequent measurements of observations, and feature-engineered intervals between primary and subsequent measurements. All sixteen clinical interventions (e.g., cannulation, intubation) were included as binary variables. The patient's mobility was recorded as a categorical variable according to the resource required to move them to and from the ambulance (e.g., self-mobile, stretcher needed, carry chair needed). Clinical impression was also included as a categorical variable with 99 possible values, such as 'head injury', 'shortness of breath', and 'abdominal pain'.

Social variables were included as binary surrogates for the level of external support the patient has, such as whether GP details or a social worker were recorded. They also included referral variables indicating whether the patient was referred to a service such as a falls, safeguarding, or diabetes clinic.
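To make the NEWS2 aggregation concrete, the scoring logic can be sketched as below. The thresholds follow the Royal College of Physicians NEWS2 charts (SpO2 Scale 1); this is an illustrative simplification, not the study's implementation, and the function signature is our own.

```python
def news2_score(resp_rate, spo2, on_oxygen, systolic_bp, pulse, alert, temp):
    """Simplified NEWS2 total score (SpO2 Scale 1).

    Each physiological value earns 0-3 points depending on how deranged it is;
    supplemental oxygen adds 2, and any non-alert consciousness (C/V/P/U)
    adds 3. The maximum total is 20, matching the range described in the text.
    """
    score = 0
    # Respiration rate (breaths per minute)
    if resp_rate <= 8 or resp_rate >= 25:
        score += 3
    elif 21 <= resp_rate <= 24:
        score += 2
    elif 9 <= resp_rate <= 11:
        score += 1
    # Oxygen saturation, Scale 1 (%)
    if spo2 <= 91:
        score += 3
    elif spo2 <= 93:
        score += 2
    elif spo2 <= 95:
        score += 1
    # Supplemental oxygen in use
    if on_oxygen:
        score += 2
    # Systolic blood pressure (mmHg)
    if systolic_bp <= 90 or systolic_bp >= 220:
        score += 3
    elif systolic_bp <= 100:
        score += 2
    elif systolic_bp <= 110:
        score += 1
    # Pulse rate (bpm)
    if pulse <= 40 or pulse >= 131:
        score += 3
    elif 111 <= pulse <= 130:
        score += 2
    elif 41 <= pulse <= 50 or 91 <= pulse <= 110:
        score += 1
    # Consciousness: anything other than Alert scores 3
    if not alert:
        score += 3
    # Temperature (degrees C)
    if temp <= 35.0:
        score += 3
    elif temp >= 39.1:
        score += 2
    elif temp <= 36.0 or 38.1 <= temp <= 39.0:
        score += 1
    return score
```

A patient with unremarkable observations scores 0, while grossly deranged observations on oxygen reach the maximum of 20.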

Sample size

The sample size was calculated using 'pmsampsize' v1.1.0 for R v3.6.1 for Windows [22]. Two studies by Riley et al. also informed the calculation [23, 24]. Previous studies have found a conservative estimate of the outcome prevalence to be 0.085 [12], and a meta-analysis found an average C-statistic of 0.8 [25]. A preliminary analysis of a separate dataset found that there were potentially 637 parameters in the ambulance service dataset. This gave an estimated sample size of 55,676, with an anticipated 4,733 events and an events-per-parameter (EPP) ratio of 7.43.
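The headline figures are mutually consistent, which can be checked with simple arithmetic (the calculation itself was done with pmsampsize in R; this is only a consistency check of the reported numbers):

```python
import math

prevalence = 0.085   # conservative outcome prevalence [12]
n = 55_676           # estimated minimum sample size
parameters = 637     # candidate parameters in the preliminary dataset

events = math.ceil(prevalence * n)   # anticipated number of events
epp = round(events / parameters, 2)  # events per parameter

print(events, epp)  # 4733 7.43
```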

Missing data

The strategy for handling missing data was first to establish whether missing values in each variable represented the negative class. For example, the clinical procedure of intravenous cannulation is only recorded in the ePCR if the patient was cannulated; in the absence of a positive recording it is therefore logical to assume the patient was not cannulated, and the missing value can be transformed into the negative class. Once this was completed, any variable with more than 30% missing data was excluded from the analysis, on the rationale that such variables may not be routinely or accurately completed in the ePCR and including them could lead to model failure in practice.
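The two-step strategy can be sketched as follows; the column names and threshold default are illustrative, not the study's field names, and pandas stands in for the R tooling used in the analysis.

```python
import numpy as np
import pandas as pd

def prepare_missing(df, negative_class_cols, threshold=0.30):
    """Apply the two-step missing-data strategy described above.

    1. For variables where absence logically means 'did not happen'
       (e.g. an intervention such as cannulation), recode NaN as the
       negative class.
    2. Drop any remaining variable with more than `threshold` missing.
    """
    df = df.copy()
    for col in negative_class_cols:
        df[col] = df[col].fillna(0)  # missing => intervention not performed
    keep = df.columns[df.isna().mean() <= threshold]
    return df[keep]
```

For example, a column that is 75% missing and cannot be recoded as a negative class would be dropped, while a 25%-missing physiological observation would be retained.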

Statistical analysis methods

The full statistical analysis plan has been published in the study protocol [18]. An XGBoost algorithm was used for model development. Recursive feature elimination was used to reduce the candidate variables to the subset providing the most accurate prediction model, and the algorithm's hyperparameters were then tuned to prevent overfitting. The model was first evaluated for calibration using Spiegelhalter's Z-test; discrimination was then assessed using the C-statistic (area under the ROC curve). The optimal threshold was identified as the closest point to the top left of the ROC curve and was used to calculate accuracy statistics. Once the full model was developed, the same procedure was repeated using each Emergency Department in turn as a held-out test set, with all remaining data as training data. This in effect created a full model plus seventeen further models, which could then be meta-analysed. The summary statistics generated in a random-effects meta-analysis were then used to update the final model's performance. The full procedures are outlined in detail in the protocol paper [18]. This is a development study with internal-external validation using a meta-analysis of ED clusters; there is no external validation.
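The cluster loop at the heart of internal-external cross-validation can be sketched as below. The study used XGBoost; scikit-learn's gradient boosting stands in here to keep the sketch dependency-light, and all variable names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

def iecv_c_statistics(X, y, ed_labels):
    """Hold out each ED in turn, train on the remaining EDs, and return
    the held-out C-statistic (AUROC) per cluster for meta-analysis."""
    results = {}
    for ed in np.unique(ed_labels):
        train, test = ed_labels != ed, ed_labels == ed
        model = GradientBoostingClassifier(n_estimators=50, random_state=0)
        model.fit(X[train], y[train])
        p = model.predict_proba(X[test])[:, 1]
        results[ed] = roc_auc_score(y[test], p)
    return results
```

Each cluster's C-statistic then feeds the random-effects meta-analysis described later.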

Data linkage and dataset creation.

YAS identified and extracted all eligible ePCR records from its information system between 1st July 2019 and 29th February 2020. These dates were bound by two time points: the 1st of July 2019 was when YAS launched the regional roll-out of the ePCR, and the 29th of February 2020 was the last possible date before the COVID-19 pandemic would confound the sample. This extract was partitioned into two datasets: one that included identifiable fields but no clinical fields, which was transferred to NHS Digital; and a second composed of the same records with clinical data (directly identifiable fields removed), which was transferred to the University of Sheffield project team. Both datasets contained a common identifier field to enable linkage. NHS Digital attempted to trace patients' identities based on the combinations of identifiers received from YAS. Records for the cohort successfully traced by NHS Digital were extracted from the requested datasets (HES A&E and ECDS) and sent to the project team. Previous data linkage methodology with NHS Digital used an eight-stage hierarchical probabilistic matching algorithm [26]. However, the ECDS data product could only be linked using the unique identifier of NHS number, which rendered the linkage process largely deterministic. As a result, all patient records sent to NHS Digital with an NHS number were successfully linked, whereas those without an NHS number were not, resulting in 195,078 (66%) of the total cohort being excluded from the analysis. A comparison of the successfully linked and unlinked cohorts revealed no fundamental differences.

YAS and NHS Digital both removed records that belonged to patients who had registered an NHS national data opt-out. Duplicate records were also removed from both datasets to ensure that a single person’s records did not appear more than once.

All three received datasets (YAS ePCR, HES A&E, and ECDS) were linked using a consistent patient-level identifier. For ambulance incidents linked to HES A&E attendances, only the earliest A&E attendance record with a datetime after the latest ambulance incident datetime, and no more than 6 hours later, was retained. This ensured that the link between an ambulance incident and a HES A&E record remained one-to-one.
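This time-windowed linkage rule maps naturally onto a forward as-of join. The sketch below uses pandas; the field names are illustrative and a further de-duplication pass may be needed in practice to guarantee a strictly one-to-one link.

```python
import pandas as pd

def link_incident_to_attendance(incidents, attendances):
    """For each ambulance incident, keep the earliest A&E attendance for the
    same patient that starts at or after the incident and no more than
    6 hours later; unmatched incidents get a missing arrival_time."""
    return pd.merge_asof(
        incidents.sort_values("incident_time"),
        attendances.sort_values("arrival_time"),
        left_on="incident_time",
        right_on="arrival_time",
        by="patient_id",
        direction="forward",
        tolerance=pd.Timedelta(hours=6),
    )
```

An attendance 2 hours after the incident is linked; one 9 hours later, or none at all, leaves the incident unmatched.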

To link HES A&E data to ECDS data (and therefore ECDS to ambulance incidents), linkage was performed via common identifier and arrival-time pairs. Where this produced multiple records, the "most complete" record was chosen, completeness being determined by the presence of the fields used to calculate whether an attendance is of low acuity. A graphical representation of data flow and linkage can be found in S1 Appendix.

Ethics statement.

This study underwent extensive ethical review. It was first reviewed and approved by the South Yorkshire NHS Research Ethics Committee (REC) on the 20th December 2019. It was also reviewed and approved by the NHS Confidentiality Advisory Group (CAG) on the 14th July 2020. During the data sharing agreement stage, it was further reviewed and approved by the NHS Digital Independent Group Advising on the Release of Data (IGARD) team on the 15th Feb 2021. This study used patient data without written or verbal patient consent as it was not feasible to achieve this with the large volume of retrospective data. To mitigate this, the patient identifiers were first screened against the NHS National data opt out. This removed all patient episodes where the patient had previously stated they did not want their data used for the purposes of research. To further mitigate this, privacy notices were shared on both the Yorkshire Ambulance Service NHS Trust and the University of Sheffield websites. These contained contact details to remove participants from the study, prior to pseudonymisation.



There were 101,522 individual patient episodes included in the analysis. Of these, 7,228 (7.12%) were defined as having an avoidable ambulance conveyance to the ED. Table 1 provides key demographic information for those with and without the outcome. It also shows physiological observations as a surrogate for comparative patient acuity. In the supplementary material, the table is extended to show the clinical impression fields.

Model development

Dataset preparation.

During the preparation of the dataset there were 215 possible candidate variables for inclusion, comprising 190 categorical variables (including 169 binary variables) and 25 continuous variables. After one-hot encoding there were 452 candidate predictors in the final dataset. During recursive feature elimination, the ideal set was found to be only 90 of the candidate predictors, which condensed into 19 variables: 14 clinical, 3 interventional, and 2 demographic. A full list of included candidate variables can be found in S3 Appendix.

Model performance.

In an XGBoost algorithm, the hyperparameters that control how the model is built prevent it from overfitting the training data; the apparent validity can therefore be regarded as less optimistic from the outset [27]. Table 2 summarises the performance measures used to evaluate the model.


Calibration was assessed using Spiegelhalter's Z-test, calculated using the Rmisc package v1.5 [28]. The null hypothesis of this test is a well-calibrated model, so a statistically significant result indicates miscalibration. The initial model was miscalibrated, with a Spiegelhalter's Z of -3.668 (p = 0.001). The weighting of the positive class was therefore tuned, to two decimal places, to yield the smallest non-significant Z statistic. The optimum value for scale_pos_weight was 0.95, which gave a Z of 0.111 (p = 0.912). The observed to expected (O:E) ratio was 1.042 (95% CI 1.02–1.07). The full calibration plot with intercept and slope can be found in Fig 1.
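Spiegelhalter's Z is simple to compute from outcomes and predicted probabilities. The study used the Rmisc implementation in R; the Python sketch below follows the standard formula and is purely illustrative.

```python
import math
import numpy as np

def spiegelhalter_z(y, p):
    """Spiegelhalter's z-statistic for calibration.

    Under the null hypothesis of a well-calibrated model, z is approximately
    standard normal, so a significant result indicates miscalibration (the
    interpretation used in the text). Returns (z, two-sided p-value).
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    num = np.sum((y - p) * (1.0 - 2.0 * p))
    den = math.sqrt(np.sum(((1.0 - 2.0 * p) ** 2) * p * (1.0 - p)))
    z = num / den
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * P(|Z| > |z|)
    return z, p_value
```

Well-calibrated predictions give z near zero; systematically under- or over-confident predictions push z away from zero.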


The C-statistic for the full model was 0.82 (95% CI 0.815–0.824). The optimum cut point was 0.121, which gave a specificity of 0.87 and a sensitivity of 0.54. The ROC curve with different thresholds, including the optimal threshold (marked with a star), can be found in Fig 2. The threshold was chosen mathematically as the 'closest top left' point. Experiments were performed maximising specificity instead, but the model was unstable and sensitivity decreased so substantially that it would misclassify far more often than it would classify correctly.
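The 'closest top left' rule picks the threshold whose ROC point minimises the Euclidean distance to the perfect classifier at (FPR = 0, TPR = 1). A scikit-learn sketch (illustrative, not the study's R code):

```python
import numpy as np
from sklearn.metrics import roc_curve

def closest_top_left_threshold(y_true, y_prob):
    """Return the probability threshold whose ROC point lies closest
    (Euclidean distance) to the top-left corner (FPR = 0, TPR = 1)."""
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    distances = np.sqrt(fpr ** 2 + (1.0 - tpr) ** 2)
    return thresholds[int(np.argmin(distances))]
```

For a perfectly separable toy example, the rule recovers the threshold that achieves both sensitivity and specificity of 1.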

Using the optimal cut point, the full model had an accuracy of 0.85 (95% CI 0.847–0.852). The model had a preference towards specificity as it was predicting health and not disease. The positive predictive value (PPV) was 0.25 (95% CI 0.24–0.25) and the negative predictive value was 0.96 (95% CI 0.96–0.963).

Model updating.

The meta-analysis was undertaken using the framework by Debray et al. and the metamisc package v0.2.5 [29, 30]. In the meta-analysis of clusters, the C-statistic was found to be 0.81 (95% CI 0.79–0.83), with a prediction interval between 0.73 and 0.87. Fig 3 shows the forest plot of C-statistic results for each cluster. The hyperparameters of each model can be found in S4 Appendix. The meta-analysed O:E ratio was 0.995 (95% CI 0.97–1.03), with a prediction interval between 0.93 and 1.06. Calibration plots and ROC curves for each model developed are in S5 Appendix.
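For intuition, random-effects pooling of cluster C-statistics (as metamisc performs, typically on the logit scale) can be sketched with a DerSimonian-Laird estimator. The values and standard errors below are illustrative, not the study's results, and this Python sketch merely mirrors the idea of the R analysis.

```python
import numpy as np

def pool_c_statistics(c_stats, logit_ses):
    """DerSimonian-Laird random-effects pooling of cluster C-statistics,
    performed on the logit scale and back-transformed. `logit_ses` are the
    standard errors of logit(C) per cluster."""
    c = np.asarray(c_stats, dtype=float)
    theta = np.log(c / (1.0 - c))                 # logit transform
    v = np.asarray(logit_ses, dtype=float) ** 2   # within-cluster variances
    w = 1.0 / v
    theta_fixed = np.sum(w * theta) / np.sum(w)
    q = np.sum(w * (theta - theta_fixed) ** 2)    # Cochran's Q
    df = len(theta) - 1
    tau2 = max(0.0, (q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))
    w_star = 1.0 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_star * theta) / np.sum(w_star)
    return 1.0 / (1.0 + np.exp(-pooled))          # back to C-statistic scale
```

Identical clusters pool back to their common value; heterogeneous clusters pool to a weighted compromise between them.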

Fair machine learning analysis.

In the analysis of fair machine learning, each demographic was assessed on two criteria. The first was comparing the probability distribution of each category within the variable and the second was examining how many were misclassified in each category. If age is left in as a candidate variable, the model becomes more accurate but introduces a bias towards younger patients. When excluded, the model slightly decreases in performance but removes the bias. There were no significant differences in the mean probabilities, distributions, or misclassification for any of the demographic variables assessed. This included ethnicity, gender, and social deprivation. More information can be found in S6 Appendix.

Misclassification analysis.

There were 3,880 (3.8%) true positive predictions, where the model correctly identified an avoidable ambulance conveyance, and 82,340 (81.1%) true negatives, where it correctly identified an unavoidable conveyance. There were 11,954 (11.8%) false positives and 3,348 (3.3%) false negatives. This gave a misclassification rate of 0.151.
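These counts are internally consistent with the accuracy statistics reported earlier, which can be verified directly:

```python
# Confusion-matrix counts reported above
tp, tn, fp, fn = 3_880, 82_340, 11_954, 3_348

total = tp + tn + fp + fn              # 101,522 episodes
misclassification = (fp + fn) / total  # 0.151
sensitivity = tp / (tp + fn)           # 0.54
specificity = tn / (tn + fp)           # 0.87
ppv = tp / (tp + fp)                   # 0.25
npv = tn / (tn + fn)                   # 0.96

print(total, round(misclassification, 3))  # 101522 0.151
```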

Variable importance.

Variable importance can be broken down into three measures: frequency (weight), coverage, and gain. Frequency represents how many times a feature appears across the trees of the full model, as a percentage of all splits. Coverage is the number of instances contained within a feature when it is used as a split. Gain is the relative contribution of each feature to the whole model. Figs 4–6 show the frequency, coverage and gain for the model.

Fig 4. Top 20 variables used in the full model by frequency.

Fig 5. Top 20 variables with the greatest number of instances when splitting (cover).

Fig 6. Top 20 variables with the highest relative contribution to the full model (gain).


This study used a large sample of conveyed ambulance patients linked to their ED records to derive a clinical decision support model. Two different systematic reviews concluded that the most effective clinical decision support should be computer-based, provide support as part of the natural workflow, offer practical advice, and be available at the time of decision making. Computerised Clinical Decision Support (CCDS) in the prehospital system plays an increasingly important role in delivering efficient care that can meet the needs of its users. In an environment where information is difficult to obtain but decisions are crucial and time-limited, CCDS tools appear to offer a potential solution. In a Department of Health and Social Care review of the operational productivity of ambulance services in England, the first recommendation for future contracting was for ambulance services to have 'technology, processes and systems in place to support clinical decision making' [31].

Computerised decision support is relatively novel to clinicians on scene. This is owing to the requirement of electronic patient care records. In Yorkshire Ambulance Service, ePCRs were only fully launched in July 2019, and this formed a barrier to data availability. However, evidence is mounting about the benefits of on-scene CCDS, and the results in this study could have the greatest benefit if a prospective tool is used on scene with the patient.

One of the more recent advancements of on-scene CCDS is predicting end diagnosis to expedite specialist care or to instigate earlier treatment. As an example, the Japanese Urgent Stroke Triage Score using Machine Learning (JUST-ML) predicted a major neurological event such as a large vessel occlusion, subarachnoid haemorrhage, intracranial haemorrhage or cerebral infarction better than any other available model [32]. The benefit of predicting a major neurological event in the prehospital phase of care is that it can steer transport destination decisions to ensure the right patients go to a stroke unit for specialist care. Predicting a downstream outcome has been seen in many clinical conditions including Acute Coronary Syndrome (ACS) and major trauma [33–35]. The results of this study cannot extend to predicting an end diagnosis; however, they support the idea of modifying a care plan according to the outcome of a CCDS tool. The model has demonstrated it can predict avoidable ambulance conveyances and contributes evidence that computerised decision support can predict not only a high-acuity outcome, but also a low-acuity one.

In the SAFER1 trial, the computerised decision support tool was embedded into the ePCR [36]. The qualitative evaluation found that paramedics who had access to the tool were twice as likely to refer patients to a falls service as those without. However, the paramedics applied the tool in only 12% of eligible patients. One of the barriers to implementation identified in the qualitative element of the study was the labour involved in accessing and using the tool. This resonates with the work of Kawamoto et al. [37], whose systematic review aimed to identify key features of successful implementation of clinical decision support systems; the most important feature was automation, minimising the effort required of the end user. Machine learning algorithms were considered for developing the SINEPOST model because of their potential accuracy and their ability to be embedded in an electronic healthcare system. Whilst the Occam's razor approach of making the model as simple as possible was the intended philosophy of the SINEPOST model, machine learning algorithms can be as complicated as needed and still provide automated prediction.

Decision support systems already in place for triaging patients include the paramedic pathfinder and the Manchester Triage System (MTS) [16, 17, 38]. The outcomes of these tools are different, so it would be inappropriate to compare performance between them. The intended use of these tools was to risk-stratify patients to support non-conveyance decisions.

This study could have adopted a different strategy, taking a non-conveyed sample and a conveyed sample to create a model predicting non-conveyance. However, the gold standard would then be paramedic decision making, and the model could only be as good as existing practice; this is a limitation of both the paramedic pathfinder and the MTS. The strength of this study was taking information that the ambulance crew could not know and predicting it for them to use whilst on scene. The results have demonstrated that, using prehospital variables, it is entirely possible to predict the experience a patient may have if transported to the ED. This brings a benefit to paramedic decision making. One Canadian study demonstrated the feasibility of using a computer algorithm to redirect non-emergent patients away from the ED towards sub-acute centres such as walk-in centres, with both system and patient benefits (such as satisfaction) [39].

In the study by Miles et al. they explored paramedic decision making using a mixed methodology [15]. In the qualitative part, it was found that paramedics either framed a decision around the scene, or the ED. When they framed the decision around the scene, their language would often be why it is not safe to be left at home, or that the patient requires a GP appointment (for example). When it was framed around the ED, the justifications would be anchored to the patient either receiving a certain benefit from attending, or that the ED would probably not find anything abnormal [15]. The findings from this study have the opportunity to support those who use the ED to frame their decisions. By knowing what the predicted probability is, it provides new information to them that would not have been available for decision making. However, perhaps the largest benefit to transport decision making on scene from this study is the revealing of clinically important variables that should be accounted for in making such a decision.

Feature importance

A unique and novel finding from this study was the identification of six clinical impressions that were important in predicting avoidable conveyances, all of which featured in the top twenty variables. The most important was patients presenting with psychiatric problems. This may reflect the experience of mental health presentations at the ED: they rarely require the investigations or treatments that physical health presentations may, and the main purpose of the ED for these patients is to offer a place of safety and access to a mental health practitioner who can better meet their care needs. The other clinical impressions were allergic reactions, cardiac chest pain, head injury, non-traumatic back pain, and minor cuts and bruising. These have previously been identified in observational studies as being associated with a non-urgent ambulance conveyance [10, 40]. All physiological observations appeared in the top twenty; the NEWS2 score, however, did not. Only four NEWS2 scores were retained in the full model: a score of 0 appeared as the 31st variable, a score of 1 as the 58th, a score of 2 as the 78th, and a score of 5 as the 93rd. This may mean that low NEWS2 scores are not strong predictors of an avoidable conveyance. This is an interesting finding, as the decision trees might have been expected to associate higher NEWS2 scores with information gain in ruling out an avoidable conveyance; instead, most NEWS2 scores were omitted during recursive feature elimination. In the full model and in all the clusters, the frequency, cover, and gain of variables did not change rank order, which shows the stability of their importance. When predicting high acuity, it is often easier to find significant variables, as physiological observations such as pulse rate and respiration rate change when patients are acutely unwell. When predicting avoidable conveyances, however, physiological observations will often be normal.
Interestingly, there were clinical variables more important than the physiological observations that have featured as main candidate predictors in other triage models [41, 42]. In the development of decision tree models, splits are made based on information gain. This can be gain in deciding what an avoidable conveyance is, or gain in deciding what it is not. As such, variables associated with higher acuity appear high in the variable importance because they rule out necessary attendances. The algorithm identified signals of higher-acuity patients among fields with a high prevalence of completion within the ePCR. For example, delivering advanced life support to someone in cardiac arrest is rare in the overall case-mix of ambulance patients; the skills and procedures associated with undertaking ALS were therefore rarely recorded and were not identified as important. Far more patients, however, received intravenous cannulation or were monitored by ECG, and these appeared as the fifth and eighth most important variables. This reasoning extends to the patient's mobility: a patient's mobility status is important in the model, with being stretcher-bound, self-mobile, or needing a carry chair all featuring in the top twenty.
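The split criterion described above can be illustrated with a minimal, purely didactic sketch: computing the entropy-based information gain of a candidate binary split on a node. The counts below are hypothetical illustrative numbers (not taken from the SINEPOST dataset), and XGBoost itself computes gain from gradient statistics rather than entropy, but the intuition of "a split is worth making if it separates the classes" is the same.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a binary outcome with event proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(n_parent, events_parent, n_left, events_left):
    """Reduction in entropy achieved by splitting the parent node in two."""
    n_right = n_parent - n_left
    events_right = events_parent - events_left
    h_parent = entropy(events_parent / n_parent)
    h_left = entropy(events_left / n_left)
    h_right = entropy(events_right / n_right)
    weighted_child = (n_left / n_parent) * h_left + (n_right / n_parent) * h_right
    return h_parent - weighted_child

# Hypothetical node: 1,000 incidents, 70 avoidable conveyances (7% prevalence).
# A candidate split (e.g. on a mobility flag) sends 300 incidents left,
# taking 55 of the 70 avoidable cases with them.
gain = information_gain(1000, 70, 300, 55)
print(round(gain, 4))  # → 0.0552 for these illustrative numbers
```

A split that leaves both children at the parent's 7% prevalence would yield a gain of zero, which is why variables that concentrate either avoidable or clearly necessary attendances rise up the importance ranking.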

Model performance

The model was well calibrated, with a meta-analysed O:E ratio of 0.99 (95% CI 0.96–1.02), meaning it makes accurate predictions across the range of predicted values. The model is also successful in distinguishing an avoidable ambulance conveyance from one that needed transport to hospital, with a C-statistic of 0.81 (95% CI 0.79–0.83). The optimal threshold for classification was 0.125, which appears low, but so is the proportion of avoidable ambulance conveyances; this reflects the class imbalance. The model produced many false negatives, with a sensitivity of 0.58, meaning that 42% of avoidable conveyances were classified as needing ED care. The choice of threshold is a point of discussion. It could be adjusted higher or lower, but this would influence the sensitivity and specificity. To illustrate, the ROC curve in Fig 2 shows that thresholds above 0.2 have limited effect on the specificity but a large effect on the sensitivity: if the threshold were changed to 0.2, for example, the sensitivity would drop dramatically to 0.28. The optimum threshold was chosen as the point where specificity and sensitivity were jointly highest, also known as a balanced approach. It was also possible to use the Youden index, which would place the threshold at the point nearest the top-left corner of the ROC curve, but this placed too great a penalty on specificity to create a functioning tool.
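The trade-off between the two threshold-selection strategies can be sketched as follows. This is an illustrative reconstruction on synthetic scores (not the SINEPOST data), using scikit-learn's ROC utilities; the "balanced" rule is read here as the point where sensitivity and specificity are closest to each other, and Youden's J as maximising sensitivity + specificity − 1.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(42)

# Synthetic scores mimicking a 7%-prevalence outcome: avoidable conveyances
# receive somewhat higher predicted probabilities than necessary attendances.
y = (rng.random(20_000) < 0.07).astype(int)
scores = np.clip(rng.normal(0.10 + 0.15 * y, 0.08), 0, 1)

fpr, tpr, thresholds = roc_curve(y, scores)
specificity = 1 - fpr

# Youden's J statistic: maximise sensitivity + specificity - 1.
youden_threshold = thresholds[np.argmax(tpr - fpr)]

# One reading of the "balanced" approach: sensitivity closest to specificity.
balanced_threshold = thresholds[np.argmin(np.abs(tpr - specificity))]

print(f"Youden threshold:   {youden_threshold:.3f}")
print(f"Balanced threshold: {balanced_threshold:.3f}")
```

Sliding the chosen threshold along this curve and reading off the paired sensitivity and specificity makes the effect described above concrete: past a certain point, further threshold increases buy almost no specificity while sensitivity falls sharply.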

The meta-analysis of clusters revealed no significant performance differences between test sets in urban, rural, or coastal areas. There were significant differences in the calibration slopes, as seen in S7 Appendix; however, these occurred at the latter part of the plots, where the predicted outcome was rare. All clusters produced acceptable O:E ratios except two smaller test sets (Dewsbury Hospital and James Cook University Hospital), which showed significant under-triage. Sheffield, Leeds, York, and Hull are all large teaching hospitals and, as illustrated by Fig 3, there were no significant differences between these. Furthermore, there was no significant difference between the large teaching hospitals and the smaller district general hospitals.
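The O:E ratio reported for each cluster summarises overall calibration as observed events divided by expected events (the sum of the predicted risks). A minimal sketch on synthetic, deliberately well-calibrated predictions:

```python
import numpy as np

def oe_ratio(y_true, y_prob):
    """Observed events over expected events (sum of predicted risks).
    Values near 1 indicate good calibration-in-the-large; <1 suggests
    over-prediction of risk, >1 suggests under-prediction."""
    return float(np.sum(y_true) / np.sum(y_prob))

rng = np.random.default_rng(0)

# Well-calibrated synthetic predictions: each outcome is drawn with exactly
# the probability the "model" stated, so O:E should sit close to 1.
y_prob = rng.uniform(0.01, 0.30, size=50_000)
y_true = (rng.random(50_000) < y_prob).astype(int)

print(round(oe_ratio(y_true, y_prob), 3))  # close to 1.0 by construction
```

Computing this per hospital test set, as the IECV design does, is what allows the cluster-level ratios to be pooled and the two under-triaging sites to be identified.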

The prevalence of avoidable conveyance attendances in the study sample was only 7%, lower than the literature had previously reported (9–13%) [7, 10–12]. This may appear low; however, the quantity of high-acuity patients is similar, indicating that predicting avoidable conveyance and high acuity amounts to predicting the two tails of a normal distribution. Future studies should examine the mid-acuity patients and begin to unpick the differences between them, to improve the outcome definition of a patient who is unlikely to gain a clinical benefit from being transported to a higher-acuity clinical setting than community care.


This study has its limitations. It was a retrospective, observational study using routine data. A strength of routine data is the ability to use large volumes of patient episodes, which can produce accurate models. A limitation, however, is that data collection cannot be tailored to the project: only what is routinely collected can be used, which relinquishes any control over missing data. Another limitation is the computational expense of selecting an algorithm with many hyperparameters. Scanning all combinations of hyperparameters through a grid search every time a model was developed would take a significant amount of time, so the grid was restricted. The impact of the restricted grid search is expected to be minimal, as the differences in AUC performance (the evaluation metric of choice) fell within a narrow interval of 0.7 to 0.85. The validation does not benefit from true external validation, and it would be sensible to revisit the definition of an avoidable ambulance conveyance, or indeed the taxonomy by which prehospital care systems classify patients according to their needs, before further validation of the SINEPOST model.
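The restricted grid search described above can be sketched as follows. This is a hypothetical illustration, not the study's actual tuning code: it uses scikit-learn's GridSearchCV with a gradient-boosted classifier as a stand-in for XGBoost, synthetic imbalanced data in place of the linked ambulance/ED dataset, and a deliberately small grid covering only the hyperparameters with the largest expected effect on AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the linked dataset: ~7% event prevalence.
X, y = make_classification(n_samples=1_000, n_features=20,
                           weights=[0.93], random_state=0)

# A restricted grid: exhaustively scanning every hyperparameter combination
# of a boosted-tree model is computationally expensive, so only a few
# influential settings are varied.
grid = {
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [50, 100],
}

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      grid, scoring="roc_auc", cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Because AUC is the evaluation metric, candidate models whose cross-validated AUC falls in a narrow band are, for practical purposes, interchangeable, which is the rationale for restricting the grid.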


This study concludes that it is possible, with good accuracy, to predict an avoidable ambulance conveyance to the ED using prehospital clinical data. The XGBoost model developed here, known as the SINEPOST model, can discriminate between those with non-urgent needs and those without, and can accurately estimate the probability of an avoidable conveyance. The model is not biased across ages, ethnicities, genders, or Indices of Deprivation, and it is robust across different prehospital settings. If this model were applied to national-level data in England, it could support 85,560 conveyance decisions per month changing to non-conveyance, based on the latest NHS England Ambulance Quality Indicators, which identified 372,002 ambulance transports to the ED in November 2021 [43]. However, to maximise its potential as a computerised clinical decision support tool, a more robust definition of an avoidable conveyance is needed. It is recommended to revise the taxonomy of prehospital patients according to the care setting they need, rather than the paradigm of describing patient acuity. This has already shown success in Canada, where a computer algorithm demonstrated it is possible to redirect non-emergent patients away from the ED towards sub-acute centres such as walk-in centres, with both system and patient benefits (such as patient satisfaction) [39].

It would also be beneficial to undertake studies into the risk tolerance of policy makers, ambulance services and the public when it comes to transporting low- or mid-acuity patients to the ED.

Supporting information

S2 Appendix. List of Emergency Departments included in this study.


S5 Appendix. ROC and calibration curves for the IECV models.


S7 Appendix. All candidate variables from the ePCR dataset.



1. Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): Explanation and elaboration. Ann Intern Med. 2015;162: W1–W73. pmid:25560730
2. National Audit Office. NHS Ambulance services. 2017.
3. Coster JE, Turner JK, Bradbury D, Cantrell A. Why Do People Choose Emergency and Urgent Care Services? A Rapid Review Utilizing a Systematic Literature Search and Narrative Synthesis. Academic Emergency Medicine. 2017. pp. 1137–1149. pmid:28493626
4. NHS England [online]. Statistics » Urgent and Emergency Care Daily Situation Reports. [cited 15 Feb 2021]. Available:
5. Association of Ambulance Chief Executives. Delayed hospital handovers: Impact assessment of patient harm. London; 2021.
6. Dawson LP, Andrew E, Stephenson M, Nehme Z, Bloom J, Cox S, et al. The influence of ambulance offload time on 30-day risks of death and re-presentation for patients with chest pain. MJA. 2022;217. pmid:35738570
7. Andrew E, Nehme Z, Cameron P, Smith K. Drivers of Increasing Emergency Ambulance Demand. Prehospital Emerg Care. 2020;24: 385. pmid:31237460
8. O’Cathain A, Knowles E, Long J, Connell J, Bishop-Edwards L, Simpson R, et al. Drivers of ‘clinically unnecessary’ use of emergency and urgent care: the DEUCE mixed-methods study. Heal Serv Deliv Res. 2020;8: 1–256.
9. O’Hara R, Johnson M, Hirst E, Weyman A, Shaw D, Mortimer P, et al. A qualitative study of decision-making and safety in ambulance service transitions. Heal Serv Deliv Res. 2014;2: 1–138.
10. Miles J. 59 Ambulance over-conveyance to the emergency department: a large data analysis of ambulance journeys. BMJ Open. 2018.
11. Patton GG, Thakore S. Reducing inappropriate emergency department attendances—A review of ambulance service attendances at a regional teaching hospital in Scotland. Emerg Med J. 2013;30: 459–461. pmid:22802457
12. O’Keeffe C, Mason S, Jacques R, Nicholl J. Characterising non-urgent users of the emergency department (ED): A retrospective analysis of routine ED data. PLoS One. 2018;13: 1–14. pmid:29474392
13. Munjal K, Carr B. Realigning reimbursement policy and financial incentives to support patient-centered out-of-hospital care. JAMA. 2013;309: 667–668. pmid:23423411
14. Morganti KG, Alpert A, Margolis G, Wasserman J, Kellermann AL. Should payment policy be changed to allow a wider range of EMS transport options? Ann Emerg Med. 2014;63: 615–626.e5. pmid:24209960
15. Miles J, Coster J, Jacques R. Using vignettes to assess the accuracy and rationale of paramedic decisions on conveyance to the emergency department. Br Paramed J. 2019;4: 6–13. pmid:33328823
16. Newton M, Tunn E, Moses I, Ratcliffe D, MacKway-Jones K. Clinical navigation for beginners: The clinical utility and safety of the Paramedic Pathfinder. Emerg Med J. 2013;31: e29–e34. pmid:24099831
17. North West Ambulance Service. Paramedic Pathfinder and Community Care Pathways. 2014; 52. Available:
18. Miles J, Jacques R, Turner J, Mason S. The Safety INdEx of Prehospital On Scene Triage (SINEPOST) study: the development and validation of a risk prediction model to support ambulance clinical transport decisions on-scene—a protocol. Diagnostic Progn Res. 2021;5: 18. pmid:34749832
19. Allan D. Glasgow coma scale. Nurs Mirror. 1984;158: 32–34.
20. Royal College of Physicians. National Early Warning Score (NEWS) 2. 2012. Available:
21. Royal College of Physicians. National Early Warning Score (NEWS) 2: Standardising the assessment of acute-illness severity in the NHS. Updated report of a working party. 2017.
22. Ensor J, Martin EC, Riley RD. Package “pmsampsize”: Calculates the Minimum Sample Size Required for Developing a Multivariable Prediction Model. 2020.
23. Riley RD, Snell KI, Ensor J, Burke DL, Harrell FE Jr, Moons KG, et al. Minimum sample size for developing a multivariable prediction model: PART II—binary and time-to-event outcomes. Stat Med. 2019;38: 1276–1296. pmid:30357870
24. Riley RD, Debray TPA, Collins GS, Archer L, Ensor J, van Smeden M, et al. Minimum sample size for external validation of a clinical prediction model with a binary outcome. Stat Med. 2021;40: 4230–4251. pmid:34031906
25. Miles J, Turner J, Jacques R, Williams J, Mason SM. Using machine-learning risk prediction models to triage the acuity of undifferentiated patients entering the emergency care system: a systematic review. BMC Diagnostic Progn Res. 2020. pmid:33024830
26. NHS Digital. Linked datasets supporting health and care delivery and research. 2018; 1–14. Available:
27. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. [cited 26 Aug 2021]. Available:
28. Hope RM. Package “Rmisc” v.1.5. 2016.
29. Debray TPA, Damen JAAG, Riley RD, Snell K, Reitsma JB, Hooft L, et al. A framework for meta-analysis of prediction model studies with binary and time-to-event outcomes. Stat Methods Med Res. 2019;28: 2768–2786. pmid:30032705
30. Debray T, de Jong V. Package “metamisc”: Meta-Analysis of Diagnosis and Prognosis Research Studies. 2021 [cited 26 Nov 2021].
31. Lord Carter of Coles. Operational productivity and performance in English NHS acute hospitals: Unwarranted variations. Dep Heal. 2018; 87. Available:
32. Uchida K, Kouno J, Yoshimura S, Kinjo N, Sakakibara F, Araki H, et al. Development of Machine Learning Models to Predict Probabilities and Types of Stroke at Prehospital Stage: the Japan Urgent Stroke Triage Score Using Machine Learning (JUST-ML). 1: 3.
33. Dixon J, Burkholder T, Pigoga J, Lee M, Moodley K, de Vries S, et al. Using the South African Triage Scale for prehospital triage: a qualitative study. BMC Emerg Med. 2021;21: 1–10. pmid:34715794
34. Wibring K, Lingman M, Herlitz J, Ashfaq A, Bång A. Development of a prehospital prediction model for risk stratification of patients with chest pain. 2021 [cited 3 Jan 2022]. pmid:34662785
35. Knoery CR, Heaton J, Polson R, Bond R, Iftikhar A, Rjoob K, et al. Systematic Review of Clinical Decision Support Systems for Prehospital Acute Coronary Syndrome Identification. Crit Pathw Cardiol. 2020;19: 119–125. pmid:32209826
36. Snooks HA, Carter B, Dale J, Foster T, Humphreys I, Logan PA, et al. Support and assessment for fall emergency referrals (SAFER 1): Cluster randomised trial of computerised clinical decision support for paramedics. PLoS One. 2014;9. pmid:25216281
37. Kawamoto K, Houlihan CA, Balas A, Lobach DF. Improving clinical practice using clinical decision support systems: a systematic review of trials to identify features critical to success. [cited 4 Jan 2022].
38. Scholes S. Can Paramedics use the Manchester Triage System to triage and refer patients into clinical pathways of care from scene?
39. Strum RP, Tavares W, Worster A, Griffith LE, Costa AP. Emergency department interventions that could be conducted in subacute care settings for patients with nonemergent conditions transported by paramedics: a modified Delphi study. CMAJ Open. 2022;10. pmid:35017171
40. Miles J. 17 Exploring ambulance conveyances to the emergency department: a descriptive analysis of non-urgent transports. Emerg Med J. 2017.
41. Dugas AF, Kirsch TD, Toerper M, Korley F, Yenokyan G, France D, et al. An Electronic Emergency Triage System to Improve Patient Distribution by Critical Outcomes. J Emerg Med. 2016;50: 910–918. pmid:27133736
42. van Rein EAJ, van der Sluijs R, Voskens FJ, Lansink KWW, Houwert RM, Lichtveld RA, et al. Development and Validation of a Prediction Model for Prehospital Triage of Trauma Patients. JAMA Surg. 2019;154: 421–429. pmid:30725101
43. NHS England [online]. Statistics » Ambulance Quality Indicators. In: NHS England [Internet]. 2021 [cited 17 Dec 2020]. Available: