Predictive performance of a multivariable difficult intubation model for obese patients

Background A predictive model of difficult intubation (DI) scores may help physicians screen for airway difficulty and thereby reduce morbidity and mortality in obese patients. The present study aimed to develop a new, practical, multivariable DI model for obese patients and to evaluate its predictive performance. Methods A prospective multi-center study was undertaken on adults with a body mass index (BMI) of 30 kg/m² or more who were undergoing conventional endotracheal intubation. The BMI and 10 preoperative airway tests (namely, malformation of the teeth in the upper jaw, the modified Mallampati test [MMT], the upper lip bite test, neck mobility testing, the neck circumference [NC], the length of the neck, the interincisor gap, the hyomental distance, the thyromental distance [TM] and the sternomental distance) were examined. DI was defined as an intubation difficulty scale (IDS) score ≥ 5. Results The 1,015 patients recruited for the study had a mean BMI of 34.2 kg/m² (standard deviation: 4.3 kg/m²). The proportions of easy intubation, slight DI and DI were 81%, 15.8% and 3.2%, respectively. Based on the results of a multivariable analysis, clinically meaningful variables related to obesity (namely, BMI, MMT, and the ratio of NC to TM) were used to build a predictive model for DI. Nevertheless, the best model had only fair predictive performance: the area under the receiver operating characteristic curve (AUC) was 0.71 (95% confidence interval 0.68–0.84). Conclusions The predictive performance of the selected model showed limited benefit for preoperative screening to predict DI among obese patients.


Introduction
The reported incidence of difficult intubation (DI) among obese subjects varies from 1.8% to 14.3% [1][2][3]. These figures are much higher than the incidence reported for the general population of patients enrolled in the Perioperative Anesthetic Adverse Events in Thailand (PAAD THAI) Study, which was only 8:10,000, or 0.08% [4]. Obese patients experience a range of physiological alterations, including increased oxygen consumption, decreased chest wall compliance and reduced functional residual capacity [5]. Consequently, when intubation is difficult, a prolonged intubation time or low oxygen saturation may result in a high risk of perioperative adverse events, including death, persistent brain damage, unnecessary tracheostomy and unanticipated Intensive Care Unit admission [6].
Difficult intubation is commonly predicted using the Mallampati classification, thyromental distance, sternomental distance and interincisor gap. Nevertheless, the pooled sensitivity of each individual test is poor to moderate (range: 22%-62%) [7]. Combining tests, or building risk scores, may provide higher sensitivity; in other words, such a model could identify obese patients at risk of DI even when they lack the conspicuous features of a difficult airway. Some existing models, namely, the Naguib and Arné models, have shown good performance in predicting difficult intubation [8,9]. The original version of the Arné model comprised variables obtained from patient histories and physical examinations. Such variables may introduce multicollinearity or interaction into the model, because a single underlying disease commonly determines several features of the airway pathology. For example, severe diabetes mellitus is associated with limited joint mobility (stiff joint syndrome); acromegaly is associated with macroglossia, prognathism and abnormal glottic structures; and rheumatoid arthritis is associated with cervical spine abnormalities. In addition, some variables in the model were subjective, making them difficult for trainees or non-anesthesiologist personnel to interpret and limiting the model's application in clinical practice. As for the Naguib model, its authors reported the highest sensitivity to date for predicting unanticipated DI; however, some variables in the final model were not related to obesity. Therefore, the current study set out to develop a new, practical, multivariable DI model for obese patients and to assess its predictive performance.

Materials and methods
This prospective, observational, multi-center study involved 1 university hospital and 4 tertiary-care hospitals. It was approved by the Siriraj Institutional Review Board, and all patients gave written informed consent. The enrolled patients were obese adults (BMI ≥ 30 kg/m²) scheduled to undergo elective surgery under general anesthesia with standard endotracheal intubation. Patients with an obvious upper airway malformation or a history of difficult or failed intubation were excluded.
To identify all airway assessment tests that could be used to predict difficult intubation, a literature review was undertaken. The search terms were ("difficult intubation" OR "difficult airway") AND ("prediction" OR "risk factor" OR "predictive model") AND airway assessment AND obesity, together with other such combinations. Table 1 summarizes the definitions of the airway assessment tests obtained from the literature; these definitions were not specific to obese patients. Five anesthesiologists, each with at least 5 years' clinical experience, developed clear definitions for each of the 10 preoperative airway assessment methods (malformation of the teeth in the central part of the maxilla; modified Mallampati classification; hyomental, thyromental and sternomental distances; interincisor gap; range of motion, circumference and length of the neck; and upper lip bite test). Before the study commenced, 10 research assistants were trained in the examination methods using 5 obese volunteers and sets of photographs. Training sessions continued until the interobserver reliability between the principal investigator and the 10 research assistants exceeded 0.7.
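The text does not name the reliability statistic behind the 0.7 threshold. For categorical tests such as the MMT or the upper lip bite test, Cohen's kappa is a common choice, so the minimal sketch below assumes kappa; the ratings are hypothetical. Continuous measurements such as the hyomental or thyromental distance would more typically be assessed with an intraclass correlation coefficient.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on a categorical scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(rater_a)
    # Observed proportion of subjects on which the two raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = freq_a.keys() | freq_b.keys()
    p_expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical MMT classes assigned to the same 10 training subjects.
investigator = ["I", "II", "II", "III", "I", "IV", "II", "III", "I", "II"]
assistant = ["I", "II", "III", "III", "I", "IV", "II", "II", "I", "II"]

kappa = cohens_kappa(investigator, assistant)
print(f"kappa = {kappa:.3f}, exceeds 0.7: {kappa > 0.7}")
```

Here the raters agree on 8 of 10 subjects, but kappa discounts the agreement expected by chance from their marginal frequencies, so the result (about 0.71) is lower than the raw 80% agreement.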
We specified the following four desirable attributes of a predictive model. Firstly, the selected predictive factors should be obtained from a physical examination. In addition, the tests must be easy enough to undertake to allow assessment at a bedside or in a preoperative clinic. Moreover, only a tape measure and a ruler should be required for the assessment; no complex apparatus should be needed. Lastly, the final score should be calculated easily and be user friendly.

Anesthetic protocol
Standard monitoring, namely, the use of a pulse oximeter, an electrocardiogram and non-invasive blood pressure, was employed before administering anesthesia. All tracheal intubations were conducted by anesthetists or anesthesiologists who had at least 2 years' experience in a full-time capacity and were blinded to the details of the individual patient assessments. In addition, the staff chose the laryngoscope position and the intubation technique that they deemed would provide the best achievable visualization. The first laryngoscopy employed a size 3 or 4 Macintosh laryngoscope blade. Patients were positioned with pillows supporting their heads and with their necks extended. The termination of intubation, i.e., the point at which it was decided to cease standard intubation or to select alternative devices for airway management, was determined by the anesthesiologist in charge. All patients were preoxygenated with 100% oxygen via a facemask for at least 3 minutes. General anesthesia was induced with either 1.5-2.5 mg/kg propofol or 5-7 mg/kg sodium thiopental, together with an intubating dose of a muscle relaxant.

Table 1. Description of airway assessment tests reported for general and obese patients.

Test: Definition

Malformation of teeth: Buck (protruding) [19] or missing central teeth in the upper jaw [20].

Interincisor gap: The maximal distance between the upper and lower incisors, measured while the patient sits in the neutral position [21].

Upper lip bite test: Class I: the lower incisors can bite the upper lip above the vermilion line. Class II: the lower incisors can bite the upper lip below the vermilion line. Class III: the lower incisors cannot bite the upper lip [22].

Modified Mallampati test: The patient sits upright with the head in the neutral position, opens the mouth as wide as possible and protrudes the tongue maximally, without phonation. Class I: the soft palate, fauces, uvula and pillars can be seen. Class II: the soft palate, fauces and uvula can be seen. Class III: only the soft palate and the base of the uvula can be seen. Class IV: the soft palate is not visible [22].

Hyomental distance: The distance from just above the hyoid bone to the anterior-most part of the mentum, measured in the neutral position [23].

Neck circumference: Measured at the level of the cricoid cartilage, perpendicular to the long axis of the neck [24].

Length of neck: The length from the external occipital protuberance to the vertebra prominens; the circumference is taken at the level of the cricoid cartilage anteriorly and the spinous process of the sixth cervical vertebra posteriorly [19].

Neck mobility testing: Sagittal flexion: the subject makes a "double chin" (suboccipital flexion) and then flexes the neck fully forward. Sagittal extension: the subject nods the head back and then extends it fully [25].

Thyromental distance: The straight-line distance between the thyroid notch and the bony point of the mentum, measured in the supine position with the head fully extended and the mouth closed [22,26].

Sternomental distance: The straight-line distance between the upper border of the manubrium sterni and the bony point of the mentum, measured in a seated position with the head fully extended and the mouth closed [22].

Definition of difficult intubation
The intubation difficulty scale (IDS) was used in this study to avoid possible misunderstandings of the term "difficult intubation". The IDS comprises seven variables that have been reported to be related to difficult intubation (Fig 1). The parameter N1, which represents the number of additional intubation attempts, is the one most commonly associated with difficult intubation. The Cormack and Lehane grade of the laryngeal view is also an IDS component (N4) [10]. This classification is regarded as a standard tool for describing the view of the glottis, and researchers and clinicians use it to communicate the degree of intubation difficulty. The IDS can also be used to compare intubation difficulty across a variety of conditions, either through the summation of the scale's 7 components or by examining specific variables, and it has already been used to compare the degree of intubation difficulty in obese and non-obese patients [3]. A total score of 0 represents an ideal, or easy, intubation: one that is accomplished in a single attempt by a single operator, without noticeable physical exertion, using only one technique, and with no impediment to tube passage. The score increases as further attempts are made; an impossible intubation is assigned a score of infinity (∞) [11]. For our study of obese Thai patients, we defined a difficult intubation as an IDS score ≥ 5 [12].
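The summation described above can be sketched as follows. Because Fig 1 is not reproduced in the text, the component definitions are assumed to follow the original IDS of Adnet et al. (N1 additional attempts, N2 additional operators, N3 alternative techniques, N4 Cormack-Lehane grade minus 1, N5 increased lifting force, N6 external laryngeal pressure, N7 vocal cord adduction). The category bands are the ones used in this study (0-1 easy, 2-4 slight DI, ≥ 5 DI), and the function names are illustrative.

```python
import math


def ids_score(extra_attempts, extra_operators, alternative_techniques,
              cormack_lehane_grade, increased_lifting_force,
              laryngeal_pressure, cords_adducted, impossible=False):
    """Intubation difficulty scale, assuming Adnet's N1-N7 definitions."""
    if impossible:
        return math.inf  # an impossible intubation scores infinity
    if cormack_lehane_grade not in (1, 2, 3, 4):
        raise ValueError("Cormack-Lehane grade must be 1-4")
    components = [
        extra_attempts,                # N1: attempts beyond the first
        extra_operators,               # N2: operators beyond the first
        alternative_techniques,        # N3: alternative techniques used
        cormack_lehane_grade - 1,      # N4: grade I scores 0
        int(increased_lifting_force),  # N5: 0 normal, 1 increased
        int(laryngeal_pressure),       # N6: 0 not applied, 1 applied
        int(cords_adducted),           # N7: 0 abducted, 1 adducted
    ]
    return sum(components)


def ids_category(score):
    """Bands used in this study: 0-1 easy, 2-4 slight DI, >= 5 DI."""
    if score >= 5:
        return "DI"
    return "slight DI" if score >= 2 else "easy"


ideal = ids_score(0, 0, 0, 1, False, False, False)
hard = ids_score(1, 0, 1, 3, True, True, False)
print(ideal, ids_category(ideal))  # 0 easy
print(hard, ids_category(hard))    # 6 DI
```

The second example (one extra attempt, one alternative technique, a grade III view, increased lifting force and external laryngeal pressure) sums to 6 and therefore meets this study's DI definition.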

Statistical analysis
The sample size estimate for our study drew on statisticians' recommendations for multiple logistic regression analysis [13,14], namely, that the number of obese subjects with DI should be 5 to 10 times the number of candidate risk factors in the multiple logistic model (in this case, 11: BMI; malformation of teeth; upper lip bite test; Mallampati classification; neck mobility, length and circumference; thyromental, hyomental and sternomental distances; and interincisor gap). Therefore, 55-110 subjects with DI were required. As a previous report found some degree of DI in around 15% of cases, a sample size of around 1,000 cases was deemed adequate for developing the model [12].
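The arithmetic behind this estimate can be sketched as below. The 11 candidate factors, the 5-10 events-per-variable range and the 15% incidence come from the text; the final check, which applies a 10-events-per-variable rule to the 32 DI cases (IDS ≥ 5) reported in the Results, is an illustration added here. The helper names are hypothetical.

```python
import math


def required_events(n_candidate_predictors, events_per_variable):
    """Minimum number of DI events for a given events-per-variable rule."""
    return n_candidate_predictors * events_per_variable


def required_sample(n_events, expected_incidence):
    """Patients needed to observe n_events at the expected incidence."""
    return math.ceil(n_events / expected_incidence)


def max_predictors(observed_events, events_per_variable=10):
    """Largest number of predictors supported by the observed events."""
    return observed_events // events_per_variable


low, high = required_events(11, 5), required_events(11, 10)  # 55 and 110 events
n_low, n_high = required_sample(low, 0.15), required_sample(high, 0.15)
print(low, high, n_low, n_high)

# With 32 observed DI events, a 10-EPV rule supports only 3 predictors.
print(max_predictors(32))
```

At a 15% incidence, 367-734 patients would supply the 55-110 events, so roughly 1,000 patients leaves a margin; with only 32 events at the stricter IDS ≥ 5 threshold, the same rule allows about 3 predictors.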
SPSS Statistics for Windows, version 18.0 (SPSS Inc., Chicago, Ill., USA), and MedCalc Statistical Software, version 17.6 (MedCalc Software bvba, Ostend, Belgium) were used for the statistical analyses (S1 File). Baseline demographic data were summarized according to data type: continuous data as means and standard deviations, and categorical data as the percentages of individuals in each category. The model was developed using data obtained solely from the derivation cohort. All variables known to be associated with DI in obesity were considered. Patients with and without DI were compared using the chi-squared test for categorical variables and the independent-samples t-test for continuous variables. Factors that were clinically meaningful and/or had a p-value < 0.2 in the univariate analysis were then entered into the multiple logistic regression model.
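A minimal sketch of the univariate screening step for a binary factor follows. It assumes the Pearson chi-squared test without continuity correction (the text does not say whether a correction was applied) and applies the p < 0.2 entry threshold; the counts are hypothetical.

```python
import math

ENTRY_P = 0.2  # univariate threshold for entering the multivariable model


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    a: DI with factor, b: DI without factor,
    c: no DI with factor, d: no DI without factor.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: MMT III-IV in 20 of 32 DI and 300 of 983 non-DI patients.
stat, p = chi2_2x2(a=20, b=12, c=300, d=683)
print(f"chi2 = {stat:.2f}, p = {p:.4f}, enter model: {p < ENTRY_P}")
```

A factor that fails this screen could still be entered on clinical grounds, since the text allows "clinically meaningful and/or" statistically selected factors.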
The regression coefficients from the multivariable model were used to develop the predictive model. The model's calibration, or fit to the data, was then assessed with the Hosmer-Lemeshow test, which measures the agreement between the risk probabilities predicted by the model and the probabilities actually observed [15]. We estimated the model's ability to discriminate between patients with and without difficult intubation using the receiver operating characteristic (ROC) curve, and the model's shrinkage factor was then estimated. The optimal cut-point of the predictive score was identified from the shape of the ROC curve, and the area under the curve (AUC) was used to estimate the test's discriminative power. The AUC can take values from 0 to 1 and is a satisfactory indicator of a test's overall quality: a perfect diagnostic test has an AUC of 1.0, whereas a nondiscriminating test has an AUC of 0.5 [16]. In addition, the maximum value of Youden's index was considered; this global measure of performance is used to assess a diagnostic procedure's overall discriminative power and to compare it with others [17,18]. Finally, the ROC curve was presented to show the model's performance at the best cut-off point in terms of Youden's index, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (LR+), negative likelihood ratio (LR-), AUC and 95% confidence interval (CI).
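Calibration and discrimination, as described above, can be sketched in plain Python. This is an illustration under stated assumptions, not the SPSS or MedCalc procedures the authors used: the Hosmer-Lemeshow test groups patients into ten groups of sorted predicted risk and takes its p-value from a chi-squared distribution with g - 2 df (using the closed form that holds for even df), the AUC is computed as the Mann-Whitney probability, and the rule "score > c predicts DI" mirrors the "> 21.06" criterion in the Results. The data are hypothetical.

```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value over groups of sorted risk."""
    if groups % 2:
        raise ValueError("this sketch needs an even number of groups")
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    df = groups - 2
    # Chi-squared survival function, closed form valid for even df.
    half = stat / 2
    p_value = math.exp(-half) * sum(half ** i / math.factorial(i)
                                    for i in range(df // 2))
    return stat, p_value


def auc_mann_whitney(scores, outcomes):
    """AUC = P(random DI case scores higher than random non-DI case)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, outcomes):
    """Cut-off c (score > c predicts DI) maximizing sens + spec - 1."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, outcomes) if s > c and y == 1)
        tn = sum(1 for s, y in zip(scores, outcomes) if s <= c and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Hypothetical, perfectly calibrated data: 20 patients per risk level,
# with observed events equal to expected events, so HL is close to 0.
probs, outcomes = [], []
for k in range(10):
    p = 0.05 + 0.1 * k
    events = 2 * k + 1  # equals 20 * p
    probs += [p] * 20
    outcomes += [1] * events + [0] * (20 - events)

hl_stat, hl_p = hosmer_lemeshow(probs, outcomes)
print(f"HL = {hl_stat:.3f}, p = {hl_p:.3f}")
print("AUC =", round(auc_mann_whitney(probs, outcomes), 3))
print("Youden cut-off and J =", youden_cutoff(probs, outcomes))
```

A non-significant Hosmer-Lemeshow p-value indicates no detectable miscalibration; it says nothing about discrimination, which is why the AUC and Youden's index are reported separately.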

Results
The study enrolled 1,015 obese patients between May 1, 2013 and August 31, 2016: 500 cases from a university hospital and 515 cases from 4 non-university hospitals. The patients' mean age was 48.3 years, and around three-quarters were female. Their mean BMI was 34.2 ± 4.3 kg/m² (range: 30-68.4 kg/m²), and two-thirds had at least one coexisting disease, most frequently hypertension, diabetes mellitus and dyslipidemia. Thirty-one patients also had obstructive sleep apnea. Further demographic data, the results of the preoperative airway assessment tests, the surgical procedures and the surgical sites are given in Table 2.
The numbers of easy intubations (IDS 0-1), slight DIs (IDS 2-4) and DIs (IDS ≥ 5) were 822 (81%), 161 (15.8%) and 32 (3.2%), respectively (Table 3). The distributions of the IDS components are shown in Fig 1. Almost all successful intubations (99.2%) were performed using direct laryngoscopy, and there were no failed intubations. The Cormack and Lehane laryngoscopic view was grade I in 62.6%, grade II in 29%, grade III in 7.6% and grade IV in 0.9% of patients. A 1.3% incidence of brief desaturation was recorded during intubation. Oral structure injuries were reported by 2.7% of patients, and a further 4.1% reported a sore throat. There were no serious complications, such as death, brain damage or aspiration. Table 1 outlines the 11 factors associated with difficult intubation that have been identified in the literature [19][20][21][22][23][24][25][26]. Because the incidence of DI among obese Thai patients was low (32 events), the rule of thumb of about 10 events per predictor variable meant that only 3 factors should be selected to build a model (Table 4). Three different binary logistic regression models were fitted (Table 5). Nevertheless, the predictive performance of the selected model was only fair.
As shown in Fig 2, the AUC of the best-performing equation was 0.72 (95% CI 0.62-0.81). The optimal cut-off point for discriminating between a high and a low probability of difficult intubation was > 21.06. At this cut-off point, Youden's index reached its highest value (0.43), with a sensitivity of 68.75%, a specificity of 74.47%, a PPV of 23.0%, an NPV of 95.5%, an LR+ of 2.69 and an LR- of 0.42 (Table 6).
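As a consistency check, Youden's index and the two likelihood ratios follow directly from the reported sensitivity and specificity and do not depend on prevalence. The short sketch below (function name illustrative) reproduces the reported values.

```python
def metrics_from_sens_spec(sensitivity, specificity):
    """Prevalence-independent summaries of a dichotomized score."""
    return {
        "youden_j": sensitivity + specificity - 1,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Sensitivity and specificity reported for the cut-off > 21.06.
m = metrics_from_sens_spec(0.6875, 0.7447)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

An LR+ of about 2.7 raises the odds of DI less than threefold, which is consistent with the authors' judgment that the model offers limited benefit for screening.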

Discussion
The discrimination ability of a risk prediction model depends on the factors selected for the target population. We compared our model with two models derived from studies conducted in the general population. The three equations had similar factors; notably, all three included the Mallampati classification as a factor associated with DI. Other possible predictors of DI in obese patients, namely, neck circumference [27], thyromental distance, BMI [3] and NC/TM [24], have been reported in other studies. Ultimately, three factors (MMT, BMI and NC/TM) were selected for the final Thai obese model. Peduzzi et al. recommended at least 10 events per predictor variable to avoid invalid estimates, a major problem with logistic models [13]. The interincisor gap was not selected for our obese model because it is not related to obesity, even though it was statistically significant. Fig 3 shows three ROC curves representing how well the equations separate obese patients with and without DI. Overall, the discriminative accuracy of all three model equations was fair (Table 7).
Different patient groups call for different factors to predict DI. In emergency patients, DI has been predicted using the PreDAIT model. With a cut-point of ≥ 2, the derivation and validation sets had AUCs of 0.68 (95% CI 0.64-0.73) and 0.63 (95% CI 0.58-0.68), with specificities of 91.5% and 87.7% but sensitivities of only 27.1% and 28.9%, respectively. The selected factors reflected the emergency setting: the most parsimonious model included 5 factors, namely, a score > 3 on the Glasgow Coma Scale (GCS), limited neck movement, inability to palpate the neck landmarks, trismus, and blood and/or emesis in the airway [28]. In ICU patients, the factors related to DI resemble those identifiable in the operating room. The MACOCHA score draws on the following items: MMT class III or IV, obstructive sleep apnea syndrome, reduced cervical spine mobility, mouth opening < 3 cm, a GCS < 8, severe hypoxemia (SpO2 < 80%) before intubation, and intubation by a non-anesthesiologist. This simplified scoring system had high discriminative ability, with an AUC of 0.86 (95% CI 0.76-0.96) in the validation cohort [29].
Regarding the characteristics required of a screening model, the best equations must be able to distinguish between difficult and easy intubations. Consequently, a prediction model's sensitivity is more critical than its specificity and should be weighted more heavily when deciding which model is the most appropriate. The classification of AUC values for a diagnostic test proposed by Zhu et al. [30] may be summarized as follows: excellent, 0.9 < AUC < 1.0; good, 0.8 < AUC < 0.9; fair, 0.7 < AUC < 0.8; poor, 0.6 < AUC < 0.7; and failed, 0.5 < AUC < 0.6. The ROC analysis separating obese patients with and without DI yielded an AUC of 0.71, which falls in the fair category. The estimated shrinkage factor was 0.83, which was below 0.85 [31] and may therefore indicate some overfitting. Additionally, predictive values depend on the prevalence of the condition in the population being tested [32]. A good screening model requires a condition of sufficient prevalence, high sensitivity and high specificity, and should allow diagnosis before the patient has symptoms [32,33]. In conclusion, the prevalence of DI among obese Thai patients was low, and the predictive performance of the selected model, with an optimal cut-point of > 21.06, showed limited benefit for preoperative screening to predict DI. Further studies should identify other factors that could be added to develop an improved model for predicting the likelihood of DI. Such factors could be obtained from physical examinations as well as from imaging techniques, such as ultrasonography, computed tomography or magnetic resonance imaging of the neck and upper airway.
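To illustrate how predictive values depend on prevalence, the sketch below applies Bayes' theorem to the reported sensitivity (68.75%) and specificity (74.47%) at the 3.2% DI incidence observed in this cohort and at two hypothetical higher prevalences. It also shows one common way to estimate a shrinkage factor, the heuristic of van Houwelingen and le Cessie, (model chi-squared - df) / model chi-squared. The text does not say which method produced 0.83, so this formula and the chi-squared value used below are assumptions.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV from Bayes' theorem for a given prevalence."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


def heuristic_shrinkage(model_chi2, df):
    """van Houwelingen-le Cessie heuristic shrinkage factor."""
    return (model_chi2 - df) / model_chi2


SENS, SPEC = 0.6875, 0.7447  # reported at the cut-off > 21.06

# 0.032 is the observed DI incidence; 0.10 and 0.20 are hypothetical.
for prevalence in (0.032, 0.10, 0.20):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV {ppv:.1%}, NPV {npv:.1%}")

# Hypothetical likelihood-ratio chi-squared of 25 for a 3-predictor model.
print("shrinkage factor:", heuristic_shrinkage(25.0, 3))
```

At the 3.2% incidence, a positive result implies a DI probability of only about 8%, which illustrates why a low-prevalence outcome limits the screening value of a test with this sensitivity and specificity.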

Case #1
A 52-year-old man weighing 109.5 kg and 167 cm tall was scheduled for septoplasty. On airway examination, his interincisor gap was 4.5 cm, his MMT was class III, his neck circumference was 54.5 cm and his thyromental distance was 9.2 cm. He had no dental malformation, his upper lip bite test was class I, and his hyomental distance was 5.7 cm. Entering these values into the predictive model gives $Y = [(10.94 \times 1) + (10.89 \times 5.92)] - 39.26 \approx 36.15$, where the first term corresponds to his MMT class III, 5.92 is his NC/TM ratio (54.5/9.2) and 39.26 is his BMI (109.5 kg / 1.67² m²). As the value of the discriminant function (Y) is over 21.06, the model correctly predicted difficult intubation. Nevertheless, the model's predicted probability of DI for this patient was only 9.7% (S1 Fig).

Case #2
A 44-year-old woman weighing 89 kg and 160 cm tall was scheduled for thyroidectomy. On bedside examination, her interincisor gap was 5.4 cm, her MMT was class I, her neck circumference was 39.5 cm and her thyromental distance was 9.5 cm. She had no dental malformation, her upper lip bite test was class I and her hyomental distance was 5.0 cm.
As the value of the discriminant function (Y) was below 21.06, the model correctly predicted an easy intubation. The model's predicted probability of DI for this patient was only 1.9% (S1 Fig).
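The two worked examples can be reproduced with the short sketch below. The functional form is a hypothetical reconstruction from the Case #1 arithmetic, not a published specification. It assumes that the 1 multiplying 10.94 is an indicator for MMT class III-IV, that 5.92 is the NC/TM ratio (54.5/9.2), and that 39.26, which equals the patient's BMI, is subtracted with a coefficient of -1. The function and variable names are illustrative.

```python
CUT_OFF = 21.06  # reported optimal cut-off for Y


def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2."""
    return weight_kg / (height_cm / 100) ** 2


def discriminant_y(weight_kg, height_cm, mmt_class, neck_circ_cm, thyromental_cm):
    """Hypothetical reconstruction of Y from the Case #1 worked example.

    Assumes MMT class III-IV (mmt_class 3 or 4) is coded 1, else 0, and
    that BMI enters with a coefficient of -1; neither is stated explicitly.
    """
    mmt_term = 1 if mmt_class in (3, 4) else 0
    nc_tm = neck_circ_cm / thyromental_cm
    return 10.94 * mmt_term + 10.89 * nc_tm - bmi(weight_kg, height_cm)


cases = {
    "Case #1": discriminant_y(109.5, 167, 3, 54.5, 9.2),
    "Case #2": discriminant_y(89, 160, 1, 39.5, 9.5),
}
for name, y in cases.items():
    label = "difficult" if y > CUT_OFF else "easy"
    print(f"{name}: Y = {y:.2f} -> predicted {label} intubation")
```

Computed with unrounded NC/TM ratios, the sketch gives Y ≈ 36.19 for Case #1 (36.15 with the rounded ratio 5.92) and Y ≈ 10.51 for Case #2, placing the two patients on the same sides of the 21.06 cut-off as the text states. It does not reproduce the predicted probabilities (9.7% and 1.9%), because the text does not give the function that maps Y to a probability.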