Intra-individual variation of upper airway measurements based on computed tomography

The aims of this study were (1) to quantify the intra-individual variation in the upper airway measurements on supine computed tomography (CT) scans at two different time points; and (2) to identify the most stable parameters of the upper airway measurements over time. Ten subjects with paired CT datasets (3–6 months interval) were studied, using computer software to segment and measure the upper airway. The minimum cross-sectional area of the total airway and all its segments (velopharynx, oropharynx, tongue base, and epiglottis) generally had the largest variation, while the length of the total airway had the lowest variation. Sphericity was the only parameter that was stable over time (relative difference <15%), both in the total airway and each subregion. There was considerable intra-individual variation in CT measurements of the upper airway, with the same patient instruction protocol for image acquisitions. The length of the total airway, and the sphericity of the total upper airway and each segment were stable over time. Hence, such intra-individual variation should be taken into account when interpreting and comparing upper airway evaluation parameters on CT in order to quantify treatment results or disease progress.


Introduction
Over the past decades growing awareness of the detrimental effects of obstructive sleep apnea (OSA) has increasingly raised interest in morphometric evaluation of the upper airway [1][2][3]. Traditionally, upper airway morphology imaging consisted of a two-dimensional (2D) lateral cephalogram [4,5]. However, due to the technical advancement of computed tomography (CT), this imaging modality has gained increasing popularity [5,6]. Compared with a 2D lateral cephalogram, CT exhibits the capacity to analyze the upper airway three-dimensionally [7,

CT image acquisition
The included spiral CT scans of head and neck were acquired between 2018 and 2019 using the following scanning protocol (SOMATOM Force, Siemens Medical Solutions, Erlangen, Germany): 120 kV, 380 mAs, max. FOV 300 mm, pitch 0.85, slice thickness 1.0 mm, slice increment 1.0 mm, image matrix 512×512, window W1600/L400, hard-tissue kernel H60s. During the imaging procedure, the patients were in supine position and were instructed to remain still with maximum intercuspation, to breathe gently, and not to swallow.

CT measurements
Reference frame. The Digital Imaging and Communications in Medicine (DICOM) files of the CT were imported into Maxilim software (version 2.3.0, Medicim NV, Mechelen, Belgium) for measurements. A hard-tissue reconstruction was created at 300 Hounsfield units (HU) and a soft-tissue reconstruction at -400 HU. To standardize the measurements and minimize the measurement error, the Frankfort Horizontal (FH) plane was constructed for reorientation of the 3D images at T0 [19]. The T1 dataset was superimposed on the T0 dataset, using voxel-based matching on the structures of the cranial base [20,21].
Landmarks. After re-orientation and superimposition of the paired CT scans, four anatomical landmarks (Fig 1) were identified for segmentation of the regions of interest: posterior nasal spine (PNS), tip of uvula (TUV), tip of epiglottis (TEP), and base of epiglottis (BEP). The reliability of these landmarks has been validated in a previous study [9]. Based on TUV and TEP, the midpoint between them (MUE) was then calculated and localized (Fig 1). Because PNS is a bony landmark whose position does not change between scans, it was localized only once on the T0 scan and re-used for the T1 scan; the other four landmarks (TUV, TEP, BEP, and the derived MUE) were identified on both scans.
Boundary. The soft-tissue model was imported into Blender software (version 2.81, Blender Foundation, Amsterdam, The Netherlands) for further analysis. The superior boundary of the upper airway was defined as the plane through the PNS parallel to the FH plane [9,22]. The inferior boundary was the plane through the BEP parallel to the FH plane [9,22]. The lateral and posterior boundaries consisted of the pharyngeal walls and the anterior boundary was composed of the soft palate, base of tongue, and anterior wall of the pharynx, with a cutoff at PNS point [10,23].
Segmentation. Based on the identified landmarks, the upper airway was segmented into four distinct regions (Fig 1): velopharynx region (between PNS and TUV), oropharynx region (between TUV and MUE), tongue base region (between MUE and TEP), and epiglottis region (between TEP and BEP). Cutting planes were parallel to the FH plane.
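The segmentation rule above can be illustrated with a minimal sketch. It assumes that each landmark is reduced to its height along the axis perpendicular to the FH plane (which is sufficient because all cutting planes are parallel to the FH plane), and that larger values are more superior. The class name, function names, and coordinate values are hypothetical and are not taken from Maxilim or Blender.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmarks:
    """Landmark heights in mm along the axis perpendicular to the FH plane.

    Larger values are more superior. All values used here are hypothetical.
    """
    pns: float  # posterior nasal spine
    tuv: float  # tip of uvula
    tep: float  # tip of epiglottis
    bep: float  # base of epiglottis

    @property
    def mue(self) -> float:
        # Derived landmark: midpoint between TUV and TEP (height only).
        return (self.tuv + self.tep) / 2.0


def segment_bounds(lm: Landmarks) -> dict:
    """Return (superior, inferior) cutting-plane heights for each segment."""
    levels = [lm.pns, lm.tuv, lm.mue, lm.tep, lm.bep]
    if any(upper <= lower for upper, lower in zip(levels, levels[1:])):
        raise ValueError("landmarks must be ordered from superior to inferior")
    names = ["velopharynx", "oropharynx", "tongue base", "epiglottis"]
    return {n: (top, bottom) for n, top, bottom in zip(names, levels, levels[1:])}


def segment_of(height: float, lm: Landmarks):
    """Name of the segment containing a given height, or None if outside.

    A height lying exactly on a cutting plane is assigned to the upper segment.
    """
    for name, (top, bottom) in segment_bounds(lm).items():
        if bottom <= height <= top:
            return name
    return None


lm = Landmarks(pns=0.0, tuv=-30.0, tep=-50.0, bep=-65.0)
print(segment_bounds(lm))
print(segment_of(-45.0, lm))
```

Reducing each landmark to a single height is a simplification of the 3D workflow described above; it only shows how the four cutting planes partition the airway between PNS and BEP.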
Upper airway parameters. One operator (CK), with extensive experience with Blender, performed the measurements in all 20 datasets. The operator was blinded to the measurement results of T0 scans during the measurement for T1 scans. To quantify the inter-operator reliability, a second operator (RS) repeated the entire measurement protocol in five randomly selected datasets. The operators were blinded to each other's results. The upper airway parameters of interest were volume, length, surface area, minimum cross-sectional area (MCA), and lateral dimension (LAT) and anteroposterior dimension (AP) of the MCA. These parameters were measured for the total airway and for the individual segments (Table 1). Before measuring the MCA, the "islands" (loose air parts) and "dead space" (space in mouth and space between tongue base and epiglottis) were removed from the upper airway model (Fig 2).
Derived airway parameters. Based on these parameters, the following derived parameters were calculated: mean cross-sectional area (meanCSA) [24] for the size of the total airway and each segment; LAT/AP ratio in MCA, airway uniformity [10], and sphericity [10] for the shape of the total airway and of each segment separately (Table 1).
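As an illustration of how such derived parameters can be obtained from the primary measurements, the sketch below uses definitions that are common in the airway literature. They are assumptions here, since the exact definitions are given in Table 1 and in references [10,24]: meanCSA as volume divided by length, uniformity as MCA divided by meanCSA, and sphericity as the standard geometric ratio between the surface area of a sphere of equal volume and the measured surface area. All function names are hypothetical.

```python
import math


def mean_csa(volume_mm3: float, length_mm: float) -> float:
    """Mean cross-sectional area (mm^2), assumed to be volume / length."""
    return volume_mm3 / length_mm


def lat_ap_ratio(lat_mm: float, ap_mm: float) -> float:
    """Ratio of lateral to anteroposterior dimension of the MCA."""
    return lat_mm / ap_mm


def uniformity(mca_mm2: float, volume_mm3: float, length_mm: float) -> float:
    """Airway uniformity, assumed to be MCA / meanCSA (1.0 = uniform tube)."""
    return mca_mm2 / mean_csa(volume_mm3, length_mm)


def sphericity(volume_mm3: float, surface_area_mm2: float) -> float:
    """Standard geometric sphericity: pi^(1/3) * (6V)^(2/3) / A.

    Equals 1.0 for a perfect sphere and decreases for elongated shapes.
    """
    return math.pi ** (1 / 3) * (6 * volume_mm3) ** (2 / 3) / surface_area_mm2


# Hypothetical measurements for one airway segment.
print(mean_csa(12000.0, 60.0))           # volume 12000 mm^3, length 60 mm
print(uniformity(100.0, 12000.0, 60.0))  # MCA 100 mm^2
print(sphericity(12000.0, 6000.0))       # surface area 6000 mm^2
```

Because sphericity combines volume and surface area into a dimensionless shape index, it can remain similar between scans even when the absolute size of the airway changes.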
Outcome variables. The following outcome variables were derived in this study:
• Intra-individual variation (number of patients = 10; number of CT datasets = 20): the relative difference in the measurements between two scans (T0 and T1) of an individual by operator 1.
• Intra-individual repeatability (number of patients = 10; number of CT datasets = 20): the intra-class correlation coefficient (ICC) for the measurements between two scans (T0 and T1) of an individual by operator 1.
• Inter-operator variation (number of CT datasets = 5): the relative difference between the measurements by operator 1 and operator 2 at T0/T1.
• Inter-operator reliability (number of CT datasets = 5): the ICC for the measurements by operator 1 and operator 2 at T0/T1.
• Agreement and smallest detectable difference (SDD) in the measurements between two scans (T0 and T1) of an individual (number of patients = 10; number of CT datasets = 20) by operator 1.

Statistical analysis
All data were analyzed using SPSS software (version 26, IBM Corp., Armonk, NY, USA). Descriptive statistical analysis was performed for all demographic and outcome variables. The intra-individual repeatability and inter-operator reliability of upper airway measurements were evaluated using ICC [25]. Values of ICC less than 0.40, between 0.40 and 0.75, and greater than 0.75 are indicative of poor, fair to good, and excellent reliability, respectively [25]. The relative difference was used to estimate the intra-individual variation and inter-operator variation, and was calculated with the formula: (absolute difference / mean) × 100%. Bland-Altman analysis was used to determine the agreement of the airway measurements between two different scans and to obtain the precise confidence interval for the paired difference [26]. Based on Bland-Altman's method, the SDD in the airway measurements between two scans of an individual was calculated with the formula: 1.96 × SD(T0-T1), where SD(T0-T1) is the standard deviation of the paired differences.
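The relative difference, the Bland-Altman limits of agreement, and the SDD can be sketched as follows. This is an illustration rather than the SPSS procedure used in the study. It assumes that "mean" in the relative-difference formula refers to the mean of the two paired measurements, and that the sample standard deviation (n - 1 denominator) is used. The function names and the example values are hypothetical.

```python
import numpy as np


def relative_difference(t0, t1):
    """Per-subject relative difference in percent: |T0 - T1| / mean(T0, T1) * 100."""
    t0 = np.asarray(t0, dtype=float)
    t1 = np.asarray(t1, dtype=float)
    return np.abs(t0 - t1) / ((t0 + t1) / 2.0) * 100.0


def bland_altman(t0, t1):
    """Bias, SD of paired differences, 95% limits of agreement, and SDD."""
    diff = np.asarray(t0, dtype=float) - np.asarray(t1, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences (T0 - T1)
    sdd = 1.96 * sd        # smallest detectable difference
    return {
        "bias": bias,
        "sd": sd,
        "limits_of_agreement": (bias - sdd, bias + sdd),
        "sdd": sdd,
    }


# Hypothetical paired measurements (e.g. MCA in mm^2) for four subjects.
t0 = [100.0, 200.0, 150.0, 180.0]
t1 = [110.0, 180.0, 160.0, 175.0]
print(relative_difference(t0, t1).round(1))
print(bland_altman(t0, t1))
```

Under these assumptions, a change observed in a single individual would be regarded as exceeding measurement variation only when its absolute value is larger than the returned `sdd`.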

Results
Descriptive statistics of all measurements, intra-individual variation estimated by relative difference, intra-individual repeatability estimated by ICC, inter-operator variation estimated by relative difference, and inter-operator reliability estimated by ICC are presented in Table 2. Of the 50 upper airway parameters, the ICC values of intra-individual repeatability were greater than 0.75 for 26, between 0.40 and 0.75 for 19, and less than 0.40 for 5. For the inter-operator reliability estimated by the ICC, all the parameters showed excellent reliability (ICC 0.832-0.999). As for the intra-individual variation in the total airway, the mean relative difference  Table 2 ; and MCA at all levels. The relative differences of the sphericity between two scans in the total airway and each segment were all below 15%. Table 3 shows the results of Bland-Altman analysis of differences between the paired scans (T0-T1; mean, SD, and 95% limits of agreement), as well as the absolute value of differences (|T0-T1|; mean and SD) and SDD values.

Discussion
This is the first study to evaluate the intra-individual variation of linear, areal, and volumetric measurements of the upper airway in CT scans acquired at two different time points. Because of the short time interval between T0 and T1 (3-6 months), the absence of any airway-influencing intervention during or between scans, the absence of airway-influencing pathology or disease in the patients, and the use of the same positioning protocol for both CT acquisitions, no airway alteration was expected within the scan pairs in our study population. Nevertheless, our findings suggest that different degrees of variation exist in each segment of the upper airway between T0 and T1. Although patients with an airway-altering disease (e.g., OSA) or intervention were excluded, this finding may be especially important when change is evaluated in such patients as a method to quantify disease progress or treatment effects. Regarding the intra-individual variation of the upper airway measurements between T0 and T1 (see Table 2), we found that the MCA of the total airway and of each segment separately generally showed the largest variation, with a relative difference of approximately 30%. Such variation could have two causes. Firstly, the location of the MCA is not always constant during the dynamic upper airway movement due to breathing. Secondly, errors or variation in determining the location of the MCA may exist. Although several studies have found that the MCA is the most important characteristic of the upper airway that may contribute to distinguishing OSA cases from non-OSA cases [27,28], caution is warranted in interpreting this finding or applying it in clinical practice because of the natural variation found for the MCA in the present study.
A significant limitation in CT analysis of the upper airway is that the boundaries between soft tissues and empty spaces (air) must be differentiated on the basis of a limited difference in grey levels. However, the measurement of upper airway length is not affected by this, as it is determined by a user-generated plane. Increased airway length has been suggested to be correlated with the presence and severity of OSA [10]. For consistency and reproducibility, we used a bony landmark that has shown excellent reliability in previous studies, the PNS, to define the superior boundary of the upper airway [9,29]. In our study, the length of the total upper airway showed the least variation (relative difference: 4.9%) and it may therefore be regarded as a stable evaluation parameter for the upper airway.
Airway shape may contribute to the development of OSA [1,10]. Recently, a derived variable, the sphericity of the upper airway, was suggested and investigated [10,30]. Klazen et al. found that lower sphericity was the main predictor for OSA in patients with craniofacial microsomia [30]. It is interesting to note that sphericity had low ICC values for intra-individual repeatability; however, it also showed low variation between T0 and T1 in both the total airway and each segment, all the relative differences being below 15%. This may be explained by the fact that the ICC is the ratio of inter-unit variability to total variability (intra-unit plus inter-unit) [31]. In this study, the extremely low SDs of the sphericity measurements indicate minor inter-unit variability, which could explain the low ICC values. Therefore, this parameter should not be disregarded based on the ICC value alone. The mean relative differences between two CT scans of the volumes of the total airway, velopharynx, oropharynx, tongue base, and epiglottis were 21.3%, 15.9%, 34.4%, 29.8%, and 25.4%, respectively. Obelenis Ryan et al. [11] evaluated CBCT scans of 27 patients obtained at two time points and reported that the mean relative differences of the volumes of the nasopharynx, oropharynx, and hypopharynx were 9.8%, 17.8%, and 12.0%, respectively. However, care should be taken in comparing the results between the two studies because of the different methodology in the upper airway segmentation. Moreover, differences between CT and CBCT evaluation of the upper airway should be noted. CT scans are performed with the patient in the supine position, while most CBCT units acquire images with the patient in the upright position [32]. Soft tissue contrast resolution on CBCT imaging is inferior to that of CT imaging and therefore segmentation results are different [33].
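The dependence of the ICC on inter-unit variability can be illustrated with a small numerical sketch. The form of ICC used in the study is not stated in this section; the code below assumes a one-way random-effects, single-measure ICC purely for illustration. The function names and both datasets are invented and do not come from Table 2.

```python
import numpy as np


def icc_oneway(data):
    """One-way random-effects, single-measure ICC.

    `data` has shape (n_subjects, k_measurements). This form of the ICC is an
    assumption made for illustration only.
    """
    data = np.asarray(data, dtype=float)
    n, k = data.shape
    subject_means = data.mean(axis=1)
    grand_mean = data.mean()
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((data - subject_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def relative_difference(data):
    """Relative difference (%) between the two columns (T0, T1) of `data`."""
    data = np.asarray(data, dtype=float)
    return np.abs(data[:, 0] - data[:, 1]) / data.mean(axis=1) * 100.0


# Invented example: subjects are very similar to each other (narrow spread).
narrow = np.array([[0.20, 0.22], [0.21, 0.20], [0.22, 0.23], [0.23, 0.21]])
# Invented example: subjects differ widely from each other (wide spread).
wide = np.array([[10.0, 11.0], [20.0, 19.0], [30.0, 31.5], [40.0, 38.0]])

for name, d in (("narrow", narrow), ("wide", wide)):
    print(name, "ICC =", round(float(icc_oneway(d)), 3),
          "max relative difference =", round(float(relative_difference(d).max()), 1), "%")
```

In both invented datasets the relative differences between T0 and T1 stay below 10%, yet the ICC is poor for the narrow-spread data and excellent for the wide-spread data. This mirrors the argument above: a low ICC can arise from small differences between subjects rather than from poor stability within subjects.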
There are several studies describing the morphometric evaluation of the upper airway [23,24]. To date, however, there is no methodological standardization in 3D analysis of the upper airway [34]. Chen et al. [9] proposed a method of landmark localization for 3D upper airway measurements, which showed excellent intra- and inter-operator reliability. In the present study, four of the six landmarks proposed by Chen et al. were utilized: PNS, TUV, TEP, and BEP. Additionally, a derived landmark, MUE, was localized at the midpoint between TUV and TEP. Through the experience of over 8,000 drug-induced sleep endoscopy (DISE) examinations, Kezirian et al. [35] found four structures, namely velum, oropharyngeal lateral wall, tongue base, and epiglottis, which play a prominent role in upper airway obstruction. Accordingly, they proposed the VOTE classification system, which has been widely used for characterizing DISE findings. In 3D evaluation of the upper airway, various subregion definitions of the airway have been used in previous studies [11,22,23]. However, structure-based assessment of the upper airway cannot be achieved with these methods. Therefore, based on the work by Kezirian et al. [35], the upper airway was divided into four subregions corresponding to the VOTE classification system. Because PNS, TUV, TEP, and BEP demonstrated excellent intra- and inter-operator reliability in the study of Chen et al. [9], the segmentation of the upper airway based on these landmarks may be considered reliable.
In the current study, all the parameters showed excellent inter-operator reliability. Zimmerman et al. conducted a study to assess the reliability of upper airway analysis with CBCT [34]. Interestingly, in contrast to our results, they found that the MCA and total airway volume showed poor inter-operator reliability. It needs to be noted that in the study by Zimmerman et al., six examiners of varying levels of education and clinical experience separately performed the upper airway analysis, and the reliability improved with examiner education and experience. In our study, the measurement protocol was conducted by two experienced examiners, which may explain the discrepancy in reliability between the two studies. In addition, unlike their study, we used a fixed threshold for the selection of the upper airway. In this way, the operator's subjectivity in the threshold sensitivity selection was eliminated. Since it is generally accepted that the inter-operator reliability of airway measurements is lower than the intra-operator reliability [34], it was decided to evaluate only the inter-operator reliability. Given that the measurement method of the upper airway used in this study is considered to be reliable, it was possible to evaluate the variation of upper airway measurements between repeated CT scans.
For the upper airway analysis, the primary confounding factors during 3D radiographic image acquisition include the individual's body, head, jaw, and tongue position, as well as the respiratory phase [5,14,15]. A systematic review on the effect of head and tongue posture on the dimensions and morphology of the pharyngeal airway concluded that altered head, body, and jaw position had a significant effect on the upper airway dimensions, particularly on the retro-palatal and retro-glossal regions of the oropharynx [14]. In another study by Gurani et al. [5], five sagittal MRI scans from ten subjects in different head and tongue positions were measured. They found that with the head in supine neutral position, the retropalatal, oropharyngeal, and total volumes increased significantly when the tongue was altered from a resting position to the tip of the tongue in contact with the posterior edge of the hard palate (P ≤ 0.05). Schwab et al. [15] investigated the effects of respiration on the upper airway size using cine-CT in 15 normal subjects, 14 snorer/mildly apneic subjects, and 13 patients with OSA, all of whom were scanned in the supine position during awake nasal breathing. In all three groups, there were significant dimensional changes at all anatomic levels of the upper airway during the respiratory cycle, especially in the OSA group. Therefore, 3D assessment of the upper airway cannot be considered reliable unless all the above confounding factors are controlled during image acquisition. In this study, even with the same patient instruction during CT acquisition, different upper airway readings were found between two repeated CT scans within the same individual, which emphasizes the need for a more standardized patient instruction in terms of posture and breathing phase during image acquisition. Such an instruction protocol needs to be developed and validated in future studies.
As recommended by the American Association of Orthodontists White Paper [36], three-dimensional imaging of the airway is a snapshot of a specific moment of the breathing cycle and such technique currently does not represent a proper and reliable risk assessment tool for OSA. The results of the current study reinforce this recommendation.
This study can provide better insight into the real effects of potentially airway-altering procedures on airway size and morphology, such as orthognathic surgery and orthodontic treatment. The differences in the upper airway measurements caused by orthognathic surgery, such as maxillomandibular advancement for OSA treatment, are probably larger than those between two distinct CT scans in our study. However, minor differences in the upper airway measurements should be interpreted cautiously, in particular when quantifying the effect of treatment on the upper airway parameters in a single individual. The SDD provides the amount of potential variation that should be taken into account when interpreting the measurement changes over time at individual level (see Table 3). For example, an SDD of 61.3 mm² was found for the MCA of the total airway in the present study. This suggests that a change in MCA can only be considered to represent a real change if it is larger than 61.3 mm².
Our study has several limitations. First, the sample size might be considered limited. However, it should be mentioned that the sample size is sufficient to demonstrate the considerable intra-individual variation in upper airway measurements. This variation is not expected to decrease with a larger sample size; only its estimate will be more precise [37]. Second, although patients were provided with standardized instructions during CT acquisition, the retrospective nature of the data collection makes it impossible to verify that the instructions were followed. While in theory this study could have been performed prospectively, using an enlarged field-of-view, this would have exposed patients who do not need imaging of the complete airway to a larger radiation dose, including vital structures, raising ethical objections to a prospective set-up. This is the reason why we made use of this set of existing radiographic examinations. The fact that most published studies on 3D evaluation of the upper airway are retrospective studies with various patient instruction protocols emphasizes the difficulty of this issue. Our study highlights that caution should be taken when interpreting the results of upper airway comparison and evaluation using CT, and that a strict protocol is required for repeated measurements and subsequent imaging sessions. Further studies with a larger sample size should be performed to re-determine the natural intra-individual variation of the airway between two CT scans acquired at different time points, using a standardized patient instruction protocol.

Conclusion
Our study demonstrates that the dimensions and morphology of the upper airway in CT scans can vary considerably within an individual at different time points, even if the same patient instruction protocol for image acquisition is used. The MCA of the total airway and all its segments generally had the largest intra-individual variation, with relative differences of approximately 30%. The length of the total airway had the lowest intra-individual variation, with a relative difference of 4.9%. The relative differences of the sphericity between two scans in the total airway and each segment were all below 15%. The length of the total upper airway, and the sphericity of the total airway and each segment, were therefore stable over time. Such intra-individual variation should be considered when interpreting the results of upper airway comparison and evaluation using CT, and the smallest detectable difference is necessary to detect true differences in upper airway measurements over time at individual level.