Validation of the OrthoGnathicAnalyser 2.0—3D accuracy assessment tool for bimaxillary surgery and genioplasty

Orthognathic surgery is a widely performed procedure to correct dentofacial deformities. Virtual treatment planning is an important preparation step. One advantage of virtual treatment planning is the possibility to assess the accuracy of orthognathic surgery. In this study, a tool (OrthoGnathicAnalyser 2.0) that allows quantification of the accuracy of orthognathic surgery is presented and validated. In the OrthoGnathicAnalyser 2.0, the accuracy of the osseous chin can now be assessed, which was not possible in the earlier version of the OrthoGnathicAnalyser. Thirty patients who underwent bimaxillary surgery in combination with a genioplasty were selected from three different centers in the Netherlands. A preoperative (CB)CT scan, virtual treatment planning and a postoperative (CB)CT scan were required for assessing the accuracy of bimaxillary surgery. The preoperative and postoperative (CB)CT scans were aligned using voxel-based matching. Furthermore, voxel-based matching was used to align the preoperative maxilla, mandible and rami towards their postoperative position, whereas surface-based matching was used for aligning the preoperative chin towards the postoperative position. The alignment resulted in a transformation matrix which contained the achieved translations and rotations. The achieved translations and rotations can be compared to the planned values of the virtual treatment plan. To study the reproducibility, two independent observers processed all 30 patients to assess the inter-observer variability. One observer processed the patients twice to assess the intra-observer variability. Both the intra- and inter-observer variability showed high ICC values (> 0.92) and low measurement variations (< 0.673 ± 0.684 mm and < 0.654 ± 0.824°). The results of this study show that the OrthoGnathicAnalyser 2.0 has excellent reproducibility for quantification of skeletal movements between two (CB)CT scans.


Introduction
In orthognathic surgery, suboptimal facial appearance and function may be improved by correcting dentofacial deformities [1]. Three-dimensional (3D) imaging has enhanced the potential and accuracy of the orthognathic surgery workflow [2]. The introduction of cone-beam computed tomography (CBCT) in combination with virtual imaging software enables diagnostics, planning and evaluation in 3D. This has improved the quantification of characteristics of dentofacial deformities that were formerly difficult to measure, such as rotations in the axial plane (yaw) or frontal plane (roll or occlusal cant) [3]. Additively manufactured occlusal splints are based on a virtual surgical planning (VSP) and are used to accurately execute the VSP during surgery [4]. These new 3D techniques have led to more predictable postoperative outcomes and a reduction of surgical error [5].
Similar to VSP, postoperative accuracy of the performed surgery should be evaluated in 3D. The result of orthognathic surgery was traditionally assessed in two dimensions, using pre- and postoperative lateral radiographs [6]. Contemporary software enables automatic matching, also called registration, of two 3D imaging datasets. Voxel-based matching (VBM) is the registration method of preference, due to its higher accuracy and user independence [7]. In this technique, the two volumes of interest (VOI) are aligned by maximizing the overlap of the greyscale values of the individual voxels [8]. After aligning the VOI, the translations and rotations in the sagittal, coronal and axial planes (six degrees of freedom) realized by the orthognathic surgery can be computed and compared to the VSP [9,10]. The systematic review of Gaber et al. [7] evaluated several 3D postoperative assessment methods for virtually planned orthognathic surgery. The OrthoGnathicAnalyser (OGA), as described in our previous study [11], was identified as the 3D assessment tool of choice, due to the application of VBM and the semi-automatic approach. The OGA has since been applied in large clinical studies [12,13], demonstrating its clinical applicability.
After validation of the first version, the development of the OGA continued and has resulted in OGA 2.0. While the former version only enabled analysis of the mandible, maxilla, and the ramus, the new version also allows analysis of the chin segment. In addition, the efficiency of the workflow has been improved, requiring less manual interaction and computing time. The software is compatible with various VSP software, such as IPS CaseDesigner (KLS Martin Group, Tuttlingen, Germany) and Dolphin 3D (Dolphin Imaging & Management Solutions, Chatsworth, USA). The purpose of this study was to present and validate the new version of the OGA (2.0) in patients who underwent bimaxillary surgery in combination with a genioplasty. Because different centers use different imaging protocols and hardware from different manufacturers for their preoperative and postoperative imaging, a multicenter approach was chosen to assess the robustness of the software tool.

Workflow of OrthoGnathicAnalyser 2.0
The workflow of OGA 2.0 was based on the workflow described in the previous article [11] and is illustrated in Fig 1. In preparation for the surgery, a preoperative (CB)CT scan was acquired. This scan was used for the virtual planning of the subject with planning software. After surgery, a postoperative (CB)CT scan was acquired.
The assessment of the discrepancy between VSP and postoperative outcome was performed semi-automatically, using the OGA module which was implemented in the in-house developed 3D analysis software called 3DMedX (version 1.2.4.1, 3D lab Radboudumc, Nijmegen, the Netherlands). 3DMedX is a standalone software tool based on the C++ OpenInventor Toolkit (version 9.9.14, Thermo Fisher Scientific, Waltham, Massachusetts, USA). To start the analysis with OGA, the raw preoperative and postoperative (CB)CT scans (in Digital Imaging and Communications in Medicine (DICOM) format) were imported into the software. From the VSP the following files were necessary: the original and planned 3D models (as Standard Tessellation Language (STL) files) and a transformation matrix (in Extensible Markup Language (XML) format). The transformation matrix described the transformation of the virtual models to the natural head position (NHP) on which the VSP was based. When no transformation matrix was available, the software provided a wizard tool to identify the NHP.
Next, the user was asked to indicate four rotation points, which were used as reference points for the calculation of translations and rotations in subsequent analyses. The first point was the upper incisor point, defined as the most mesial point on the incisal edge of element 11. The second point was pogonion, as described by Swennen et al. [14]. The third and fourth points were the most cranial points of the left and right condylar head. The upper incisor point was utilized as the origin (and thus rotation point) to align the 3D models to NHP.
To compute the six degrees of freedom in VSP, the preoperative STL models were automatically matched to the planned STL models using surface-based matching (SBM). The resulting transformation matrix was calculated to represent the planned rotations and translations around the four previously indicated rotation points.
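As an illustration of how a 4x4 rigid transformation matrix can be expressed as three translations and three rotations around a chosen rotation point, a minimal NumPy sketch is given here. It is not the 3DMedX implementation: the axis assignment (pitch about x, roll about y, yaw about z), the rotation order, the function names and the example coordinates are all assumptions made for the sake of the example.

```python
import numpy as np

def rotation_matrix(pitch_deg, roll_deg, yaw_deg):
    """Compose R = Rz(yaw) @ Ry(roll) @ Rx(pitch). Axis convention is assumed."""
    a, b, c = np.radians([pitch_deg, roll_deg, yaw_deg])
    rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return rz @ ry @ rx

def rigid_about_point(rotation, translation, point):
    """Build a 4x4 matrix that rotates about `point` and then translates."""
    point = np.asarray(point, dtype=float)
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = point - rotation @ point + np.asarray(translation, dtype=float)
    return T

def six_dof(T, point):
    """Return (translation of `point`, [pitch, roll, yaw] in degrees) for matrix T."""
    point = np.asarray(point, dtype=float)
    R = T[:3, :3]
    # Translation is measured as the displacement of the rotation point itself.
    translation = R @ point + T[:3, 3] - point
    # Inverse of the Rz @ Ry @ Rx composition used above.
    pitch = np.arctan2(R[2, 1], R[2, 2])
    roll = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return translation, np.degrees([pitch, roll, yaw])

# Hypothetical upper incisor point (mm) and hypothetical planned movement.
incisor_point = np.array([0.0, 80.0, -40.0])
planned = rigid_about_point(rotation_matrix(2.0, -1.0, 3.0), [1.5, 4.0, -0.5], incisor_point)
translation_mm, angles_deg = six_dof(planned, incisor_point)
print(translation_mm, angles_deg)
```

Measuring the translation at the rotation point makes the reported values independent of where the scanner origin lies, which is why the choice of rotation point matters for the clinical interpretation of the six values.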
In the next steps, voxel-based matching (VBM) was used to register the individual bony segments. In VBM, a region of interest (ROI) is selected in both scans, after which the scans are aligned based on the greyscale values within that region [15]. First, the pre- and postoperative (CB)CT scans were aligned based on an ROI unaffected by surgery, which consisted of the anterior cranial base, zygomatic arches and forehead [16]. For the maxilla, mandible and the left and right ramus, ROI boxes were selected to match the osteotomized bone segments. The transformation matrices, describing the translations and rotations from the preoperative models to the postoperative models, were recorded.
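The idea of aligning two volumes on their greyscale values can be illustrated with a toy example. The sketch below scores candidate alignments with mutual information, one commonly used similarity measure for voxel-based registration. Which measure and optimizer 3DMedX uses is not stated here, so the metric, the exhaustive search over integer shifts and the random test volume are assumptions for illustration only.

```python
import numpy as np

def mutual_information(fixed, moving, bins=32):
    """Mutual information (in nats) of the greyscale values of two equally shaped volumes."""
    joint, _, _ = np.histogram2d(fixed.ravel(), moving.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint probability of greyscale pairs
    px = pxy.sum(axis=1, keepdims=True)    # marginal of the fixed volume
    py = pxy.sum(axis=0, keepdims=True)    # marginal of the moving volume
    nonzero = pxy > 0
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))

# Toy data: a random volume and a copy shifted by three voxels along the first axis.
rng = np.random.default_rng(0)
fixed = rng.random((24, 24, 24))
moving = np.roll(fixed, 3, axis=0)

# Exhaustive search over integer shifts; real registration optimizes six parameters.
scores = {s: mutual_information(fixed, np.roll(moving, s, axis=0)) for s in range(-5, 6)}
best_shift = max(scores, key=scores.get)
print(best_shift)
```

A real registration searches over rotations and translations with interpolation, and restricts the comparison to the selected ROI so that voxels altered by surgery do not influence the score.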
For registration of the chin segment, SBM was implemented instead of VBM (see Fig 2). Surface models representing the chin were generated from the preoperative and postoperative DICOM datasets. The preoperative and postoperative chin segments were roughly aligned manually, after which SBM was performed using the unaltered caudal part of the chin, excluding the area of the osteosynthesis plate. The transformation matrix obtained after SBM of the preoperative model on the postoperative model was recorded.
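The core computation behind a surface-based rigid alignment can be sketched as a least-squares fit between corresponding points (the Kabsch solution). Surface-based matching algorithms such as iterative closest point typically repeat this step while re-estimating the correspondences. In the simplified sketch below the correspondences are assumed to be known, and the point clouds and function name are hypothetical.

```python
import numpy as np

def kabsch(source, target):
    """Least-squares rigid transform (4x4) mapping source points onto target points.

    Rows of `source` and `target` are assumed to correspond one to one.
    """
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    H = (source - src_centroid).T @ (target - tgt_centroid)  # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that the result is a proper rotation.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tgt_centroid - R @ src_centroid
    return T

# Hypothetical surface points of the caudal chin before surgery (mm).
rng = np.random.default_rng(1)
chin_pre = rng.normal(size=(200, 3)) * 10.0

# Hypothetical achieved movement: 5 degrees about the z axis plus a translation.
angle = np.radians(5.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, 4.0, -1.0])
chin_post = chin_pre @ R_true.T + t_true

T_est = kabsch(chin_pre, chin_post)
print(np.round(T_est, 3))
```

Restricting the fit to points on the unaltered part of the surface, as described above for the caudal chin, keeps the osteosynthesis material from biasing the estimated transform.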
The resulting transformation matrices were calculated to represent the surgically achieved rotations and translations around the four previously indicated rotation points for each segment. Finally, the differences between the planned and achieved movement of each segment in six degrees of freedom (translation and rotation in the sagittal, coronal and axial planes) were calculated. For the chin evaluation, the difference between the planned movement of the chin and its postoperative position was calculated and corrected for the postoperative mandibular position. This excluded the potential mandibular error from the accuracy result of the chin. For the rami, only rotations were computed, assuming the condylar heads were not translated. (A video showcasing the workflow of the OrthoGnathicAnalyser 2.0 is available.)
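One possible reading of the chin correction is to express the chin movement relative to the mandible before comparing plan and outcome, so that a mandibular positioning error does not count as a chin error. The sketch below follows that reading with 4x4 matrices. The composition order, the function names and the numbers are assumptions and are not taken from the OGA 2.0 source code.

```python
import numpy as np

def translation_matrix(offset):
    """4x4 rigid transform that only translates by `offset` (mm)."""
    T = np.eye(4)
    T[:3, 3] = offset
    return T

def relative_motion(T_segment, T_parent):
    """Movement of a segment expressed relative to its parent segment."""
    return np.linalg.inv(T_parent) @ T_segment

def deviation(T_planned, T_achieved):
    """Residual transform E such that T_achieved = E @ T_planned."""
    return T_achieved @ np.linalg.inv(T_planned)

# Hypothetical case: the mandible was planned 5 mm forward but ended up at 4 mm.
mandible_planned = translation_matrix([0.0, 5.0, 0.0])
mandible_achieved = translation_matrix([0.0, 4.0, 0.0])
# The chin was planned 4 mm further than the mandible and that part was executed exactly.
chin_planned = translation_matrix([0.0, 9.0, 0.0])
chin_achieved = translation_matrix([0.0, 8.0, 0.0])

uncorrected = deviation(chin_planned, chin_achieved)
corrected = deviation(relative_motion(chin_planned, mandible_planned),
                      relative_motion(chin_achieved, mandible_achieved))
print(uncorrected[:3, 3], corrected[:3, 3])
```

In this toy case the uncorrected comparison attributes the 1 mm mandibular shortfall to the chin, whereas the corrected comparison reports no chin deviation.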

Validation study
Thirty subjects were enrolled in this multicenter retrospective validation study, in three centers: Amsterdam University Medical Center (UMC), location AMC; MKA Kennemer & Meer, location Haarlem; and Radboud University Nijmegen Medical Center. Per center, ten subjects with dentofacial deformities who underwent bimaxillary surgery in combination with a genioplasty between 2016 and 2020 were considered for inclusion. Availability of pre- and postoperative (CB)CT data was required. Exclusion criteria were the use of different imaging modalities (i.e. a preoperative CT scan with a postoperative CBCT scan or vice versa), a history of surgery in the maxillomandibular region and highly complex cases such as multi-piece Le Fort I or cleft lip and palate cases. Prior to data analysis, all subject data were anonymized. The study was approved by the local ethics committee of Amsterdam UMC, location AMC (W20_127). All patients signed an informed consent form at the start of treatment.

Image acquisition. The preoperative scan and postoperative scan were acquired according to the clinical protocol of the individual centers. The dental arches were recorded by making a CBCT scan of dental casts. Image acquisition parameters are described in Table 1.
Surgery planning. All cases were virtually planned in IPS CaseDesigner, version 2.0.4.2 (KLS Martin Group, Tuttlingen, Germany). A 3D virtual hard-tissue and soft-tissue model were rendered and oriented in the NHP of the subject. The maxilla, mandible, chin and rami were repositioned towards their desired position. The required 3D models and transformation matrix were exported.
Clinical validation and evaluation. Two independent observers (FB and JS) analyzed the (CB)CT data sets of all cases in order to validate OGA 2.0. To determine inter-observer variability, both observers performed the OGA workflow for each subject independently. For intra-observer variability, one of the observers repeated the workflow on all cases a second time and in a random order, with an interval of at least two weeks between both assessments.
Statistical analysis. The absolute inter-observer and intra-observer difference was calculated. One-way multivariate analysis of variance (MANOVA) was used to determine statistical differences between centers. For the evaluation of the inter-observer and intra-observer correlation and agreement, the intra-class correlation coefficient (ICC) was calculated with two-way random and two-way mixed models respectively. Statistical data analyses were performed with IBM SPSS software, version 26.0 (IBM Corp., Armonk, NY, USA).
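For readers who want to reproduce this type of agreement analysis outside SPSS, the sketch below computes the mean absolute difference between two observers and two common single-measure ICC forms from the two-way ANOVA mean squares: ICC(2,1) for the two-way random model (absolute agreement) and ICC(3,1) for the two-way mixed model (consistency). Whether single or average measures and which agreement definition were used in this study is not stated above, so these choices are assumptions, and the observer values are made up.

```python
import numpy as np

def icc_two_way(ratings):
    """Return (ICC(2,1), ICC(3,1)) for an (n_subjects, k_observers) array."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand_mean) ** 2)   # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand_mean) ** 2)   # between observers
    ss_error = np.sum((x - grand_mean) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    icc_random = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    icc_mixed = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_random, icc_mixed

# Made-up anteroposterior translations (mm) measured by two observers in five subjects.
observer_a = np.array([4.1, 2.0, 6.3, 3.2, 5.0])
observer_b = np.array([4.0, 2.2, 6.1, 3.3, 5.1])

mean_abs_difference = np.mean(np.abs(observer_a - observer_b))
icc_random, icc_mixed = icc_two_way(np.column_stack([observer_a, observer_b]))
print(mean_abs_difference, icc_random, icc_mixed)
```

The two forms differ only in whether a systematic offset between observers lowers the coefficient: it does for the absolute-agreement form and does not for the consistency form.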

Results
A total of thirty subjects (ten subjects per participating center) were included in this study. During data analysis, four subjects from Amsterdam UMC and one subject from MKA Kennemer & Meer had to be excluded because of motion artefacts (n = 2), corrupt DICOM data (n = 2) or an incorrect field of view (n = 1). This resulted in a study population of 25 subjects. The demographics of the population are presented in Table 2.


Validation of OGA 2.0
The inter-observer and intra-observer intraclass correlation coefficients (ICC) and the mean differences for the maxilla, mandible, chin, and left and right ramus are reported in Tables 3-6, respectively. The mean inter-observer and intra-observer translational and rotational differences of the maxilla and mandible were all below 0.3 mm and 0.5 degrees, respectively. The least observer-dependent measurement was the anteroposterior translation of the mandible, for which an inter-observer and intra-observer ICC of 0.996 and 0.999, respectively, were found (Table 4). The differences between the centers were non-significant, except for the inter-observer difference of the yaw of the maxilla (p = 0.047) and the intra-observer difference of the autorotation of the right ramus (p = 0.046). Table 5 provides the results of the chin analysis. Concerning the translational differences of the chin, the superoinferior direction was slightly more user dependent than the anteroposterior and right-left directions (0.251 mm versus 0.213 mm and 0.223 mm, respectively). The largest difference between users was found for the pitch, with mean differences of 0.654 degrees (inter-observer) and 0.604 degrees (intra-observer).
With regard to the left and right ramus, the autorotation of the left ramus and the flare of the right ramus were most user dependent (with maximal errors of 0.673 degrees and 0.622 degrees). The reported inter- and intra-observer ICCs were all above 0.94.
Table 3. Intra-observer and inter-observer intraclass correlation coefficients (ICC) and mean differences for measurements of the maxilla.

Discussion
The OGA 2.0 presented in the current study is the successor of the OGA presented in an earlier study [11]. Drawbacks of the previous OGA version were the inability to assess the postoperative accuracy of the osseous chin, the dependence on a specific virtual planning software package and the need for SBM for accurate matching of the rami. In the newly presented tool, the postoperative accuracy of the rami is assessed using VBM instead of SBM. In addition, the postoperative accuracy of the osseous chin can be assessed, and the OGA 2.0 is no longer dependent on any particular planning software and can be used as a stand-alone program. The OrthoGnathicAnalyser 2.0 was developed to objectively quantify the movements of the individual segments in orthognathic surgery.

Validation results
The results of this multicenter validation study demonstrated a good reproducibility of the calculated results, with a maximum translational error of 0.26 mm and rotational error of 0.67 degrees, and corresponding high ICCs (>0.92). The current results of the maxilla were comparable to the results described in literature, with an inter-observer and intra-observer ICC of >0.97 and >0.98, for translation and rotation respectively [11,17]. Stokbro and Thygesen used VBM for measuring the movements of the maxilla and found high ICC values similar to those of the current study [18]. The translational and rotational results of the mandible showed excellent reproducibility (ICC>0.99 and ICC>0.95 respectively) and were also comparable to previous results [11]. Between the centers, only the inter-observer difference of the yaw of the maxilla (p = 0.047) and the intra-observer difference of the autorotation of the right ramus (p = 0.046) were statistically significant. These differences were, however, below 0.7 degrees and were therefore considered clinically insignificant. The OGA 2.0 is a robust tool, as only minimal differences between the centers, and thus between scanners from different manufacturers, were found.
For the matching of the chin, preliminary tests were executed to evaluate which registration technique would perform best. During these tests it was observed that voxel-based matching resulted in less accurate alignment in the sagittal plane due to a deviation in the pitch. It was hypothesized that the result of the voxel-based matching algorithm was affected by the combination of the relatively small volume of the chin and the high-density fixation material. For this reason, SBM was implemented instead of VBM. This has resulted in a reproducible evaluation of the deviations of the chin segment, with low intra-observer and inter-observer differences (up to approximately 0.25 mm and 0.7 degrees). Although these results for the analysis of the osseous chin are clinically acceptable, it is worth noting that the inter-observer difference for the chin is systematically higher than that for the maxilla, mandible and rami. A possible underlying reason for this higher inter-observer difference is the use of SBM, which required more user input.

Advantages of the current method
In our previous study [11], the matching of the left and right ramus was performed with SBM to counteract the image artifacts resulting from the sagittal split osteotomy. This technique resulted in observer differences of more than one degree. Because of this difference and the user dependency described in the literature [6], the matching technique for the rami was changed to voxel-based matching in OGA 2.0, as an updated version of the voxel-based algorithm was available. Without correcting the aforementioned image artifacts, the reported maximum error was almost halved to 0.6 degrees. Using VBM instead of SBM is more reproducible, which is in line with the findings of Almukhtar et al. [6]. It is also more time-efficient because user input is minimized: the user only selects an ROI instead of manually coloring the surface on which the registration should be performed.
In the previously validated OGA, three landmarks for each jaw segment were required to construct a virtual triangle to allow for the calculation of the clinically relevant translational and rotational movements. Multiple landmark identification has been eliminated by voxel-based registration of the jaw segments. In the new version of OGA, a total of four landmarks need to be identified instead of the twelve (three for each segment) in the previous OGA version. Identification of only these four landmarks still enables the required calculations. As a consequence, the workflow becomes more efficient and the inaccuracies resulting from multiple landmark identification are further reduced [19].
The analysis of the chin segment is an important addition in the OGA 2.0. With an easy and reproducible chin segment analysis, studies towards the accuracy or relapse of genioplasty will be more accessible. Furthermore, the added value of using sawing and drilling guides in genioplasty can be objectified.

Study limitations
The error caused by manual identification of landmarks has been reported to range from 0.02 to 2.47 mm [19-21]. Ideally, the manual identification step would be completely eliminated from the software. A promising development is automatic 3D landmarking using artificial intelligence. Some recent studies have reported errors below 2 mm [22,23], making automatic 3D landmarking a potential alternative. However, as the landmarks are not used for matching but only function as rotation points, the identification of the landmarks has become of little concern. The high ICCs and low intra- and inter-observer variations support this statement.
The results indicated that the pitch of the chin was still relatively more user dependent than the other variables. It should be explored whether the voxel-based matching method could be adapted to facilitate selection of greyscale values (i.e. selection of the upper threshold) or reorientation of the ROI box to enable exclusion of the high-density fixation material.
For the assessment of the accuracy of the mandible, it is important that the postoperative (CB)CT scan is acquired with a correct postoperative occlusion and relaxed mandibular muscles. For this retrospective study, some scans were acquired in a suboptimal occlusion, which led to an overestimation of the discrepancy between the planned and postoperative outcome. Since the main goal of this study was to validate the novel software, it was chosen not to analyze the surgical outcomes and to focus on the validation of the software. For any clinical study, it is imperative to provide proper instruction to the patient before the postoperative scan in order to be able to accurately assess the surgical outcome of the mandible.

Conclusion
In conclusion, the reported results of this study demonstrated an excellent reproducibility (ICC of >0.92) of the quantification of the skeletal movements between two (CB)CT sets by the OrthoGnathicAnalyser 2.0. By implementing the chin analysis in this software tool, all surgical bony segments can now be objectively evaluated and compared to the preoperative virtual plan. The OrthoGnathicAnalyser 2.0 allows an increased number of evaluations of orthognathic procedures.