Performance evaluation of survival regression models in analysing Swedish dental implant complication data with frailty

The use of inappropriate methods for estimating the effects of covariates in survival data with frailty leads to erroneous conclusions in medical research. This study evaluated the performance of 13 survival regression models in assessing the factors associated with the timing of complications in implant-supported dental restorations in a Swedish cohort. Data were obtained from randomly selected cohort (n = 596) of Swedish patients provided with dental restorations supported in 2003. Patients were evaluated over 9 years of implant loss, peri-implantitis or technical complications. Best Model was identified using goodness, AIC and BIC. The loglikelihood, the AIC and BIC were consistently lower in flexible parametric model with frailty (df = 2) than other models. Adjusted hazard of implant complications was 45% (adjusted Hazard Ratio (aHR) = 1.449; 95% Confidence Interval (CI): 1.153–1.821, p = 0.001) higher among patients with periodontitis. While controlling for other variables, the hazard of implant complications was about 5 times (aHR = 4.641; 95% CI: 2.911–7.401, p<0.001) and 2 times (aHR = 2.338; 95% CI: 1.553–3.519, p<0.001) higher among patients with full- and partial-jaw restorations than those with single crowns. Flexible parametric survival model with frailty are the most suitable for modelling implant complications among the studied patients.


Introduction
Survival regression methods are commonly used to explore heterogeneity among subjects in medical research [1] and to estimate prognostic factors for survival [2][3][4][5][6]. However, one of the major challenges in survival analysis modelling is clustering among followed subjects, otherwise known as frailty [7,8]. The concept of frailty is an issue of discourse in statistical modelling, including survival analysis. Frailty is a group-specific latent random effect that multiplies into the hazard function. The frailties are unobservable positive quantities. They follow a gamma distribution with a mean of 1 and variance to be estimated from the data. Theoretically, any distribution with positive support (mean = 1) and finite variance may be used to model frailty. In most cases, shared-frailty models are used to model the within-group correlation. Observations within a particular group are often referred to as correlated because they share the same frailty. Let us consider the case of patients attending a dental clinic. Any of the patients may have issues with any of their teeth after a particular intervention. The teeth of a single patient form a cluster. While it is reasonable to assume independence of the dental patients, it is incorrect to assume that the occurrence of infection to any of the teeth within each patient is independent. It is necessary therefore to accommodate the potential "dependency" by assuming that it was the result of a latent patient-level effect or frailty. Non-consideration of clustering in clustered data causes poor model fit and biased estimates. This suggests that a mixed-effects model that contains both the random and fixed effects would be most appropriate to model such outcomes. The alternative, traditional survival regression models, divided into parametric (Poisson, Weibull), semiparametric (Cox), and nonparametric (Kaplan-Meier) have distinct disadvantages that could make them unsuitable to correctly predict survival outcomes [1,9]. For instance, the Kaplan-Meier model does not accommodate covariates, hence its utilization is limited [1,5]. Although the Cox proportional hazard (CPH) model is the most commonly used model in survival analyses [1,10,11] and has been used extensively in the literature [4,[11][12][13], its efficiency is limited for short observation periods [1]. Further, its distribution-free assumption is often violated in long-term studies. In either case, many of the subjects may not have experienced the event of interest and, thus, survival and cumulative hazard functions are incomplete and cannot be extrapolated in the CPH [1]. The CPH models assume a constant hazard, an assumption that is also frequently violated [14,15]. The Cox model has an advantage in that it does not assume the form of the baseline hazard function, therefore, not hindering the prediction of hazards and other related functions for a given set of covariates but this advantage gave birth to its major disadvantage [14]. Moreover, survival and cumulative hazard functions of the CPH model are step functions and, thus, limit the possibility of having smooth functions [10,16].
Parametric models, such as the exponential and Weibull models [1], attempted to overcome some of the shortcomings of the CPH model by producing smooth predictions by assuming a functional form of the hazard [1,17] and directly estimating the absolute and relative effects [14]. The models can be used to estimate the smooth cumulative hazard functions and hazard ratios of risk factors and extrapolate survival and cumulative functions [1]. Nonetheless, the models assume that the survival and hazard functions have a specific distribution which is often too structured and sometimes unrealistic for use with real data [9,10]. In addition, parametric models with complex underlying hazard fail to capture true effects [18,19]. Thus, in most cases, parametric models have insufficient flexibility and, thereby, produce biased cumulative hazard and survival functions [10].
While the disadvantages of non-parametric models can be overcome by the use of stratification, the number of factors used for such stratification may be limited [1]. Another way of alleviating the challenges of the CPH is to use a sufficiently large sample size and extensive study duration [20]. Also, parametric survival models may be useful if available data do not violate the underlying assumptions of the distributions. Despite these mitigations, none of the nonparametric, semi-parametric or parametric models is flexible enough to accommodate structural composition of all real-life data.
Royston and Parmar (RP) therefore developed flexible parametric survival regression (FPSR) models as a result of lack of adequate flexibility of the Cox and parametric survival models [9,10,21]. The FPSR model offers a compromise between the CPH and parametric models and retains the desired features of both types of models. The flexible parametric approach works by relaxing the assumption of linearity of log time by using restricted cubic splines [10,22]. The overall advantages of the FPSR models have been reported in the literature [1,9,14,18,19,23,24]. CPH, parametric and the FPSR models have incorporated frailty options.
Different factors could be associated with complications affecting dental restorations supported by implants but it is not known how non-consideration of the clustering nature of the implants (multiple implants in one subject) affect outcomes of the modelling approach. The need for appropriate statistical models for accurate medical inferences and decisions motivated this study. It is designed to evaluate and compare the application of CPH, parametric and FPSR models to complications affecting dental implants. Implants are clustered within patients and intra-cluster dependency may occur. We hypothesized that models with frailty, and the flexible parametric model with frailty, in particular, would perform better than all other range of models. We aimed to apply different survival analysis regression models to a dataset originating from Sweden [25,26] and assess the performance of the models to identify the model with the best data fit. We considered the (i) Cox proportional hazard models for frailty, (ii) Multilevel mixed-effects parametric survival models for proportional hazard and accelerated failure times and (iii) Flexible parametric survival regression models with frailty generally referred to as the Royston-Parmar (RP) models and their equivalents without frailty.

Cox proportional hazard models with frailty
The Cox proportional hazard (PH) model with frailty is an extension of the Cox PH model developed in 1972 which assumed that hazards are multiplicatively proportional to baseline hazards [5] as shown in Eq (1).
The above equation provides estimates of β 1 , β 2 ,. . .,β k , and its variance-covariance matrix but provides no direct estimate of the baseline hazard (h 0 (t)). However, the model provides an avenue to estimate the baseline cumulative hazard (H 0 (t)) and baseline survival (S 0 (t)) which can be used to estimate the h 0 (t) [10].
where group-level frailty is estimated by α i . The frailties are unobservable positive quantities and are assumed to have a mean of 1 and a variance θ. Shared-frailty models are used to model within-group correlation; observations within a group are correlated because they share the same frailty. The degree of within-group correlation can be measured by an estimate of "θ", where θ is 0 in cases where there is no frailty. By letting v i = log α i , the hazard is as shown in Eq (3) which makes the log frailties v i , to be analogous to random effects obtainable in the corresponding standard linear models. Numerically, let x i be the row vector of covariates for the time interval (t 0i ; t i ) for the i th observation in a dataset with N subjects (i = 1,. . .. . .. . ..,N). The estimates of the coefficient (β i ) of the covariates (X i ) can be estimated by maximizing the partial log-likelihood function in Eq (4) Where j represents the index of the ordered failure times t (j) , j = 1;. . .. . .. . .; D; such that D j is the set of d j observations that fail at t (j) ; d j is the number of failures at t (j) ; and R j is the set of observations k that are at risk at time t (j) , that is, for all k such that t (0k) <t (j) �t (k) [27][28][29].
The data for Cox shared-frailty models are usually organized into G groups with the i th group consisting of n 1 observations, i = 1,. . .. . .. . .,G. The estimation of θ is done via maximum profile log-likelihood. For fixed θ, estimates of β and v 1 ,. . .. . ..,v G are obtained by maximizing the parameters in Eq (5).
where D i is the number of death events in group i and logL Cox is the standard Cox partial log-likelihood, with the v i as the vector of the variables' coefficients indicator which identifies the groups. The j th observation in the i th group has log relative hazard βx ij +ν i .
The values that maximize logLðŷÞ are the final estimates of β in ν i [30].

Mixed-effects parametric survival models
The mixed-effects parametric survival models otherwise called the multilevel parametric survival models are well known [31]. These models contain both fixed and random effects. The accelerated failure-time (AFT) model and the multiplicative or proportional hazards (PH) model are the most-used models for adjusting survivor functions for the effects of covariates. In the AFT model, log t is expressed as a linear function of the covariates, when random-effects is incorporated, the function yields the function in Eq 6.
for j = 1,. . .. . .. . .. . .,M, clusters, with cluster j consisting of i = 1,. . .. . .. . .. . .,n j observations. The 1 X p row vector X ji contains the covariates for the fixed effects, with regression coefficients (fixed effects) β. The z ji has 1 x q dimension and contains the covariates corresponding to the random effects. Also, v ji are the observation-level errors with density φ(.). In the PH models, the model contains the covariates which have a multiplicative effect on the hazard function in Eq (7).
where h 0 (t), the baseline hazard function, is assumed to be parametric. Both the exponential and Weibull models can be implemented using the AFT and PH parameterizations, but the gamma and log-logistics and log-normal can only be implemented in AFT and implemented with "mestreg" in Stata.

Flexible parametric survival regression models
The FPSR model is based on a series of models that are modifications of several standard survival models [21] but has additional flexibility [21,23]. These models use restricted cubic splines to model a transformation of the survival function. The Weibull model is one of the most common parametric models and is an approximation of the PH model. It has been criticized for inflexibility in the shape of the baseline hazard function, which either increases or decreases monotonically. Weibull survival function is S(t) = exp(−λtγ), and the corresponding log cumulative hazard scale is ln {H(t)} = ln[−ln{S(t)}] = ln[−ln{exp (−λtγ)}] = ln(λ)+γ ln(t), after transformation. Addition of covariates to the model produces ln {H(t|x i )} = ln(λ)+γ ln(t) +x i β.
Splines are flexible mathematical functions defined by piecewise polynomials with some constraints to ensure that the overall curve is smooth, are used. The polynomials join one another at points called knots. The fitted function is forced to have continuous 0 th , 1 st and 2 nd derivatives. The most common splines used are cubic splines. Restricted cubic splines with k knots can be fitted by creating k-1 derived variables.
The choice of position of knots determines the complexity of the flexible models. Usually, k knots, maximum at 9 knots, has k+1 degrees of freedom (df). The position of the knots (internal) is usually in centiles computed as 100/df. So, for 3 knots, the df is 4 and the knots will be located at centiles 25, 50 and 75. The internal knots are bounded by "boundary knots" which are placed at the minimum and maximum of the distribution of uncensored survival times. The FPSR models become the Weibull model if the number of knots is 0, while γ 0 and γ 1 are equal to the scale parameter and shape parameter respectively. Royston et al. suggest using 1 or 2 knots for smaller (<10,000) datasets and 4 or 5 for larger (> = 10,000) datasets [10]. The FSPR models are implemented in Stata using "stpm2" with an option for frailty.
In the flexible parametric model, the contribution to the log-likelihood for the i th individual on the log cumulative hazard scale can be written as shown in Eq 9.
where d i is the event indicator. The likelihood can be maximized by defining an additional equation for the derivatives of the spline function and constrain the parameters to be equal to the equivalent spline functions in the main linear prediction [22,32].

Model selection criteria
Log-likelihood, Akaike information criteria (AIC) [33] and the Bayesian information criteria (BIC) [34] were used for model selection. Lower values of AIC and BIC indicated a better model fit. AIC and BIC are usually computed and compared separately among different models to determine the best fitting model. However, confusion may arise if the best fitting model according to the AIC is different from that identified by the BIC [14]. Literature suggests that AIC will choose a more complex model irrespective of sample size while BIC is more likely to choose a simpler model [14]. AIC is often preferable in situations when a false negative finding would be considered to be more misleading than a false positive, and BIC is superior in situations where a false positive is as misleading as, or more misleading than, a false negative. AIC is best for prediction as it is asymptotically equivalent to cross-validation. BIC is best for an explanation as it allows consistent estimation of the underlying data generating process [14,35].

The data
The dataset used for this study originates from a project evaluating the effectiveness of dental implant therapy in Sweden. A cohort of 596 randomly selected adults, provided with implantsupported dental restorations in 2003, were followed over 9 years. The extent of dental treatment varied from the replacement of single teeth to the restoration of full jaws. The average number of implants per patient was 4.0 ±2.8 (range [1][2][3][4][5][6][7][8][9][10][11][12]. Complications related to the restorations/implants were scored using the patient as the unit of analysis and timing was recorded in days relative to the time point of implant insertion. The complications included: Loss of a dental implant, development of peri-implantitis and/or occurrence of a technical complication. For details regarding case definitions of the different categories of complications, the reader is referred to Derks et al. [25,26,36]. The occurrence of any of the complications referred to above was considered as an event in the present analyses. There were a total of 1,038 events during the observation period with 469 complications in single-record/single-failure data.

Operational definitions
Median survival time: This is a statistic that refers to how long patients "survive" in general after dental restorative therapy including the use of implants. The incidence rate is a measure of the frequency with which dental implant complications occurred per day.

Incidence Rate ¼
Number of new cases of disease during specified period Time each person was observed; totalled for all persons

Ethics approval and patient consent
The research protocol was approved by the regional Ethical Committee, Gothenburg, Sweden (Dnr 290-10), registered at ClinicalTrials.gov (NCT01825772) and study participants signed an informed consent form prior to inclusion.

Results
One of the 596 subjects was excluded from analysis due to missing data. The mean age (in 2003) of the 595 included participants was 62.3 (SD = 9.3) years, with 42% aged 60-69 years, 24% aged 70-79 years and 55% were females. Roughly 60% of patients presented without signs of periodontitis at the 9-year examination, 24% had periodontitis and 16% were edentulous (no teeth). A total of 28% had full-jaw restorations, 48% had partial-jaw restorations and 24% had single crowns, only. Regarding dental products, 31% received Straumann implants (Type A), 20% had Astra Tech implants (Type B), 40% had Nobel Biocare implants (Type C). The remaining 9% of subjects were treated with various other types of implants categorized as Type D.
The overall incidence rate of implant complications was 0.000241 per day. It was higher among those without remaining natural teeth (0.000387) and those who had full-jaw restorations (0.000439). The median survival time (when 50% of implant-carrying subjects would have "failed") to implant complications was 2476 days, while the 25% survivorship was 820 days. The medium "survival" time was highest among those who had partial-jaw restorations (3347 days), females (3044 days), treated within the public dental service (3044 days), treated with Type A (3347 days) or other dental products (Type D) (3227 days respectively) as shown in Table 1.
The distribution of the incidence rate of implant complications by time is shown in Fig 1. The incidence rate reduced with increasing time of follow-up.

The hazard and survival functions of dental complications under different distributions
We assessed the hazard function under different models for periodontal status (one of the main prognostic factors) in Fig 2. In Fig 2, the chart in panel (a) is from the Weibull mixed effects parametric proportional hazard (b) Loglogistic mixed effects parametric proportional hazard (c) Cox proportional hazard and (d) Cox smoothed proportional hazard. The hazard for "periodontitis" was consistently higher than the hazard for "healthy" and "no teeth". The hazard for "healthy" and "no teeth" was similar.
In Fig 3, we present the survival function under different models for periodontal status. The chart in panel (a) is from the Weibull mixed effects parametric proportional hazard (b) Loglogistic mixed effects parametric proportional hazard (c) Cox proportional hazard and (d) Cox smoothed proportional hazard for periodontal status. The survival for "periodontitis" was consistently lower than the survival for "healthy" and "no teeth". The survival for "healthy" and "no teeth" was similar.

Test of assumption of proportionality
The test of violations of assumptions of the proportional hazard showed that the test was not violated (X 2 = 5.50, df = 4, p = 0.240)

Comparison of the survival and hazard functions of the flexible model at different degrees of freedom
We compared the performance of the survival and hazard functions of the flexible model at various degrees of freedom (1, 2, 3 and 6) for the periodontal status of the patients. A degree of freedom of 1 is an equivalent of the Weibull distribution. The hazard functions of the Weibull distribution were different from the hazard functions at the other degrees of freedom. However, the function at 6 degrees of freedom was different and more flexible than at 2 and 3 degrees of freedom (Fig 4A). The survival functions at 2, 3 and 6 degrees of freedom were, however, similar but distinct at 1 degree of freedom (Fig 4B).

Selection of the best model
Loglikelihood, AIC BIC for all the models considered, with and without frailty, are presented in Table 2. All three parameters were consistently lower among the flexible frailty models at different degrees of freedom than the Cox PH frailty, parametric frailty models ( Table 2). We observed that the AIC and BIC of the parametric models without frailty were consistently lower than those with frailty. Among the FPSR models at different degrees of freedom, the lowest loglikelihood was at df = 6, the lowest AIC was at df = 4, while the lowest BIC was at df = 2. However, for df >1, differences between the lowest and highest loglikelihood, between the lowest and highest AIC and between the lowest and highest BIC were 4.4 (0.45%), 1.47 (0.04%) and 18.7 (0.89%) respectively. According to the AICs, all the FPSR models at df>2 were similar. Hence we chose the simplest of all the FPSR models at df = 2. Our decision was further supported by the significance of the spline variables for the log baseline cumulative hazard (_rcsi) otherwise called the slope of the hazard curve within each of the knots generated by the degrees of freedom. At df = 2, the two slopes were statistically significant: _rcs1 = 2.26 (p<0.001) and _rcs2 = 1.150 (p<0.001). At df = 3, the first two slopes were statistically significant: _rcs1 = 2.25 (p<0.001) and _rcs2 = 1.134 (p<0.001) but the last slope was not significant (_rcs3 = 1.02 (p = 0.107). The slopes at df>2 had similar patterns with the slopes at df = 3.

Modelling the risk factors of implant complications
We fitted an FPSR model at 2 degrees of freedom to the data and identified the adjusted determinants of implant complications among the patients.

Discussion
This study was designed to apply and compare the performance of semi-parametric, parametric and flexible parametric survival regression models to a dataset on dental implant-related complications with or without frailty. This analytical study showed that models with frailty performed better than those without frailty except among the parametric models where the reverse was the case. This could be ascribed to inconsistencies and inflexibilities of parametric models (Royston and Lambert, 2011). Nonetheless, the AIC and the BIC of the flexible models were lower than those computed from the other models irrespective of whether the clustering nature of the implant data was considered or not. Therefore, the flexible parametric survival regression model was the best of the three main models considered in this study. Our finding is consistent with findings in earlier studies [10,15,23]. All measures of model fit and model selection adopted in our study were consistently better in the flexible parametric survival regression models than in the other models. Loglikelihood, AIC and BIC were lower in the flexible parametric survival regression models than the Cox PH model and the parametric models. Similar findings have been reported in the literature [15,23,37]. The flexible models had a unique advantage by separating the hazard function into segments (splines) based on the specified degrees of freedom and computing the hazard within each spline [9,10,18,21].
We used AIC and BIC to select the ultimate degrees of freedom to use for the flexible parametric survival regression model. AIC and BIC are measures of the amount of information lost in the models [33,34]. The lower these values, the better the models. In this study, BIC   was lowest at 2 degrees of freedom while AIC was lowest at 4 degrees of freedom. This discrepancy has previously been reported [35] and may be due to how the two information measures compute "complexities". The problem of defining "N" (the number of observations) is not related to AIC because N is not used in computing AIC, which rather uses a constant 2 to weight complexity as measured by k (number of parameters estimated), rather than ln(N) in BIC. According to Stone at al. [38], AIC approximately minimizes the prediction error and is asymptotically equivalent to leave-1-out cross-validation (LOOCV) while BIC is equivalent to leave-k-out cross-validation (LKOCV) [39] and it is not consistent with the amount of data available. However, BIC has the advantage of being consistent. With a very large amount of data, and if the true model is among the candidate models, the probability of selecting the true model based on the BIC criterion would approach 1. This, however, may slightly affect prediction performance. We chose the flexible parametric survival regression model at 2 degrees of freedom as suggested by the BIC because the slopes of the curves within each spline were insignificant after 2 degrees of freedom, the differences in parameter estimates at 2, 3 and 4 degrees of freedom were negligible. Also, the AIC at 2, 3 and 4 degrees of freedom changed by 0.07%, which was considered negligible.
Our finding that the FPSR method with frailty fitted the data used in this study is further corroborated by the behaviour of the hazard and survival functions shown in Figs 2-4. However, there could be challenges of over-parametrization in the flexible model due to its adaptability and incorporation of up to ten knots. Also, the Cox model makes minimal assumptions about the form of the baseline hazard function and may have hindered the prediction of hazards and other related functions for a given set of covariates. It also results in unsmooth estimated curves and lack of information about what occurs between the observed failure times. Parametric models, on the other hand, produce smooth predictions by assuming a functional form of the hazard. Its assumed form is too structured for use with real data (Royston and Lambert, 2011). Therefore, the non-proportional hazards can be modelled using restricted cubic splines in FPSR models [14] and thereby produce a better fit. The fitted flexible parametric survival regression model at 2 degrees of freedom showed that the hazard of implant complications was higher among male patients, patients with periodontitis, among patients with either full-or partial-jaw restorations and among patients that were provided with dental product Type B. It is plausible that more extensive restorations are at higher risk for complications through the simple fact that more implants and surfaces are exposed to potential events. More extensive restorations, however, may also serve as a surrogate parameter for the individual's susceptibility to developing tooth-or implant-related problems. This may be illustrated by the fact that subjects presenting with periodontitis at remaining teeth were at higher risk for implant-related complications. This relationship is most likely explained by the strong association between periodontitis and peri-implantitis  [40]. Peri-implantitis was one of the complications recorded in the present study. The background to the other factors identified in the model (sex and dental product) are not understood. It may be speculated that biting force and/or behaviour in terms of oral health may have had an impact. The covariates included in the flexible model have shown that there is a wide range of factors that contribute to complications affecting dental implants. Their inclusion has influenced the performance of the models as they demonstrated reality. For instance, the risk of implant complications was generally higher among patients with periodontitis than those that were periodontally healthy. No difference, however, was noted between periodontally healthy and edentulous patients. Similar assertions have been made in earlier studies [41,42].

Conclusion
Flexible parametric survival model represents the best approach for estimating the hazard of clustered implant complications including (i) implant loss, (ii) peri-implantitis and (iii) technical complications. The study underscores the need to explore the multilevel (clustering) nature of datasets to be analysed. Non-consideration of the clustering nature of data is potentially misleading. The hazard of complications was higher among male patients, patients with periodontitis, patients with more extensive restorations and was dental product specific.