Direct Contra Naïve-Indirect Comparison of Clinical Failure Rates between High-Viscosity GIC and Conventional Amalgam Restorations: An Empirical Study

Background: Naïve-indirect comparisons are comparisons between competing clinical interventions based on evidence from separate (uncontrolled) trials. Direct comparisons are comparisons within randomised controlled trials (RCTs). The objective of this empirical study is to test the null-hypothesis that trends and performance differences inferred from naïve-indirect comparisons and from direct comparisons (RCTs) regarding the failure rates of amalgam and direct high-viscosity glass-ionomer cement (HVGIC) restorations in permanent posterior teeth have similar direction and magnitude.
Methods: A total of 896 citations were identified through a systematic literature search. From these, ten uncontrolled clinical longitudinal studies for HVGIC and two for amalgam were included for naïve-indirect comparison and could be matched with three out of twenty RCTs. Summary effect sizes were computed as odds ratios (OR; 95% confidence intervals) and compared with those from the RCTs. Trend directions were inferred from the overlap of 95% confidence intervals and the direction of the point estimates; magnitudes of performance differences were inferred from the median point estimates (OR) with their 25th and 75th percentiles, for both types of comparison. The Mann-Whitney U test was applied to test for statistically significant differences between the point estimates of both comparison types.
Results: Trends and performance differences inferred from naïve-indirect comparison based on evidence from uncontrolled clinical longitudinal studies and from direct comparisons based on RCT evidence are not the same. The distributions of the point estimates differed significantly between the two comparison types (Mann-Whitney U = 25, n(indirect) = 26, n(direct) = 8; p = 0.0013, two-tailed).
Conclusion: The null-hypothesis was rejected. Trends and performance differences in the failure rates of HVGIC and amalgam restorations in permanent posterior teeth inferred from the two types of comparison are not the same.
It is recommended that clinical practice guidance regarding HVGICs should rest on direct comparisons via RCTs and not on naïve-indirect comparisons based on uncontrolled longitudinal studies in order to avoid inflation of effect estimates.


Introduction
The term 'high-viscosity' or 'high-viscous glass-ionomer cement' (HVGIC) has emerged in the scientific dental literature. A simple PubMed/Medline search conducted on 25 September 2012 with the search string "high-viscosity glass ionomer cement" OR "high-viscous glass ionomer cement" returned 16 citations of articles published between 2003 and 2011. Five of these articles used the term in their titles and all of them in their abstracts, relating it specifically to the products Fuji IX (GC Corporation, Japan) or Ketac Molar (3M ESPE, Germany).
HVGICs appear to differ from other (low-viscosity) GICs, including cermets, in their survival rates relative to those of conventional amalgam restorations. A meta-analysis found a survival rate for HVGIC (Fuji IX; Ketac Molar) similar to that of amalgam, but significantly lower survival rates than amalgam for 'low-viscosity' GICs (Chelon Silver (a cermet); Chem Fil; Fuji II) [1].
Glass-ionomer cements, such as HVGICs, adhere to the mineral content of the tooth structure primarily via bonds with calcium [2]. This adhesion provides an adaptive seal and, as the material slowly releases fluoride ions into the adjacent tooth tissue, these materials are capable of halting or slowing the progression of carious lesions [3]. Glass-ionomer cements are ideally suited to managing dental caries, as they can be applied both at very early stages of caries development and in larger cavities. Additionally, they simplify the restorative procedure and enable the dentine-pulp complex to react against the caries process [4].
Amalgam has been used successfully as a universal posterior restorative material for over a century [5]. Its operative advantages of being relatively simple to place, its intrinsic strength and the longevity of the final restoration have led to amalgam being considered the 'gold standard' against which all newer materials, such as HVGICs, are measured for outcomes such as the effectiveness and durability of the restoration.
In line with this clinically based low-/high-viscosity distinction of conventional (chemically cured) glass-ionomer cements (GICs), defining HVGICs by laboratory/material characteristics such as powder/liquid ratio or compressive strength may prove difficult. For example, the powder/liquid ratios reported for Ketac Molar and Fuji IX are 2.9/1 [6] and 3.6/1 [6,7], respectively, and are not generally higher than those reported for Chelon Silver, Chem Fil and Fuji II (3.8/1 [8], 3.7/1 [9] and 2.7/1 [7], respectively). Similarly, while the compressive strength measured after 24 hours may be above 200 MPa for HVGIC [10] and below 200 MPa for low-viscosity GIC [8,9], a further laboratory study reported a 24-hour compressive strength of 147.93 MPa (SD = 18.02) for Fuji IX [11]. Such conflicting and inconclusive in-vitro evidence may be attributed to the heterogeneous methodologies employed in different laboratory studies and thus has to be regarded with caution. In addition, caution in extrapolating in-vitro results to clinical practice is warranted, as in-vitro/laboratory evidence appears to correlate poorly with the clinical merits of dental materials [12,13].
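The overlap argument can be checked with a few lines of arithmetic. The sketch below (Python, chosen for illustration only) simply encodes the powder/liquid ratios and the 24-hour compressive strengths cited above; treating 200 MPa as a nominal cut-off is an assumption drawn from the cited reports, not a formal definition of HVGIC.

```python
# Powder/liquid ratios (powder : 1 part liquid) as cited in the text [6-9].
high_viscosity = {"Ketac Molar": 2.9, "Fuji IX": 3.6}
low_viscosity = {"Chelon Silver": 3.8, "Chem Fil": 3.7, "Fuji II": 2.7}


def ranges_overlap(a, b):
    """Return True if the closed value ranges [min, max] of two groups intersect."""
    return max(min(a), min(b)) <= min(max(a), max(b))


hv = list(high_viscosity.values())
lv = list(low_viscosity.values())
overlap = ranges_overlap(hv, lv)
print(f"HVGIC range {min(hv)}-{max(hv)}, low-viscosity range {min(lv)}-{max(lv)}, overlap: {overlap}")

# Compressive strength after 24 h (MPa): nominal cut-off (assumption) versus
# the Fuji IX value reported in [11].
CUTOFF_MPA = 200.0
fuji_ix_reported = 147.93
print(f"Fuji IX below the nominal {CUTOFF_MPA} MPa cut-off: {fuji_ix_reported < CUTOFF_MPA}")
```

Because the low-viscosity range fully brackets the HVGIC range, the powder/liquid ratio alone cannot separate the two groups.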
Against this background, the distinction between low- and high-viscosity conventional GICs on a clinical rather than a chemical basis appears to provide empirical support for recommending HVGIC as an appropriate restorative treatment option in permanent posterior teeth [1]. However, this view may currently not be shared by many dental associations in developed countries and may even contravene standing recommendations. In Germany, for example, the joint statement issued in 2005 by two dental associations, the Deutsche Gesellschaft für Zahnerhaltung (DGZ) and the Deutsche Gesellschaft für Zahn-, Mund- und Kieferheilkunde (DGZMK), states that HVGICs are not suitable for permanent posterior tooth restorations because of their high risk of fracture and wear [14].
A detailed analysis of the DGZ/DGZMK statement (File S1) reveals that its recommendations regarding HVGIC are based on the findings of a single comprehensive, non-systematic literature review by Manhart et al., 2004 [15]. Although this review acknowledges the difficulty of comparing clinical material characteristics across uncontrolled clinical longitudinal studies, its authors maintain that certain trends and performance differences, for example between amalgam and glass-ionomer cement restorations, may be inferred from these types of studies [15]. Consequently, the review bases its content, conclusions and recommendations on restoration survival and failure rates extracted mainly from cross-sectional and uncontrolled clinical longitudinal studies, and lists these results separately in tables for amalgam, direct composite, compomer and GIC restorations and for gold, composite and ceramic inlay/onlay restorations in posterior teeth, for naïve-indirect comparison [15]. (According to commonly accepted terminology, 'naïve-indirect comparison' is defined as 'comparison of competing clinical interventions from data of individual arms of different studies, based on the assumption that the treatment groups are clinically homogeneous in composition'. In contrast, direct comparisons are comparisons between randomised intervention groups within RCT settings [16].)
Against this background, the aim of this empirical study is to investigate whether trends and performance differences between conventional amalgam and direct HVGIC restorations in posterior teeth can be inferred through naïve-indirect comparison of failure rates from uncontrolled longitudinal clinical studies. The null-hypothesis tested is that trends and performance differences inferred from naïve-indirect comparison based on evidence from uncontrolled longitudinal clinical studies and from direct comparisons based on RCT evidence have similar direction and magnitude.

Search of uncontrolled clinical longitudinal studies
Titles and abstracts of the citations resulting from the PubMed/Medline search were scanned for possible inclusion in line with the following inclusion criteria:
(i) Prospective clinical one-arm study (uncontrolled longitudinal study investigating either direct HVGIC or conventional amalgam restorations) or quasi-one-arm study (two-arm study that did not compare HVGIC with amalgam restorations, but included either HVGIC or amalgam as one of the study arms);
(ii) Minimum follow-up period of 12 months;
(iii) Investigated cavity type: Class I or II in permanent posterior teeth (tunnel restorations not included);
(iv) Publication language: English;
(v) Study outcome: restoration failure.
Articles whose titles and abstracts were in line with the inclusion criteria were retrieved in full and reviewed by both authors of this article. Disagreements were resolved through discussion and consensus. Articles were excluded if no computable data were reported or if they did not match the characteristics of the control data. The search of PubMed/Medline generated 214 citations for HVGIC and 682 for amalgam restorations. Of these, 12 and 5 citations, respectively, fulfilled the inclusion criteria and were further reviewed. One HVGIC article [17] could not be traced in full, as the journal appeared to have been suspended, and was thus excluded. In total, 11 articles related to HVGIC [18][19][20][21][22][23][24][25][26][27][28] and 5 related to amalgam [29][30][31][32][33] were provisionally accepted (Table 1).
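As a consistency check, the study-flow counts reported above, together with the exclusions made at the matching step described in the next subsection, can be tallied as follows. This is a minimal bookkeeping sketch; the dictionary names are hypothetical and every number is taken from the text.

```python
# Study-flow counts as stated in the text (PubMed/Medline search).
identified = {"HVGIC": 214, "amalgam": 682}
met_criteria = {"HVGIC": 12, "amalgam": 5}
not_traceable = {"HVGIC": 1, "amalgam": 0}   # full text unavailable [17]
no_full_match = {"HVGIC": 1, "amalgam": 3}   # excluded at matching [26], [30-32]

total_identified = sum(identified.values())
provisionally_accepted = {k: met_criteria[k] - not_traceable[k] for k in met_criteria}
included = {k: provisionally_accepted[k] - no_full_match[k] for k in provisionally_accepted}

print(total_identified)
print(provisionally_accepted)
print(included)
```

The totals reproduce the 896 citations, the 11 and 5 provisionally accepted articles, and the ten HVGIC and two amalgam studies used for naïve-indirect comparison.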

Selection of uncontrolled clinical longitudinal studies
The included HVGIC and amalgam longitudinal studies were matched with each other, as well as with available RCTs [34], according to investigated cavity type and follow-up period (Table 2). No full match was found for one HVGIC study [26] and three amalgam studies [30][31][32] because of differing follow-up periods per cavity type; these studies were therefore excluded. The extracted data included journal names and trial results, namely the number of restoration failures (n) and the number of evaluated restorations (N) at the end of each follow-up period, per type of restorative treatment (HVGIC or amalgam) and cavity type (Class I or II). The n/N data from each HVGIC study were statistically compared with those of each amalgam study, and odds ratios (OR) with 95% confidence intervals (CIs) were computed using the statistical software RevMan 4.1.2. These extracted and computed data were considered the 'test data' in this study. The 'control data' were in turn extracted from a systematic review of 20 randomised controlled trials (RCTs) by the authors [34], which appraised the current clinical evidence on the question of whether, in patients with carious cavities, direct HVGIC restorations placed according to the atraumatic restorative treatment approach have a higher failure rate than conventional amalgam restorations (File S2). For the purpose of this study, only those RCTs from the systematic review were selected that matched the uncontrolled clinical longitudinal studies in investigated cavity type and follow-up period (Table 2).
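The per-pair computation can be sketched as follows. This is a minimal illustration, not the RevMan 4.1.2 implementation: it assumes the usual Woolf (log odds ratio) standard error and a 0.5 continuity correction when any cell is zero, which is believed to be close to RevMan's study-level handling but has not been verified against it. The helper names `odds_ratio_ci` and `naive_indirect_pairs` and the example datasets are hypothetical. An OR above 1 indicates higher odds of failure for HVGIC than for amalgam.

```python
import math
from itertools import product


def odds_ratio_ci(n1, N1, n2, N2, z=1.96):
    """Odds ratio of failure, arm 1 versus arm 2, with a Woolf (log-scale) CI.

    n = failed restorations, N = evaluated restorations. If any cell is zero,
    0.5 is added to every cell (a common continuity correction, assumed here).
    """
    a, b = n1, N1 - n1  # arm 1 (e.g. HVGIC): failures, non-failures
    c, d = n2, N2 - n2  # arm 2 (e.g. amalgam): failures, non-failures
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def naive_indirect_pairs(hvgic, amalgam):
    """Yield an OR for every HVGIC/amalgam dataset pair with the same cavity
    class and follow-up period (the matching rule described above)."""
    for h, a in product(hvgic, amalgam):
        if (h["cavity"], h["months"]) == (a["cavity"], a["months"]):
            yield h["study"], a["study"], odds_ratio_ci(h["n"], h["N"], a["n"], a["N"])


# Hypothetical datasets, for illustration only (not the study data).
hvgic_data = [
    {"study": "H1", "cavity": "I", "months": 24, "n": 6, "N": 100},
    {"study": "H2", "cavity": "II", "months": 24, "n": 10, "N": 80},
]
amalgam_data = [{"study": "A1", "cavity": "I", "months": 24, "n": 1, "N": 90}]

pairs = list(naive_indirect_pairs(hvgic_data, amalgam_data))
for h_study, a_study, (or_, lo, hi) in pairs:
    print(f"{h_study} vs {a_study}: OR {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

When neither arm has any failures, the corrected OR is exactly 1; meta-analysis software typically reports such a comparison as not estimable instead, so a real analysis should handle that case explicitly.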
The extracted data comprised single dichotomous datasets per RCT, consisting of the number of restoration failures (n) and the number of evaluated restorations (N) for each cavity type at the end of each follow-up period.
The intention was to pool datasets of the same cavity type and follow-up period using random-effects meta-analysis (RevMan 4.1.2), if possible. The test results from uncontrolled clinical longitudinal studies were plotted together with the control results.
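Had pooling been possible, a random-effects meta-analysis of log odds ratios could look like the sketch below. It uses the DerSimonian-Laird moment estimator of the between-study variance tau²; that this matches the RevMan 4.1.2 random-effects settings is an assumption, as the estimator is not stated here. Inputs are study-level log ORs and their variances (for example from the Woolf formula); the function name is hypothetical.

```python
import math


def dersimonian_laird(log_ors, variances, z=1.96):
    """Pool log odds ratios with the DerSimonian-Laird random-effects method.

    Returns (pooled OR, lower 95% limit, upper 95% limit, tau^2).
    """
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_star, log_ors)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se), tau2
```

When all studies agree, Q stays below its degrees of freedom, tau² is truncated to zero and the result equals a fixed-effect analysis; heterogeneity widens the interval through the added tau² term.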
Direct HVGIC/amalgam comparison of RCT data
Eight (n/N vs n/N) datasets from three RCTs [35][36][37] relevant to Class I and II restorations in permanent posterior teeth after 12, 24 and 36 months were extracted from Table 10 of the systematic review [34] (File S2).
The computed odds ratios (95% CI) were plotted for Class I and II restorations and are shown in Figures 2 and 3, respectively. Meta-analysis of the longitudinal study results was not conducted, as only one n/N dataset from amalgam studies was available per cavity type and follow-up period, against which all n/N datasets from HVGIC studies were set. Figures 2 and 3 show that the width of the 95% confidence intervals (CI) differs considerably between the two types of comparison, which may be ascribed to the generally larger sample sizes in direct comparisons (RCTs). However, from the confidence intervals and point estimates (OR) the following could be observed: The distributions of the point estimates differed significantly between the two comparison types (Mann-Whitney U = 25, n(indirect) = 26, n(direct) = 8; p = 0.0013, two-tailed). These results indicate that trends and performance differences inferred from naïve-indirect comparison based on evidence from uncontrolled clinical longitudinal studies and from direct comparisons based on RCT evidence do not have the same direction and magnitude. The null-hypothesis was therefore rejected.
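The rank-based test used here can be sketched as follows. This minimal version computes U with mid-ranks for ties and a two-sided p-value from the normal approximation without tie or continuity correction; which variant the authors' software used is not stated, so this is an assumption. The hypothetical `summarise` helper reports the median with 25th and 75th percentiles using Python's default ('exclusive') quantile method, which may differ slightly from other percentile definitions.

```python
import math
import statistics


def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic (the smaller of U_x and U_y)."""
    ranks = midranks(list(x) + list(y))
    u_x = sum(ranks[: len(x)]) - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


def two_sided_p_normal(u, n1, n2):
    """Two-sided p-value for U via the normal approximation (no corrections)."""
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def summarise(point_estimates):
    """Median with 25th and 75th percentiles of a list of point estimates."""
    q1, _, q3 = statistics.quantiles(point_estimates, n=4)
    return statistics.median(point_estimates), q1, q3


# Check against the reported statistic: U = 25, n(indirect) = 26, n(direct) = 8.
print(round(two_sided_p_normal(25, 26, 8), 4))
```

With the reported values, the normal approximation gives p of approximately 0.0013, consistent with the reported two-tailed p-value; an exact-distribution test may give a slightly different figure.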

Limitations of study method
The aim of this empirical study was to investigate whether trends and performance differences between conventional amalgam and direct HVGIC restorations in posterior teeth can be inferred through naïve-indirect comparison of failure rates from uncontrolled clinical longitudinal studies. The objective was to test the null-hypothesis that trends and performance differences inferred from naïve-indirect comparison based on evidence from uncontrolled clinical longitudinal studies and from direct comparisons based on RCT evidence have similar directions and magnitude.
The intention was to pool datasets of the same cavity type and follow-up period using random-effects meta-analysis (RevMan 4.1.2), if possible. However, meta-analysis of the longitudinal study results was not conducted, as only one n/N dataset from amalgam studies was available per cavity type and follow-up period, against which all n/N datasets from HVGIC studies were set. Pooling these results would have counted the same amalgam dataset repeatedly, generating spuriously narrow confidence intervals and thus potentially misleading summary outcomes.
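Why pooling against a single shared amalgam arm understates uncertainty can be illustrated numerically. In the sketch below (hypothetical counts, and a simple fixed-effect inverse-variance pooling for clarity; a random-effects analysis shares the same flaw), four HVGIC arms are each compared with the same amalgam arm. Because the amalgam log-odds enters every comparison, no pooled log OR can be more precise than that single arm allows, yet the naively pooled standard error falls below that floor.

```python
import math


def log_or_variance(a, b, c, d):
    """Woolf variance of a log OR: a, b = HVGIC failures/non-failures; c, d = amalgam."""
    return 1 / a + 1 / b + 1 / c + 1 / d


def fixed_effect_se(variances):
    """SE of an inverse-variance pooled estimate, which ASSUMES independent estimates."""
    return math.sqrt(1 / sum(1 / v for v in variances))


# Hypothetical counts: four HVGIC arms, all compared with ONE amalgam arm.
hvgic_arms = [(8, 92), (10, 90), (6, 94), (9, 91)]
amalgam_arm = (2, 98)
variances = [log_or_variance(a, b, *amalgam_arm) for a, b in hvgic_arms]

naive_se = fixed_effect_se(variances)
# Floor: the amalgam arm's own log-odds SE, which pooling cannot reduce.
amalgam_component = math.sqrt(1 / amalgam_arm[0] + 1 / amalgam_arm[1])
print(f"naively pooled SE = {naive_se:.3f}; amalgam-arm SE alone = {amalgam_component:.3f}")
```

The naive pooled SE is smaller than the uncertainty contributed by the amalgam arm alone, which is exactly the spurious precision the authors avoided by not pooling.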
Data were drawn only from studies published in English. This language restriction was based on the consideration that including non-English trials may have little effect on summary treatment effect estimates and may rather be regarded as confirmatory [38,39]. Only uncontrolled longitudinal studies listed in PubMed/Medline from 2002 onwards were searched, in order to limit the risk of chronological bias, as no RCTs providing a direct comparison between HVGIC and amalgam restorations before that date could be identified [34]. A further focus was on HVGIC studies that placed restorations using the atraumatic restorative treatment (ART) approach. The reason was that the RCT data were drawn exclusively from a systematic review [34] that included HVGIC/ART restorations; this ensured that the uncontrolled longitudinal studies and the RCTs did not differ in this respect. In any case, the literature search did not identify any HVGIC longitudinal studies that were not based on ART.
The restrictions of this study may have limited the data available. However, the authors are confident that the identified cohort of studies represents the clinical evidence from most, if not all, clinical longitudinal studies and RCTs relevant to posterior HVGIC and amalgam restorations in the permanent dentition that have been listed in PubMed/Medline during the 2002-2012 period.

Study results
The results of this investigation suggest that the trend direction and magnitude of performance differences inferred from study results are strongly affected by the type of comparison and study design used (i.e. naïve-indirect comparison of uncontrolled longitudinal evidence versus direct comparison within RCTs). The results from naïve-indirect comparison of uncontrolled longitudinal evidence are in keeping with the current general consensus on the clinical merits of HVGIC, which is also expressed in the DGZ/DGZMK statement of Germany [14]. The two comparison types lead to different inferences about trend direction and magnitude of performance differences (i.e. HVGIC restorations performing worse than, or equal to, conventional amalgam restorations in permanent posterior teeth, respectively). The null-hypothesis was rejected. This raises questions regarding the reliability of the study designs and comparison methods for subsequent inference. Randomised controlled trials (RCTs) are studies with two or more arms in which the intervention groups have been formed through random allocation. RCTs have been recognised as the 'gold standard' in clinical trial methodology [40].
Table 3. Extracted datasets from studies for analysis.
Clinical uncontrolled longitudinal studies are defined as a subset of non-RCTs without a comparison group, which evaluate the effect of a particular treatment in patients who are all offered this same treatment [41]. The rationale of this study type comprises: (a) application of a pre-test measurement to a single group of patients, e.g. the 'count' of restoration failures (i.e. their absence) at baseline after restoration placement; (b) reapplication of the same measurement (count of restoration failures) as a post-test after a certain time period, e.g. after 12, 24 or 36 months [42]. Clinical uncontrolled longitudinal studies have been found to be more efficient than cross-sectional studies in estimating the average change in a measurement and its variation between individual patients [43]. They are very common in medicine [44], are faster, more convenient and less expensive to conduct than RCTs [41], and serve as valuable pilot studies for planning subsequent RCTs, e.g. by providing effect-size estimates as a basis for RCT sample-size calculation [41,45].
However, despite its stated merits, the rationale of uncontrolled longitudinal studies is prone to the logical 'post hoc ergo propter hoc' or 'false cause' fallacy [46], as its results suggest that a causal relationship exists between the applied intervention (e.g. the type of restorative material) and the observed average change in the post-test measurement (e.g. the restoration failure rate). Such erroneously assumed causality does not take into account other potential factors that may have caused, or at least influenced, the post-test measurement, which the uncontrolled longitudinal study design is unable to exclude.
Owing to the lack of a randomly selected comparison group, uncontrolled longitudinal studies are vulnerable to many sources of invalidity that are often difficult to rule out [42]. These include external confounding factors that may be known or unknown to both patient and study operator and whose effects may increase with the length of the follow-up period [42]. Another source of vulnerability is regression to the mean, due either to variation within patients or to measurement error, which cannot be corrected for owing to the lack of a control group [41,47]. In addition, uncontrolled longitudinal studies are at higher risk of investigator bias and are thus more likely to produce statistically significant results favouring one type of treatment over another [41]. Because of these shortcomings, uncontrolled longitudinal studies provide weak evidence, and their results should not be used to guide clinical practice [45,48].
Figure 2. Odds ratios (95% CI) for Class I restorations (see Table 3). doi:10.1371/journal.pone.0078397.g002
The shortcomings of the uncontrolled longitudinal study design have further impact when its results are used for naïve-indirect comparisons between two competing clinical interventions. Such comparisons rest on assumptions of homogeneity, similarity and consistency between the uncontrolled longitudinal studies used as data sources. However, these assumptions cannot be assured when study characteristics are not balanced through randomisation within a single clinical/methodological setting. Consequently, investigations have established that results from naïve-indirect comparisons have an inflated probability of statistical significance, with a 30% smaller standard error (SE) than direct comparisons based on randomised controlled trials [16]. It was further found that 40% of confidence intervals generated from naïve-indirect comparisons did not contain the correct effect size and that results of naïve-indirect comparisons agreed very poorly with results from direct comparisons (kappa = 0.28). These discrepancies have been ascribed mainly to a lack of comparability, owing to different prognostic factors, between patients from the different studies included in naïve-indirect comparisons, together with the risk of random error (a 5% chance of type I error), which may produce statistical significance even when the null-hypothesis is true [16]. The results of the present study, particularly the significantly higher median point estimate from the naïve-indirect comparison in favour of amalgam over HVGIC (median OR 6.29; interquartile range 1.34-19.27), appear to be in line with such observations.
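The practical effect of a 30% smaller standard error can be shown with a small numerical sketch. The effect size and standard errors below are hypothetical; only the 0.7 scaling factor reflects the finding cited above [16]. The same log odds ratio that is clearly non-significant under the direct-comparison SE (p of roughly 0.13) crosses the conventional 0.05 threshold under the shrunken SE (p of roughly 0.03).

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


effect = 0.6                  # hypothetical log odds ratio
se_direct = 0.4               # hypothetical SE of a direct (RCT) comparison
se_naive = 0.7 * se_direct    # 30% smaller SE, as reported for naive-indirect comparisons

p_direct = two_sided_p(effect / se_direct)
p_naive = two_sided_p(effect / se_naive)
print(f"direct: z = {effect / se_direct:.2f}, p = {p_direct:.3f}")
print(f"naive:  z = {effect / se_naive:.2f}, p = {p_naive:.3f}")
```

Shrinking the SE by 30% inflates the test statistic by a factor of about 1.43, which is why such comparisons yield statistical significance more often than the underlying evidence warrants.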
Despite the more promising trends and performance differences that can be inferred from direct comparisons within RCTs regarding the failure rate of direct posterior HVGIC restorations in permanent teeth, shortcomings in the current evidence remain (e.g. related to internal validity and sample size), which require further research [49]. Nevertheless, these shortcomings do not constitute evidence in support of the recommendation that HVGICs are unsuitable for permanent posterior tooth restorations [14]; such evidence could only come from direct comparisons within RCTs. In this context, it is noteworthy that, despite a broad systematic literature appraisal, no summary RCT evidence could be established in support of the hypothesis that 'direct HVGIC restorations are inferior to those of amalgam in posterior cavities of permanent teeth' [34,50].
It is recommended that any guidance for clinical practice should be based on direct comparisons from randomised controlled trials, ideally appraised in systematic reviews of the clinical literature. Where RCT evidence has not yet been established in a clinical field, clinical guidance should at least avoid recommendations based on flawed data comparisons (i.e. naïve-indirect comparison) and fallacious study methodology (i.e. uncontrolled longitudinal study design) that carry a high risk of confounding and systematic error.

Conclusions
The results of this study indicate that differences in the failure rates of direct HVGIC versus amalgam restorations inferred from naïve-indirect comparison and from direct comparisons based on RCT evidence are not similar in direction and magnitude. The discrepancy is attributed to severe shortcomings of the uncontrolled longitudinal clinical study design and to the flawed method of naïve-indirect comparison. Both carry a high risk of confounding and of bias (systematic error), and may thus have inflated the results in favour of amalgam over HVGIC restorations. Specifically, naïve-indirect comparison of the clinical characteristics of high-viscosity glass-ionomer cements against gold standards for permanent posterior restorations, such as conventional amalgam fillings, based on uncontrolled clinical longitudinal studies may have strengthened the grounds for the current negative clinical recommendations for HVGICs, as expressed, for example, in the DGZ/DGZMK statement of Germany. The reliance of such directives on naïve-indirect comparison of uncontrolled clinical longitudinal study evidence calls for attention and revision.
Figure 3. Odds ratios (95% CI) for Class II restorations (see Table 3). doi:10.1371/journal.pone.0078397.g003