Prevalence of Failure due to Adverse Reaction to Metal Debris in Modern, Medium and Large Diameter Metal-on-Metal Hip Replacements – The Effect of Novel Screening Methods: Systematic Review and Metaregression Analysis

Metal-on-metal (MoM) hip replacements were used for almost a decade before adverse reactions to metal debris (ARMD) were found to be a true clinical problem. Currently, there is a paucity of evidence regarding the usefulness of systematic screening for ARMD. We conducted a systematic review and meta-analysis to establish the prevalence of revision confirmed ARMD stratified by the use of different screening protocols in patients with MoM hip replacements. Five levels of screening were identified: no screening (level 0), targeted blood metal ion measurement and/or cross-sectional imaging (level 1), metal ion measurement without imaging (level 2), metal ion measurement with targeted imaging (level 3) and comprehensive screening (both metal ions and imaging for all; level 4). In total, 122 studies meeting our eligibility criteria were included in the analysis. These studies included 144 study arms: 100 study arms with hip resurfacings, 33 study arms with large-diameter MoM total hip replacements (THR), and 11 study arms with medium-diameter MoM THRs. For hip resurfacing, the lowest prevalence of ARMD was seen with level 0 screening (pooled prevalence 0.13%) and the highest with level 4 screening (pooled prevalence 9.49%). Pooled prevalence of ARMD with level 0 screening was 0.29% and with level 4 screening 21.3% in the large-diameter MoM THR group. In metaregression analysis of hip resurfacings, level 4 screening was superior with regard to prevalence of ARMD when compared with other levels. In the large-diameter THR group, level 4 screening was superior to screening levels 0, 2 and 3. These outcomes were irrespective of follow-up time or study publication year. With hip resurfacings, routine cross-sectional imaging regardless of clinical findings is advisable. It is clear, however, that targeted metal ion measurement and/or imaging is not sufficient in the screening for ARMD in any implant concept. Nevertheless, economic aspects should be weighed when choosing the preferred screening level.


Introduction
In the late 1990s, advances in metallurgy and tribology led to a renewed interest in the use of metal-on-metal (MoM) bearings in total hip replacements (THR) [1]. The use of large diameter (LD) femoral heads that mimic the native anatomy of the hip joint requires relatively thin, i.e., 4-8 mm, acetabular components to prevent excessive acetabular bone resection. Due to the extreme hardness of the new cobalt-chrome alloy Metasul, it became possible to manufacture these thin acetabular components [1]. With improved fixation techniques, the concept of large-headed femoral components coupled with thin monoblock cups was rapidly adopted for MoM hip resurfacings. Preliminary results with these second generation MoM hip resurfacings were excellent, and the number of hip resurfacings surged in the early 2000s. Later on, LD MoM bearings for cementless stemmed THRs were adopted.
It was only after a decade of use of these contemporary MoM bearings that adverse reaction to metal debris (ARMD) came into focus [2,3]. Metal debris caused by the increased wear of the bearing and/or the corrosion in the neck-head taper in a stemmed MoM THR leads to local soft tissue reactions that include synovitis, necrosis and extra-articular cysts or solid masses, i.e., pseudotumours [4][5][6][7][8]. ARMD is an umbrella term proposed by Langton et al. in 2010 to describe all the microscopic and macroscopic findings that were seen in revision surgeries performed on patients with MoM hip replacements who suffered from unexplained pain [2].
Nowadays, revision surgery is considered if a large thick-walled pseudotumour is seen in MRI, or if extremely high metal ion levels (>10-20 ppb) are found in the serum or whole blood (WB). Prior to 2010, MoM hip resurfacings and modular MoM THRs produced excellent results in young and active patients. Since 2010, however, there has been a drastic decline in the use of MoM hip replacements due to the higher than anticipated prevalence of ARMD and subsequent high failure rates [9,10].
An extensive amount of research has been published on ARMD. Novel screening methods such as blood metal ion measurement and cross-sectional imaging have made it possible to investigate the etiopathogenesis, clinical history and clinical manifestation of ARMD [2,4,5,11,12,13,14]. Both blood metal ion measurement and cross-sectional imaging have been proven to be useful in the diagnostics of ARMD [4,5,15]. However, there is a paucity of current literature regarding the usefulness of systematic screening for ARMD. Current guidelines on how to follow up patients with MoM hip replacements lack sufficient evidence [16][17][18]. The intensity and coverage of such screening is strongly associated with the costs related to the surveillance of patients with MoM hip replacements.
Although the number of patients receiving MoM THRs has decreased drastically in recent years, a vast number of patients still have a MoM hip replacement in situ. Thus, there is a clear need for sufficient evidence on how to manage patients with MoM hip replacements.
The aim of this study was to conduct a systematic review and meta-analysis of current literature to establish the prevalence of revision confirmed ARMD stratified by the use of different screening protocols. A second aim was to explore the possible confounding effects of follow-up time and year of publication of the study on the prevalence of revision confirmed ARMD by performing a meta-regression analysis.

Eligibility criteria
A study was deemed eligible for our analysis if 1) it included an original patient cohort operated on with a single disclosed implant, 2) the implant used in the study was MoM hip resurfacing or MoM THR with a femoral head diameter of 36 mm or larger, and 3) the reasons for the revisions were clearly stated or the operative findings in the revision surgery were outlined. An original patient cohort means a clearly defined population of patients operated on with a certain implant within a certain time interval at disclosed hospital(s). A study was excluded if more than one implant was used and the number of patients for each implant was not given. Moreover, a study was excluded if 1) it included patients referred from elsewhere to the place where the study was carried out (violation of eligibility criterion 1), 2) more than one implant was used, but revisions were not stratified according to the implant used, 3) the reasons for all revisions were not given, or 4) the size of a study arm for a single implant was less than 20 hips.
Furthermore, if a study included a study arm or a subcohort of a study arm, which had been included in a previous study and in both studies a similar screening method was used, the more recent study was included in our analysis. However, if the two studies included an identical study arm or a subcohort of a study arm but the more recent study implemented a different level of screening, both studies were included in our analyses since the primary aim of our study was to investigate the effect of screening levels on the prevalence of revision confirmed ARMD.

Information sources and search strategy
The review was done according to the PRISMA checklist (S1 Checklist). We developed a search strategy that was implemented in the PubMed and Scopus databases. Since our objective was to establish a pooled prevalence of ARMD seen with contemporary MoM hip replacements, we started our search of these databases from the year 1995. The search was conducted in May 2015.
We performed two searches of these two databases. The following search strategy was used first: "(metal-on-metal AND hip) OR (("hip resurfacing" OR ("surface replacement" AND hip) OR "large-diameter total hip") NOT (metal-on-metal))". A second search was carried out afterwards that was combined with the first search using the Boolean operator NOT in order to remove duplicates: "((recap AND hip) OR (magnum AND hip) OR (cormet AND hip) OR (durom AND hip) OR (conserve AND hip) OR (pinnacle AND hip) OR (asr AND hip) OR (m2a AND hip) OR (birmingham AND hip) OR (mitch AND hip) OR (adept AND hip))". The latter search phrase outlines the most commonly used MoM hip replacement brand names.
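The way the two search strings fit together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `brand_query` and `combine_not` are hypothetical, the direction of the NOT (brand-name search minus the first search) is our reading of the description above, and database-specific field tags for PubMed or Scopus are not reproduced because the text does not give them.

```python
# First search string, copied from the text above.
GENERIC_QUERY = (
    '(metal-on-metal AND hip) OR (("hip resurfacing" OR '
    '("surface replacement" AND hip) OR "large-diameter total hip") '
    'NOT (metal-on-metal))'
)

# Brand names listed in the second search string.
BRANDS = ["recap", "magnum", "cormet", "durom", "conserve", "pinnacle",
          "asr", "m2a", "birmingham", "mitch", "adept"]


def brand_query(brands):
    """Build '((recap AND hip) OR (magnum AND hip) OR ...)'."""
    return "(" + " OR ".join(f"({b} AND hip)" for b in brands) + ")"


def combine_not(second, first):
    """Hypothetical helper: records matching `second` but not `first`."""
    return f"({second}) NOT ({first})"


BRAND_QUERY = brand_query(BRANDS)
# Assumption: the second search is restricted to records the first one missed.
COMBINED_QUERY = combine_not(BRAND_QUERY, GENERIC_QUERY)
```

Generating the brand-name clause from a list keeps the parentheses balanced and makes it easy to see which implant brands the second search covers.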

Study selection
All the records retrieved from the two databases using our search strategy were screened. An assessment of duplicate references was not performed. Abstracts of all the records were assessed. Studies that outlined the use of any "metal-on-metal" hip implant or a hip implant by a brand name along with any clinical outcome (patient reported outcome score, survival rate, failure rate, complication rate, revision rate, deaths, metal ion levels, cross-sectional imaging findings) were selected for eligibility assessment. All the studies meeting our eligibility criteria were selected for both the systematic review and the meta-analysis, which was conducted using the metaregression technique.
[...] mm), which comprise bearing systems identical to those in hip resurfacings coupled with a mainly cementless stem, and 3) MoM THR with a femoral diameter of 44 mm or smaller [medium-diameter (MD) MoM THR]. The last group included two fixed-sized MoM THRs: the Pinnacle 36 mm MoM THR and the M2a-38 MoM THR with a 38 mm femoral diameter. Small-diameter (<36 mm) MoM hip replacements were not included in the present study. Two studies included patients operated on with a Birmingham Mid-Head Resection (BMHR) device. These were included in the hip-resurfacing group due to their similarity. Follow-up time was recorded. We did not differentiate between mean and median follow-up times.
The use of metal ion level measurement in serum or WB was assessed as well as the use of any cross-sectional imaging modality. In short, five levels of screening were identified: no screening (level 0), targeted blood metal ion measurement and/or cross-sectional imaging (level 1), blood metal ion measurement without imaging (level 2), blood metal ion measurement with targeted imaging (level 3) and comprehensive screening (both blood metal ions and imaging for all; level 4). If neither screening method was used, the level of screening was labeled "None" (Level 0).
If prerevision details that included metal ion levels and/or imaging findings were described in single patients in the results section, the use of screening was considered lacking unless there was a protocol rationale in the methods section that described the use of these screening methods, e.g., patients with a complaint underwent an MRI scan. If blood metal ion measurements were performed in a subset of patients and no cross-sectional imaging was performed or if cross-sectional imaging was performed in a subset of patients without any given metal ion data, the level of screening was labeled "Targeted CoCr and/or imaging" (Level 1). Again, if prerevision imaging findings or metal ion levels were described in single patients without any protocol rationale detailed in the methods section, these screening methods were considered to be lacking. If all patients underwent a metal ion measurement without any imaging protocol outlined in the methods section, the level of screening was labeled "CoCr without imaging" (Level 2). If targeted imaging was used along with a routine (full coverage) metal ion measurement, the level of screening was labeled "CoCr with targeted imaging" (Level 3). If all patients underwent both metal ion measurement and cross-sectional imaging, the level of screening was labeled "Comprehensive" (Level 4). The modality of the imaging was recorded. We did not differentiate between serum and WB measurements.
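The labeling rules above can be summarized as a small decision function. The following is a minimal sketch, not the procedure actually used for data extraction: the `Coverage` enum and the function name `screening_level` are hypothetical, and combinations that the rules above do not explicitly define (for example, routine imaging without routine metal ion measurement) are rejected rather than guessed.

```python
from enum import Enum


class Coverage(Enum):
    NONE = 0      # method not part of the follow-up protocol
    TARGETED = 1  # applied to a subset of patients only
    ALL = 2       # applied routinely to every patient


def screening_level(metal_ions: Coverage, imaging: Coverage) -> int:
    """Map protocol coverage to screening level 0-4 as described in the text."""
    if metal_ions is Coverage.NONE and imaging is Coverage.NONE:
        return 0  # "None"
    if metal_ions is Coverage.ALL:
        if imaging is Coverage.NONE:
            return 2  # "CoCr without imaging"
        if imaging is Coverage.TARGETED:
            return 3  # "CoCr with targeted imaging"
        return 4      # "Comprehensive"
    if imaging is not Coverage.ALL:
        return 1      # "Targeted CoCr and/or imaging"
    # Assumption: this combination is not covered by the stated rules.
    raise ValueError("combination not defined in the text")
```

For instance, a study arm with routine metal ion measurement and MRI only for symptomatic patients would map to `screening_level(Coverage.ALL, Coverage.TARGETED)`, i.e., level 3.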
The number and reasons for the revisions were recorded. Detailed prerevision and perioperative findings were assessed if described. The following reasons for the revision were considered to be ARMD: "ARM(e)D", "Adverse wear", "adverse local tissue reaction (ALTR)", "adverse tissue reaction (ATR)", "metallosis", "pseudotumour" and "synovitis". "Elevated metal ion levels" or "component/cup malposition" as reasons for revision were not considered to be ARMD unless perioperative findings were described. The definition of ARMD was also met if perioperative findings were described, that is, if the operative findings described included the terms "metallosis", "synovitis", "pseudotumour", "necrotic substance/tissue" and the case(s) outlined were considered to have failed due to ARMD. Cases with ARMD as revision indication but that also included a clear statement about component loosening were not included in our analyses.
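As a rough illustration of the case definition above, the term lists can be expressed as a simple lookup. This is a deliberately crude sketch under several assumptions: the function name `is_armd_revision` and its parameters are hypothetical, the stated reason is matched exactly after normalization, and the judgment of whether a case "failed due to ARMD" based on operative findings, which in the review was made by the reviewers, is reduced here to a keyword match.

```python
# Revision reasons accepted as ARMD (normalized to lower case).
ARMD_REASONS = {
    "armd", "armed", "adverse wear",
    "adverse local tissue reaction", "altr",
    "adverse tissue reaction", "atr",
    "metallosis", "pseudotumour", "synovitis",
}

# Terms in perioperative findings that meet the ARMD definition.
ARMD_FINDINGS = ("metallosis", "synovitis", "pseudotumour",
                 "necrotic substance", "necrotic tissue")


def is_armd_revision(reason: str, findings: str = "",
                     loosening: bool = False) -> bool:
    """Return True if a revision would count as ARMD under the rules above."""
    if loosening:
        # Cases with a clear statement of component loosening are not counted.
        return False
    if reason.strip().lower() in ARMD_REASONS:
        return True
    # Reasons such as "elevated metal ion levels" count only when the
    # described operative findings contain one of the listed terms.
    text = findings.lower()
    return any(term in text for term in ARMD_FINDINGS)
```

Under this sketch, a revision recorded as "elevated metal ion levels" is counted only if the operative findings mention, for example, metallosis.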

Summary measures
The primary summary measure was the prevalence of ARMD. This was calculated by dividing the total number of revisions due to ARMD by the total number of hips included in the study. Confounding variables included in the meta-regression were follow-up time, year of publication and level of screening. This information was extracted as described in the previous section.

Data synthesis
The Q-statistic was used to assess heterogeneity across the studies. If the Q-statistic suggested high heterogeneity (p-value < 0.1), a random effects model was used instead of a fixed effects model. The amount of heterogeneity was assessed using the I² measure. The DerSimonian-Laird estimator was used for the random effects model when needed. Arcsine transformation was used for the summary measure (prevalence of revision confirmed ARMD). We preferred this to logit transformation because zero prevalence was overrepresented in our study. With logit transformation we would have been obliged to choose an arbitrary increment to be added to the zero summary measures, and we would not have had any reasonable value for this. Metaregression analysis was used to assess whether differences in the prevalence of revision confirmed ARMD across different levels of screening remained after adjusting for the year of publication and follow-up time. All analyses were stratified by the implant concept (HR/LD THR/MD THR). Finally, we carried out a "best case" sensitivity analysis by performing all the aforementioned analyses using only study arms with patients operated on with Birmingham Hip Resurfacing (BHR, Smith&Nephew, Warwick, United Kingdom).
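The pooling steps described above can be sketched as follows. This is a minimal illustration and not the software used in the study. It assumes the plain arcsine square-root transformation with approximate variance 1/(4n); the text does not say whether this or a double-arcsine variant was applied. The function names are hypothetical. The p-value of Q needs a chi-squared tail probability, which is outside the Python standard library and is omitted here, so both the fixed and the random effects estimates are returned.

```python
import math


def arcsine(events, n):
    """Arcsine square-root transform; variance is approximately 1/(4n).

    Defined for zero events, so no increment has to be added.
    """
    return math.asin(math.sqrt(events / n)), 1.0 / (4.0 * n)


def pool_prevalence(arms):
    """Pool study arms given as (ARMD revisions, hips) pairs."""
    ys, vs = zip(*(arcsine(e, n) for e, n in arms))
    w = [1.0 / v for v in vs]                      # inverse-variance weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(arms) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]          # random effects weights
    y_random = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return {
        "Q": q, "I2": i2, "tau2": tau2,
        # Back-transform pooled values to the prevalence scale.
        "fixed": math.sin(y_fixed) ** 2,
        "random": math.sin(y_random) ** 2,
    }
```

A call such as `pool_prevalence([(0, 100), (20, 100)])` uses made-up counts, not data from the included studies. It shows that an arm with zero ARMD revisions enters the pooled estimate without any continuity correction, which is the reason given above for preferring the arcsine to the logit transformation.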

Results
A PRISMA flow diagram of the study selection process is shown in Fig 1. In total, 122 studies were included. These studies comprised 145 study arms (Table 1) (S1 Datafile). The median number of hips in the study arms was 128 (range 20-3095). The most commonly used implant was BHR (Table 2). Patients in 48 study arms were operated on with BHR. In total, 100 study arms included patients operated on with a hip-resurfacing device [14,15,. Thirty-three study arms included patients with LD THR [35,56,79,95,. The most commonly used LD THR was the articular surface replacement (ASR) XL THR (DePuy Orthopedics, Warsaw, IN, USA). In general, the stems used with LD MoM bearing couples varied greatly. MD THR was used in 11 study arms [94,112,120,122,123,124,125,126,127,128,129]. In most of the studies, a traditional or conventional follow-up protocol without metal ion measurement and cross-sectional imaging was used (Table 2). Distribution of the follow-up time and year of publication are presented in Table 2.
The overall pooled estimate for the prevalence of revision confirmed ARMD among 145 study arms was 1.07% (CI: 0.69-1.49, I² = 92.3%, p for heterogeneity < 0.001). In general, the amount of heterogeneity was high. Individual study arms were stratified according to the level of screening for ARMD and the implant concept. The individual weighted prevalences of ARMD in each study arm under the random effects model are shown in Figs 2-4. In the hip-resurfacing group, the overall pooled prevalence of ARMD was 0.43% (CI: 0.25-0.65). With more comprehensive screening, a higher pooled prevalence of ARMD was observed (Fig 2). In the LD THR group, the overall pooled prevalence of ARMD was 4.6% (CI: 1.94-8.32). Prevalence peaked in the study arms with Level 4 screening (Fig 3). The clear trend of higher prevalence of ARMD associated with an increased level of screening seen in the hip-resurfacing group was not observed in this group. The overall pooled prevalence of revision confirmed ARMD in the MD THR group was 1.43% (CI: 0.21-3.70). This group lacked study arms with Level 1 and Level 4 screening (Fig 4).

Metaregression analysis
Metaregression was performed separately for hip resurfacings and LD THRs (Table 3). Metaregression was not performed in the MD THR group since no study arms were available for screening levels 1 and 4. For hip resurfacings, comprehensive screening (Level 4) was superior when compared with other levels, i.e., the prevalence of revision confirmed ARMD was significantly higher in level 4 studies when compared with others. An increase in follow-up time had a small, positive effect on the prevalence of ARMD. This association remained after adjusting for confounding variables. In the LD THR group, level 1 screening proved to be as good as level 4 screening. Screening levels 0, 2 and 3 were inferior when compared with level 4 screening, i.e., the prevalence of ARMD was significantly lower in these levels compared with the level 4 study arms. These differences remained after adjusting for confounding variables.
Table fragment: screening protocol and number of study arms
Level 1
  Targeted blood metal ions without imaging: 8
  Targeted imaging without blood metal ions: 3
  Targeted blood metal ions with targeted imaging: 9
Level 2
  Blood metal ions without imaging: 26
  Cross-sectional imaging without blood metal ions: 1
Level 3
  Blood metal ions with targeted imaging: 11
Level 4
  Blood metal ions AND cross-sectional imaging: 10

Sensitivity analysis
All analyses were repeated using only study arms with Birmingham Hip Resurfacing, which has been the most used implant (48 arms). A trend was observed that showed an increased prevalence of ARMD associated with an increased level of screening (Fig 5). The results of the metaregression analyses were similar to those observed with all hip resurfacings with the exception of level 3, where no inferiority to level 4 screening was observed (Table 4). Screening levels 0, 1 and 2 were significantly inferior to level 4.

Discussion
Despite the marginal use of MoM hip replacements nowadays, the orthopedic community must bear the burden of a vast follow-up that has resulted from the widespread use of these devices over the past 15 years [130]. It is evident that patients with MoM hip replacements must be followed up, at least clinically. However, there is a paucity of information available regarding the optimal follow-up protocol and especially regarding the use of blood metal ion measurement and cross-sectional imaging. We must be rigorous and aim for the best possible and up-to-date evidence when constructing guidelines on how to manage patients with MoM replacements. Thus, we have performed a systematic literature review and meta-analysis to investigate the influence of the extent of the screening protocol on the prevalence of revision confirmed ARMD. The overall pooled prevalence of revision confirmed ARMD was low. This is not a surprising finding considering that in most of the studies no screening was implemented other than conventional x-rays and clinical examination. The prevalence of ARMD was lowest in the study arms without screening (level 0). Moreover, these study arms also included the largest number of hips. Due to the weighting based on the sample sizes, the overall prevalence of ARMD does not, therefore, correctly highlight the current situation in patients with MoM hip replacements.
Heterogeneity between the studies was high. Firstly, there was considerable variation in the implants used. There are many implant specific factors (clearance, hemispherity, carbon content, etc.) that influence the wear of the bearing surface, and, therefore, bearing wear rates may differ greatly between different bearing systems [131,132]. Furthermore, both clinical studies and registry data show that there are major differences in the failure rates between different hip resurfacings and LD THRs [79,110,133]. These differences in failure rates are due to the modular taper-trunnion junction between the head and stem in the THRs, which is an additional source of metal debris due to corrosion and mechanical wear in the taper interface [134]. Secondly, when only study arms with the BHR implant were analyzed, high heterogeneity was still observed. The outcome variable assessed in our analysis was the revision rate for ARMD. Even if two studies implemented identical screening protocols, e.g., full coverage blood metal ion measurement and targeted cross-sectional imaging, very different prevalences of revision confirmed ARMD could still be observed. This is because indications for revision surgery can vary greatly between different surgeons and different hospitals. Some surgeons may prefer closer follow-up in cases where others would prefer revision surgery. The current literature lacks a specific definition for ARMD and especially the indications for revision. Due to these implant and inter-observer related differences, a high heterogeneity is observed. Other confounding factors that may have influenced the observed prevalence of revision confirmed ARMD are the follow-up time and the publication year of the study. The prevalence of ARMD increases in a cumulative manner with increasing follow-up time [110]. ARMD may manifest as early as two years postoperatively and a late occurrence after ten years of follow-up is also possible [2,15].
Thus, we included follow-up time as a confounding variable in our metaregression analysis to investigate whether prevalence of ARMD is a matter of long enough follow-up time or a matter of the screening protocol used. The follow-up time had no influence on the observed prevalence of ARMD. Therefore, our results suggest that even if the prevalence of revision confirmed ARMD increased with increasing follow-up time, this association would not be observed due to the stronger effect of the screening protocol used.
The year of publication was also an important variable to consider as a confounder. We believe that there has been considerable publication bias in the MoM literature during recent years. As a result, there has been a strong tendency to publish the highest possible prevalences of pseudotumours and ARMD. Several extremely poorly functioning MoM hip replacements have been in widespread use, and during the last two years numerous studies have been published that report the outcome of these poorly functioning implants. In most of the studies, the primary aim has been to elucidate the higher than anticipated failure rate due to ARMD. The "higher than anticipated" failure rate reflects the actual situation that we are facing nowadays with several MoM hip replacements, but from the perspective of the literature review we are facing publication bias. We do not have a sufficient number of studies where novel screening [79,110,117]. As was the case with follow-up time, we did not observe any influence of the year of publication on the observed prevalence of ARMD. Therefore, our results reliably highlight the important role that the screening protocol has in influencing the prevalence of ARMD.
In the hip-resurfacing group, metaregression analysis suggested that level 4 screening was superior to all other screening levels, including level 3. We consider this to be a novel finding. The main difference between these screening protocols is that when changing from level 3 to level 4, we ought to refer many patients with non-elevated metal ion levels or without complaints for cross-sectional imaging since these patients are not imaged in level 3 screening. We observed a slightly higher pooled prevalence of revision confirmed ARMD in study arms with level 4 screening compared with study arms with level 3 screening. This difference in pooled estimate for the prevalence of ARMD is one benefit of screening patients without relevant clinical findings.
Our results suggest that this movement from targeted imaging to full coverage imaging is useful with regard to the prevalence of revision confirmed ARMD. However, the economic aspect and cost-effectiveness of this "transition" should be carefully assessed. When only BHR implants were analyzed, level 3 screening was not found to be inferior to level 4 screening. This would indicate that full coverage imaging would not be beneficial in patients with BHR. However, in this subgroup the analysis might be underpowered. These results should be kept in mind especially when the economics of the surveillance of patients with MoM hip replacements are considered since cross-sectional imaging is the most expensive procedure in the screening process. In contrast to imaging, especially MRI, metal ion measurement is a readily available, inexpensive screening modality that should be used in the surveillance of patients with MoM hip replacements [15]. Current MHRA guidelines do not give instructions on how to perform metal ion measurement in asymptomatic patients with MoM hip resurfacing [135]. In our institution, however, metal ion measurement is a routine follow-up measure in all hip-resurfacing patients. As our previous study suggests, the measurement of metal ions is beneficial in patients without complaints since ARMD is often seen in asymptomatic patients [15]. The results of our current study also imply the usefulness of routine metal ion measurement. A comparison of the non-routine metal ion measurement (level 1) with the routine measurement (level 3) would have been more sensible from this point of view, but for the sake of simplicity we used level 4 as our reference in our metaregression analysis. It should be noted, however, that confidence intervals for pooled prevalences of revision confirmed ARMD in the level 1 and level 2 screening study arms barely overlap with those seen in the level 3 study arms.
Moreover, a distinct change in the prevalence of ARMD is seen when moving from screening levels 1-2 to screening levels 3-4. To conclude, our results suggest that routine metal ion measurement is useful in patients with hip resurfacing or BHR. More importantly, routine metal ion measurement should be performed along with targeted or full coverage imaging.
Results in the LD THR group were different from those in the hip-resurfacing group. Surprisingly, level 1 screening was equal to level 4 screening. Moreover, the pooled prevalence of ARMD with level 2 and 3 screening was clearly smaller than in study arms with level 1 screening. This is probably a biased result due to sample sizes since most study arms in level 2 screening included fewer than 40 hips, and these numbers might be too small to detect the actual failure rate. However, in study arms with level 3 screening there were two cohorts with more than 250 hips, and a surprisingly small prevalence of ARMD was observed in these study arms. For example, Bayley et al. had no revision due to ARMD after extensive screening [100].
The major issue with LD THRs is taper corrosion that may possibly release more toxic wear debris than that originating from the bearing surfaces [134]. Most probably, severe ARMD may be observed even in the presence of non-elevated (< 5 ppb) metal ion levels as a result. This would also explain why the clearly highest prevalence of ARMD was seen in the study arms with level 4 screening, i.e., in those studies where all patients were screened with cross-sectional imaging independent of blood metal ion levels. Hence, in patients with LD THRs, a low threshold for imaging is recommended even in the presence of normal metal ion levels.

Conclusions
The aims of this study were successfully achieved. Based on our systematic literature review and metaregression analysis, the overall pooled prevalence of revision confirmed ARMD represented in the current MoM literature is low. However, this seems to be a consequence of the use of the conventional follow-up protocol, namely x-rays and clinical examination, in the majority of the published studies. The implementation of the novel screening protocols results in a clearly higher prevalence of ARMD. The highest prevalence of revision confirmed ARMD was seen when all patients had undergone both blood metal ion measurement and cross-sectional imaging. These outcomes were irrespective of the follow-up time or study publication year. With regard to hip resurfacings, routine cross-sectional imaging regardless of clinical findings is advisable. Moreover, targeted metal ion measurement and/or imaging are not sufficient in the screening for ARMD in any implant concept. However, economic aspects should be considered when choosing the preferred screening level.