Imputation of adverse drug reactions: Causality assessment in hospitals

Background & objectives
Different algorithms have been developed to standardize the causality assessment of adverse drug reactions (ADRs). Although most share common characteristics, the results of causality assessment vary depending on the algorithm used. Therefore, using 10 different algorithms, this study aimed to compare inter-rater and multi-rater agreement in ADR causality assessment and to identify the most consistent algorithm for hospitals.

Methods
Using ten causality algorithms, four judges independently assessed the first 44 cases of ADRs reported during the first year of implementation of a risk management service in a medium complexity hospital in the state of São Paulo (Brazil). Owing to variations in the terminology used for causality, equivalent imputation terms were grouped into four categories: definite, probable, possible and unlikely. Inter-rater and multi-rater agreement were analyzed by calculating Cohen's and Light's kappa coefficients, respectively.

Results
None of the algorithms showed 100% reproducibility in causal imputation. Fair inter-rater and multi-rater agreement was found. The Emanueli (1984) and WHO-UMC (2010) algorithms showed a fair rate of agreement between the judges (κ = 0.36).

Interpretation & conclusions
Although the ADR causality assessment algorithms were poorly reproducible, our data suggest that the WHO-UMC algorithm is the most consistent for imputation in hospitals, since it allows the quality of the report to be evaluated. However, to improve the ability of algorithms to assess causality, it is necessary to include criteria for the evaluation of drug-related problems, which may be related to confounding variables that underestimate the causal association.



Introduction
Causality assessment of adverse drug reactions (ADRs) is a routine procedure in pharmacovigilance [1], because it allows the safety of health technologies to be assessed in the post-marketing period by estimating the likelihood of a relationship between drug exposure and the occurrence of an ADR.
Since the 1970s, different methods to standardize the evaluation of the causal association of ADRs have been available, ranging from short questionnaires to comprehensive algorithms [2].
These tools, which are simple to use [3] and require minimal expertise [1,4], were developed to address problems of methodological bias, reliability and validity in the imputation of drug-induced adverse effects [5]. Their main advantage, however, is that they allow causality assessment to be decentralized from medical diagnosis and extended to other settings: academia, the pharmaceutical industry, and health agencies [6].
Standardizing ADR causality assessment does not reduce the uncertainty of the association between a drug and an adverse event; rather, it categorizes that uncertainty semi-quantitatively [2] into different levels of probability.
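To make semi-quantitative categorization concrete, the sketch below maps a total algorithm score to a probability category. It assumes the published cut-offs of the Naranjo et al. (1981) scale (9 or more definite, 5 to 8 probable, 1 to 4 possible, 0 or less doubtful); the function name is illustrative, and other algorithms use different scores and labels.

```python
def naranjo_category(total_score: int) -> str:
    """Map a Naranjo total score to its probability category.

    Cut-offs assumed from Naranjo et al. (1981): >= 9 definite,
    5-8 probable, 1-4 possible, <= 0 doubtful.
    """
    if total_score >= 9:
        return "definite"
    if total_score >= 5:
        return "probable"
    if total_score >= 1:
        return "possible"
    return "doubtful"


# The uncertainty is not removed; scores are only binned into ordered categories.
for score in (10, 6, 2, 0):
    print(score, naranjo_category(score))
```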
Establishing a causal link may influence how an adverse event experienced by a drug user is attributed [7]; therefore, the results of causality assessments performed with algorithms must be reproducible. Reproducibility is important to confirm that algorithms are viable for use in pharmacovigilance [8] and able to detect ADR signals [9,10]: the higher the agreement on a given ADR causal link, the more robust the hypothesis about the relationship between the use of a medication and the observed adverse event, which allows the risk to be communicated and, therefore, risk minimization and patient safety plans to be implemented.
Because serious ADRs can lead to hospitalization, causality must also be assessed at the tertiary health care level. However, few data are available on agreement in ADR causality assessment using different algorithms in patients hospitalized in internal medicine units in developing countries.
Most evidence on ADRs arises from hospitals, owing to the high risks associated with treatments at the tertiary health care level [11]. Causality assessment in high complexity institutions therefore contributes to: i) early recognition of adverse effects, which helps to prevent iatrogenic complications; ii) therapy optimization [2]; iii) establishing barriers to prevent recurrence; and iv) reducing the length of hospital stay and the unnecessary burden of avoidable hospitalizations [12].
This study aimed to compare the results of ADR imputation obtained with different algorithms in a Brazilian public hospital, in order to identify the most appropriate algorithm for establishing causal associations between medication use and the occurrence of adverse events.

Study design
We assessed the causality of all ADRs reported by health professionals during the first year of implementation of the pharmacovigilance service (March 2012 to March 2013) in a public, general-care, medium complexity (secondary health care level) hospital with 104 beds located in the state of São Paulo.

Selection of algorithms
Twenty-nine (29) algorithms for ADR causality assessment were identified by literature review. Nineteen (19) were excluded for the following reasons: absence of equivalent terminology for the level of imputation of ADRs (n = 6); inclusion of information that is not required for causality assessment in Brazil (n = 3); development for the assessment of specific ADRs only (n = 3); and no access to the article (n = 7). The ten algorithms considered eligible for the study (Table 1) combine five main criteria for causality assessment [13], namely: i) plausible temporality; ii) previous description in the literature of the adverse effect for the drug involved; iii) alternative causes; iv) positive dechallenge (withdrawal of the drug with improvement of the ADR); v) positive rechallenge (reintroduction of the drug with reappearance of the ADR).
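One way to see how the five criteria feed an algorithm is to score them. The sketch below assumes the point values that the Naranjo et al. (1981) questionnaire assigns to its corresponding questions (answered yes, no or unknown); the names `Answer`, `CRITERION_POINTS` and `partial_score` are hypothetical. It covers only five of the ten Naranjo questions, so the result is a partial score rather than a full Naranjo total, and other eligible algorithms weight these criteria differently.

```python
from enum import Enum


class Answer(Enum):
    # Values double as indices into the (yes, no, unknown) point tuples below.
    YES = 0
    NO = 1
    UNKNOWN = 2


# Points for (yes, no, unknown), taken from the Naranjo question matching each criterion.
CRITERION_POINTS = {
    "plausible_temporality": (2, -1, 0),   # event appeared after the drug was given
    "previous_reports": (1, 0, 0),         # conclusive earlier reports of the reaction
    "alternative_causes": (-1, 2, 0),      # other causes could explain the event on their own
    "positive_dechallenge": (1, 0, 0),     # improvement after the drug was withdrawn
    "positive_rechallenge": (2, -1, 0),    # reappearance after the drug was reintroduced
}


def partial_score(answers: dict) -> int:
    """Sum the points for the five criteria; missing criteria count as UNKNOWN."""
    return sum(
        points[answers.get(criterion, Answer.UNKNOWN).value]
        for criterion, points in CRITERION_POINTS.items()
    )


case = {
    "plausible_temporality": Answer.YES,
    "previous_reports": Answer.YES,
    "alternative_causes": Answer.NO,
    "positive_dechallenge": Answer.YES,
    # rechallenge not performed, so it is left unknown
}
print(partial_score(case))
```

Treating a missing answer as "unknown" (0 points) mirrors how the questionnaire handles absent information, which is one reason incomplete reports push cases toward lower categories.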
Owing to the quantitative and qualitative variability in the terminology used to express the results of ADR imputation in the included algorithms, the nomenclature developed by Macedo et al. (2005) [13] was adopted. To improve the accuracy of the comparison, equivalent terms for each likelihood level were grouped into four major categories: definite, probable, possible and unlikely.
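The grouping of equivalent terms can be thought of as a lookup table. The mapping below is a hypothetical illustration, not the study's actual table: it assumes that labels such as WHO-UMC's "certain" and Naranjo's "doubtful" fall into the definite and unlikely groups respectively, and it leaves terms with no equivalent (for example "unassessable") unmapped.

```python
from typing import Optional

# Hypothetical grouping of algorithm-specific labels into the four study categories.
EQUIVALENT_TERMS = {
    "certain": "definite",
    "definite": "definite",
    "probable": "probable",
    "likely": "probable",
    "possible": "possible",
    "unlikely": "unlikely",
    "doubtful": "unlikely",
}


def harmonize(term: str) -> Optional[str]:
    """Return the common category for a label, or None if it has no equivalent."""
    return EQUIVALENT_TERMS.get(term.strip().lower())


print(harmonize("Certain"), harmonize("doubtful"), harmonize("unassessable"))
```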

Causality assessment
Using 10 causality algorithms (Table 1), four judges, FRV (rater A), ADFS (rater B), SPS (rater C) and IO (rater D), independently assessed the first 44 cases of ADRs reported to the hospital's risk management service during its first year of implementation.
The group of judges who conducted the analysis included a clinical pharmacist of the hospital (rater A), who held a PhD in Pharmaceutical Sciences and had 8 years of professional experience in pharmacovigilance, and three pharmacy undergraduate students (raters B, C and D) in the last year of the course, each with at least 1 year of previous experience in pharmacovigilance research. The students were trained to standardize the analysis of causal association. The 12-hour training included: 1) discussion of scientific papers on the subject (evaluation of ADR causality; evaluation of ADR causality with different decision algorithms; application of Austin Bradford Hill's criteria in pharmacoepidemiological studies); 2) directed study (comparison and critical analysis) of the algorithms used; 3) simulation of an ADR causality assessment with a fictional case [14].
The cases of ADRs reported and selected for the study contained at least the following information: i) suspected drug (start and end date); ii) a brief description of the event (start and end date, and laboratory test data when relevant); iii) concomitant medications (start and end date); iv) the patient's medical history; v) relevant interventions. We considered an ADR to be any noxious, unintended or undesired effect of a drug occurring at doses used in humans for prophylaxis, diagnosis or therapy [15].

Table 1. Algorithms selected for the causality assessment of adverse drug reactions in a public and general hospital in the State of São Paulo, Brazil (n = 10).
The clinical manifestations reported were classified according to seriousness and expectedness. Serious ADRs were defined as those causing hospitalization, those that were fatal or life-threatening, and those that resulted in significant changes in patient treatment (thereby prolonging hospitalization) [16].
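The seriousness definition is a simple disjunction: a reaction is serious if any one of the listed outcomes applies. A minimal sketch, with hypothetical field names chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class ReactionOutcome:
    # Hypothetical flags recorded for one reported reaction.
    caused_hospitalization: bool = False
    fatal: bool = False
    life_threatening: bool = False
    significant_treatment_change: bool = False  # e.g. a change that prolonged hospitalization


def is_serious(outcome: ReactionOutcome) -> bool:
    """Serious if any one of the criteria in the study definition is met."""
    return (
        outcome.caused_hospitalization
        or outcome.fatal
        or outcome.life_threatening
        or outcome.significant_treatment_change
    )


print(is_serious(ReactionOutcome()))                                   # False
print(is_serious(ReactionOutcome(significant_treatment_change=True)))  # True
```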
Package inserts approved by the National Agency of Sanitary Surveillance (ANVISA) and drug monographs, such as those in DRUGDEX (MICROMEDEX® database), the UpToDate® database and LexiComp Manole (2009), were consulted to verify whether each ADR was expected.
The results of imputation obtained with the ten algorithms were compared to analyze the agreement between judges and the feasibility of the algorithms for causality assessment in hospitals.

Statistical analysis
Two descriptive statistics were used to measure agreement on nominal categories between two or more raters: Cohen's kappa and Light's kappa.
Cohen's kappa measures the degree of agreement between two judges, corrected for the agreement expected by chance. The analysis carried out by FRV (rater A) was considered the gold standard, and the inter-rater agreement of judges B, C and D was calculated against it.
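Cohen's kappa compares the observed proportion of agreement, p_o, with the agreement expected by chance from each rater's marginal frequencies, p_e: kappa = (p_o - p_e) / (1 - p_e). The sketch below implements this formula in plain Python; the ratings are invented for illustration and are not study data.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters on nominal categories."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters must rate the same non-empty set of cases")
    n = len(r1)
    # Observed agreement: share of cases given the same category by both raters.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Invented example: rater A as reference, rater B compared against it.
rater_a = ["probable", "probable", "possible", "possible", "unlikely", "definite"]
rater_b = ["probable", "possible", "possible", "possible", "unlikely", "probable"]
print(round(cohen_kappa(rater_a, rater_b), 2))
```

Here p_o = 4/6 and p_e = 11/36, giving kappa = 0.52. Because the statistic is symmetric, designating rater A as the gold standard only determines which pairs (A-B, A-C, A-D) are compared, not the value for a given pair.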
Light's kappa is a multi-rater statistic that measures the degree of agreement among multiple judges without a gold standard; it extends Cohen's kappa by averaging the kappa values of all rater pairs. For all analyses, we used α = 0.05 and 95% confidence intervals. Values were interpreted according to Landis and Koch (1977) [17] (Table 2).
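Light's kappa is usually computed as the arithmetic mean of Cohen's kappa over every pair of raters, which is why no rater has to serve as the reference. The self-contained sketch below includes its own Cohen's kappa helper so it runs on its own; the ratings are invented, not study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(r1, r2):
    """Chance-corrected agreement between two raters on nominal categories."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def light_kappa(ratings):
    """Light's kappa: mean Cohen's kappa over all pairs of raters.

    `ratings` holds one list of category labels per rater,
    all covering the same cases in the same order.
    """
    pairs = list(combinations(ratings, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


rater_a = ["probable", "probable", "possible", "possible", "unlikely", "definite"]
rater_b = ["probable", "possible", "possible", "possible", "unlikely", "probable"]
rater_c = ["probable", "probable", "possible", "unlikely", "unlikely", "definite"]
print(round(light_kappa([rater_a, rater_b, rater_c]), 3))
```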

Research ethics committee
This study (E-015/10 protocol) was approved by the Research Ethics Committee of the Instituto Lauro de Souza Lima.

Results
During the data collection period, the risk management department received 24 ADR reports describing 36 different types of clinical manifestations attributed to 19 drugs (Table 3). Because causality imputation was carried out case by case, each judge independently assessed 44 cases: a single report may describe more than one clinical manifestation associated with a single drug, or may implicate more than one suspected drug in a single clinical manifestation. Regarding seriousness, seven ADR reports described manifestations classified as serious: 4 prolonged the hospital stay, 2 resulted in temporary disability and 1 was related to hospital admission.
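The difference between 24 reports and 44 assessed cases comes from treating each suspected drug-manifestation pair as a separate case. The sketch below assumes that every suspected drug in a report is paired with every manifestation in it; the function name, report identifiers and drug names are placeholders, not study data.

```python
from itertools import product


def expand_cases(reports):
    """Turn each report into (report_id, drug, manifestation) assessment cases.

    Assumption: every suspected drug in a report is paired with every
    manifestation described in that report.
    """
    return [
        (report_id, drug, manifestation)
        for report_id, report in reports.items()
        for drug, manifestation in product(report["drugs"], report["manifestations"])
    ]


# Placeholder reports (not study data).
reports = {
    "R1": {"drugs": ["drug_1"], "manifestations": ["rash", "pruritus"]},  # one drug, two manifestations
    "R2": {"drugs": ["drug_2", "drug_3"], "manifestations": ["nausea"]},  # two drugs, one manifestation
}
cases = expand_cases(reports)
print(len(cases))  # 4 cases from 2 reports
```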
None of the algorithms showed 100% agreement between judges on the imputation of ADRs. Fair agreement was observed with both statistics (Cohen's and Light's kappa) (Table 4). These findings suggest that the algorithms have poor reproducibility when ADR imputation is performed by different judges.
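Reading a kappa value as "slight" or "fair" follows the Landis and Koch (1977) bands. The sketch below assumes Table 2 uses the usual bands (below 0 poor, 0.00 to 0.20 slight, 0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial, 0.81 to 1.00 almost perfect) and, as an assumption for values falling between the printed two-decimal limits, treats each upper limit as inclusive.

```python
def landis_koch(kappa: float) -> str:
    """Label a kappa value with the Landis and Koch (1977) strength-of-agreement bands.

    Upper limits are treated as inclusive (an assumption for values between printed bands).
    """
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


print(landis_koch(0.36))  # "fair", the band of the kappa reported for the WHO-UMC algorithm
```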

Discussion
Our data suggest that the WHO-UMC algorithm is the most consistent for causal imputation of ADRs affecting patients admitted to an internal medicine unit of a medium complexity hospital. The advantage of this tool is its semi-quantitative assessment of both the causal likelihood and the quality of the report; it has been used as a gold standard in causality studies [8,13]. Moreover, this tool was developed to evaluate adverse effects occurring during the post-marketing period, which helps to achieve higher probability scores of causal association and better reproducibility between judges.
The Emanueli (1984) algorithm has a minimalist, simplified, dichotomous structure that considers only the patient's clinical condition as an alternative cause. It may therefore overestimate ADR cases and generate false-positive signals in risk communication, which is why it is not the most suitable tool for causality assessment in the context of this study.
For the remaining algorithms, we observed weak agreement between the judges on ADR causality. Studies have shown great variability in the results of ADR imputation using different algorithms [8,13,18–22]. According to Shakir and Layton (2002) [10], these tools are inconsistent and sometimes of poor quality for signal detection. Furthermore, they have significant limitations that reduce the accuracy and reliability of the assessment of ADR probability [1].

Table 4. Inter-rater and multi-rater agreement in adverse drug reaction causality assessment, according to the statistical analysis with Cohen's and Light's kappa.

Regarding the Naranjo et al. (1981) algorithm, data from a previous study showed slight agreement between judges [21], because the algorithm was developed and validated for the assessment of ADRs occurring during randomized clinical trials [19]. Other authors recommend this tool for ADR imputation [21] because it is quick to apply. However, we do not consider speed to be the only factor in choosing an algorithm: the reliability of the results and the limitations of each tool, especially in relation to the context of medication use (clinical trial versus post-marketing surveillance), should also be considered. According to our data, WHO-UMC (2010) partially meets these criteria.
It meets them only partially because none of the analyzed algorithms includes other factors that may be associated with adverse events, such as medication errors, product quality deviations and suspected therapeutic ineffectiveness. The arbitrary weighting given to the evaluation criteria is another limitation that may contribute to the inconsistency of algorithms [5,23]: it adds a subjectivity inherent to the algorithm structure, since the authors of each algorithm give greater weight in scoring to the criteria they deem most important. The causality assessment itself also involves some subjectivity [2,6]. Both factors may contribute to the poor agreement between the algorithms in the imputation of the causal link of an ADR.
Another factor that may decrease the accuracy of risk communication is the lack of good-quality ADR reports, together with underreporting [10,24,25]. Poor or missing information in reports makes it difficult to assess causality in detail, to differentiate between probable and possible cases [2] and to establish a definitive causal association. Consequently, the resulting hypotheses are not robust enough to generate signals in pharmacovigilance, impairing the assessment of drug safety in the post-marketing period.
Currently, there is no gold-standard algorithm either for the assessment of events occurring in primary care [6,26] or in studies that compared the imputation of ADRs reported to national pharmacovigilance centers [13,23]. At the tertiary health care level, Kane-Gill et al. (2012) [20] found strong agreement when comparing three algorithms applied to ADR cases retrieved retrospectively by active search in an intensive care unit. This can be explained by the setting in which the study was conducted, its methodology (a single judge) and the active search method. Critically ill patients are constantly monitored, so their medical records are more complete, which allows better information to be collected and increases the robustness of causality assessment. However, the disadvantage of active search is the time needed to review medical records [27], which makes the process unfeasible.
Given the limitations described, there is evidence that better quality tools that improve the diagnosis of ADRs need to be developed [26]. One strategy to increase the generation of pharmacovigilance signals is the monitoring of adverse drug events [28], whose evaluation criteria, which involve the assessment of any drug-related problem, should be included in the algorithms to improve the reproducibility of causal imputation.
Therefore, to minimize the flaws described and the confounding variables during assessment, parameters of need, effectiveness, adherence and safety should also be included in the algorithms, updating the assessment in line with the new WHO (2002) concepts regarding post-marketing studies [29].
Given that drug use [28,30] and intentional non-adherence to pharmacotherapy related to the diagnosis of stigmatized diseases [31] are associated with undesirable effects, it is also necessary to evaluate the impact of ADRs on the patient. This would help to identify the patient's priorities and difficulties and the factors that may motivate the use or discontinuation of therapy, and therefore the occurrence of undesirable effects arising from these perceptions and practices.
Finally, although the literature describes a wide range of methods for causality assessment, including computational approaches, algorithms remain viable alternatives for causality assessment in hospitals, since they are easy to use, require few financial resources for use in clinical routine and need minimal expertise [18]. It is therefore important to update these algorithms in accordance with the new definition of pharmacovigilance, allowing the monitoring of adverse drug events [26], in order to minimize confounding variables associated with the causal imputation process and thereby improve the risk/benefit assessment of medications available on the market.

Conclusion
Our data show slight agreement in ADR causality assessment for the majority of the tested algorithms. However, the WHO-UMC (2010) algorithm showed fair reproducibility and allows the quality of the report to be analyzed, which is why we suggest that it is the best tool for causality assessment of ADRs occurring in hospitals. Since the Naranjo algorithm was developed and validated to diagnose ADRs occurring in randomized clinical trials and showed slight agreement between the judges, it is not the most consistent tool for the assessment of ADRs affecting non-critical patients in a secondary hospital. In addition, the data demonstrate the need to develop better quality tools that include other criteria for the assessment of drug-related problems, such as effectiveness, safety, compliance, product quality or quality deviations, and medication errors.