The authors have declared that no competing interests exist.
We performed a systematic review to assess whether we can quantify the underreporting of adverse events (AEs) in the published medical literature documenting the results of clinical trials, as compared with other, unpublished sources, and whether we can measure the impact of this underreporting on systematic reviews of adverse events.
Studies were identified from 15 databases (including MEDLINE and Embase) and by handsearching, reference checking, internet searches, and contacting experts. The last database searches were conducted in July 2016. There were 28 methodological evaluations that met the inclusion criteria. Of these, 9 studies compared the proportion of trials reporting adverse events by publication status.
The median percentage of published documents with adverse events information was 46% compared to 95% in the corresponding unpublished documents. There was a similar pattern with unmatched studies, for which 43% of published studies contained adverse events information compared to 83% of unpublished studies.
A total of 11 studies compared the numbers of adverse events in matched published and unpublished documents. The percentage of adverse events that would have been missed had each analysis relied only on the published versions varied between 43% and 100%, with a median of 64%. Within these 11 studies, 24 comparisons of named adverse events such as death, suicide, or respiratory adverse events were undertaken. In 18 of the 24 comparisons, the number of named adverse events was higher in unpublished than published documents. Additionally, 2 other studies demonstrated that there are substantially more types of adverse events reported in matched unpublished than published documents. There were 20 meta-analyses that reported the odds ratios (ORs) and/or risk ratios (RRs) for adverse events with and without unpublished data. Inclusion of unpublished data increased the precision of the pooled estimates (narrower 95% confidence intervals) in 15 of the 20 pooled analyses, but did not markedly change the direction or statistical significance of the risk in most cases.
The main limitations of this review are that the included case examples represent only a small number amongst thousands of meta-analyses of harms and that the included studies may suffer from publication bias, whereby substantial differences between published and unpublished data are more likely to be published.
There is strong evidence that much of the information on adverse events remains unpublished and that the number and range of adverse events is higher in unpublished than in published versions of the same study. The inclusion of unpublished data can also reduce the imprecision of pooled effect estimates during meta-analysis of adverse events.
In a systematic review, Su Golder and colleagues study the completeness of adverse event reporting, mainly associated with pharmaceutical interventions, in published articles as compared with other information sources.
Research on medical treatments provides information on the efficacy of such treatments and on their side effects.
The balance between efficacy and side effects is important in assessing the overall benefit of a new treatment.
The amount of information on the side effects of medical treatments that is currently not published in journal articles is not known.
We searched several databases and other sources, and found 28 studies that provided information on the amount of data on side effects in published journal articles as compared to other sources (such as websites, conferences, and industry-held data).
The 28 studies found that a lower percentage of published than unpublished studies contained information on the side effects of treatments.
Fewer side effects are generally reported in published than in unpublished studies, and a wider range of named side effects is reported in unpublished than in published studies.
Including unpublished data in research leads to more precise conclusions.
These findings suggest that researchers should search beyond journal publications for information on side effects of treatments.
These findings also support the need for the drug industry to release full data on side effects so that a complete picture can be given to health professionals, policy makers, and patients.
Adverse events (AEs) are harmful or undesirable outcomes that occur during or after the use of a drug or intervention but are not necessarily caused by it [
The perceived importance of systematic reviews in assessing harms is exemplified by the growing number of such reviews published over the past few years. The Database of Abstracts of Reviews of Effects (DARE) includes 104 reviews of adverse events published in 2010 and 344 in 2014 [
Nevertheless, there remains considerable uncertainty about the extent of unpublished or industry data on adverse events beyond that reported in the published literature [
Serious concerns have emerged regarding publication bias or selective omission of outcomes data, whereby negative results are less likely to be published than positive results. This has important implications for evaluations of adverse events because conclusions based on only published studies may not present a true picture of the number and range of the events. The additional issue of poor reporting of harms in journal articles has also been repeatedly highlighted [
These emerging concerns indicate that publication and reporting biases may pose serious threats to the validity of systematic reviews of adverse events. Hence, we aimed to estimate the potential impact of additional data sources and the extent of unpublished information when conducting syntheses of adverse events. This present methodological review updates previous work [
The review protocol describing the methods to be employed was approved by all the authors before the study was initiated, and only one protocol amendment was made, whereby the risk of bias assessment was changed to take into account the fact that we separately considered matched and unmatched cohorts (see
Any type of evaluation was considered eligible for inclusion in this review if it compared information on adverse events of health care interventions according to publication status (i.e., published versus unpublished). “Published” articles were generally considered to be manuscripts found in peer-reviewed journals. “Unpublished” data could be located through any other avenue (for example, from regulatory websites, trial registries, industry contact, or personal contact) and included “grey literature,” defined as print or electronic information not controlled by commercial or academic publishers. Examples of grey literature include government reports, working papers, press releases, theses, and conference proceedings.
All health care interventions were eligible (such as drug interventions, surgical procedures, medical devices, dentistry, screening, and diagnostic tests). Eligible articles were those that quantified the reporting of adverse events—in particular, the number, frequency, range, or risk of adverse events. This included instances in which the same study was compared in both its published and unpublished format (i.e., “matched comparisons”), such as a journal article and a clinical study report (CSR), as well as evaluations that compared adverse events outcomes from different sets of unrelated published and unpublished sources addressing the same question (i.e., “unmatched comparisons”). We selected these types of evaluations as they would enable us to judge the amount of information that would have been missed if the unpublished data were not available, and to assess the potential impact of unpublished data on pooled effect size in evidence synthesis of harms.
In summary, the PICO was as follows: P (Population), any; I (Intervention), any; C (Comparisons), published and unpublished data and/or studies; O (Outcomes), number of studies, patients, or adverse events, types of adverse events, or adverse event odds ratios and/or risk ratios.
We were primarily concerned with the effects of interventions under typical use in a health care setting. We therefore did not consider the broader range of effects, such as intentional and accidental poisoning (i.e., overdose), drug abuse, prescribing and/or administration errors, or noncompliance.
We did not include systematic reviews that had searched for and compared case reports in both the published format and from spontaneous reporting systems (such as the Yellow Card system in the United Kingdom). It is apparent that more case reports can be identified via spontaneous reporting than through the published literature, and a different pattern of reporting has already been established [
We did not restrict our inclusion criteria by year of study or publication status. Although our search was not restricted by language, we were unable to include papers for which a translation was not readily available or that were in a language unknown to the authors.
A wide range of sources were searched to reflect the diverse nature of publishing in this area. The databases covered areas such as nursing, research methodology, information science, and general health and medicine, as well as grey literature sources such as conferences, theses, and the World Wide Web. In addition, we handsearched key journals, an existing extensive bibliography of unpublished studies, and all years of the Cochrane Library. The sources searched are listed in
Other methods for identifying relevant articles included reference checking of all the included articles and related systematic reviews, citation searching of any key papers on Google Scholar and Web of Science, and contacting authors and experts in the field.
The methodological evaluations from the previous review were included in the analysis, in addition to the papers identified from the new search [
The results of the searches were entered into an Endnote library, and the titles and abstracts were sifted independently by two reviewers. Any disagreements were resolved by discussion, and the full articles were retrieved for potentially relevant studies. Full text assessment was then carried out independently by two reviewers and disagreements discussed; when agreement could not be reached, a third independent arbiter was required.
The search strategies contained just two facets—“publication status” and “adverse events”—representing the two elements of the review’s selection criteria. Where possible, a database date entry restriction of 2009 onwards was placed on the searches, as this is the date of the last searches carried out [
Data were extracted by one reviewer and then all data were checked by another reviewer. Any discrepancies were resolved by discussion or by a third reviewer when necessary. Information was collected on the study design and methodology, interventions and adverse events studied, the sources of published and unpublished data and/or studies, and outcomes reported, such as effect size or number of adverse events.
No quality assessment tool exists for these types of methodological evaluations or systematic reviews with an evaluation embedded. The quality assessment criteria were therefore derived in-house by group discussion to reflect a range of possible biases. We assessed validity based on the following criteria:
Adequacy of search to identify unpublished and published versions: for instance, were at least two databases searched for the published versions? Incomplete searches would give the impression of either fewer published or unpublished sources in existence.
Blinding: was there any attempt to blind the data extractor to which version was published or unpublished?
Data extraction: was the data extraction checked or independently conducted when comparing adverse events between the different sources?
Definition of publication status: were explicit criteria used to categorise or define unpublished and published studies? For example, unpublished data may consist of information obtained directly from the manufacturers, authors, or regulatory agencies. Conversely, a broader definition incorporating “grey literature” may include information from websites, theses or dissertations, trial registries, policy documents, research reports, and conference abstracts, although these are often publicly available.
External validity and representativeness: did the researchers select a broad-ranging sample of studies (in terms of size, diversity of topics, and spectrum of adverse events) that were reasonably reflective of current literature?
We were aware that published and unpublished data could be compared in different ways. An unconfounded comparison is possible if investigators checked the reported adverse event outcomes in the published article against the corresponding (matching) unpublished version of the same study. In these instances, we assessed validity with the following additional criteria:
a. Matching and confirming versions of the same study: for instance, were explicit criteria for intervention, sample size, study duration, dose groups, etc. used? Were study identification numbers used, or were links and/or citations taken from unpublished versions?
Conversely, there are comparisons of adverse events reports between cohorts of unmatched published and unpublished sources, in which investigators synthesise harms data from collections of separate studies addressing the same question. This situation is typically seen within meta-analyses of harms, but we recognize major limitations from a methodological perspective when comparing pooled estimates between published and unpublished studies that are not matched. This stems from the possibility of confounding factors that drive differences in findings between the published and unpublished sources rather than publication status alone. We assessed validity of these comparisons with the following additional criteria:
b. Confounding factors in relation to publication status: the results in published sources may differ from those of unpublished sources because of factors other than publication status, such as methodological quality, study design, type of participant, and type of intervention. For instance, did the groups of studies being compared share similar features, other than the difference in publication status? Did the unpublished sources have similar aims, designs, and sample sizes as the published ones? If not, were suitable adjustments made for potentially confounding factors?
We did not exclude any studies based on the quality assessment, as all the identified studies were of sufficient quality to contribute in some form to the analysis.
We considered that there would be different approaches to the data analysis depending on the unit of analysis. We aimed, therefore, to use the following categories of comparison, which are not necessarily mutually exclusive, as some evaluations may provide comparative data in a number of categories:
Comparison by number of studies
Numbers of studies that reported specific types or certain categories of adverse events. This, for instance, may involve identifying the number of published and unpublished studies that report on a particular adverse event of interest (e.g., heart failure) or of a broader category, such as “serious adverse events.”
Comparison by adverse events frequencies
Number or frequency of adverse events
How many instances of adverse events were reported (such as the number of deaths or the number of fractures recorded) in the published as compared to unpublished sources?
Description of types of adverse events
How many different types of specific adverse events (such as heart failure, stroke, or cancer) were described in the published as compared to unpublished sources?
Comparison by odds ratios or risk ratios
The impact of adding unpublished sources to the summary estimate—in terms of additional number of studies and participants—as well as the influence on the pooled effect size in a meta-analysis.
In all three types of comparison, the published and unpublished sources could be matched (i.e., different versions of the same study) or unmatched (i.e., separate sets of studies on the same topic area).
If pooled risk ratios or odds ratios were presented, we conducted a comparison of the magnitude of treatment effect from published sources versus that of combining all sources (published and unpublished) via a graphical representation.
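To make the comparison of pooled estimates concrete, the sketch below applies standard fixed-effect inverse-variance pooling of log odds ratios, first to published trials alone and then to the combined published and unpublished set. All trial counts and the `pooled_or` helper are hypothetical illustrations, not data from the included reviews:

```python
import math

def pooled_or(studies):
    """Fixed-effect inverse-variance pooling of study odds ratios.

    Each study is a tuple (events_treat, n_treat, events_ctrl, n_ctrl);
    returns (pooled OR, lower 95% CI, upper 95% CI)."""
    num = den = 0.0
    for a, n1, c, n2 in studies:
        b, d = n1 - a, n2 - c
        log_or = math.log((a * d) / (b * c))   # study log odds ratio
        var = 1 / a + 1 / b + 1 / c + 1 / d    # variance of the log OR
        num += log_or / var                    # inverse-variance weighting
        den += 1 / var
    pooled, se = num / den, math.sqrt(1 / den)
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))

# Hypothetical published trials of a drug versus placebo
published = [(12, 200, 6, 200), (8, 150, 5, 150)]
# Hypothetical additional unpublished trials (e.g., from CSRs)
unpublished = [(15, 300, 9, 300), (10, 250, 4, 250)]

or_pub = pooled_or(published)
or_all = pooled_or(published + unpublished)
```

Because each added study contributes weight to the denominator, the standard error of the pooled log odds ratio shrinks, which is why combining published and unpublished sources typically yields a narrower 95% confidence interval even when the point estimate barely moves.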
To avoid double counting, where data were presented for all adverse events, serious adverse events, withdrawals because of adverse events, and specific named adverse events, the primary analysis was conducted with the largest group of adverse events (mostly all adverse events or all serious adverse events). In addition, we undertook a subgroup analysis with the serious adverse events only.
A total of 4,344 records (6,989 before duplicates were removed) were retrieved from the original database searches in May 2015, and an additional 747 records were identified from the update searches in July 2016. Altogether, 28 studies, reported in 31 publications, met the inclusion criteria (
Ninety-three articles were excluded based on full-text screening. The excluded studies tended not to include adverse events data or did not present relevant data for both published and unpublished articles (
The majority of the included studies looked at drug interventions. Only 2 considered a medical device—both of which had a pharmacological component [
A total of 11 studies considered named adverse events [
The included studies fell into three categories: those which measured the numbers of studies reporting adverse events in published and unpublished sources (9 studies [
A total of 26 studies conducted literature searches for the published papers; the other 2 studies either did not report the sources searched [
Blinding of researchers to the publication status was not reported in any of the studies. This probably reflects the difficulty of blinding, as it is obvious to data extractors, for example, whether they are looking at an unpublished CSR or a journal paper. In one paper, the authors even refer to the impossibility of blinding. Some form of double data extraction was reported in 22 studies; 3 studies used a single data extractor only [
Although many of the studies included any adverse events, the generalisability of the majority of the included studies was limited, as they were all restricted to one drug intervention or one class of drugs (e.g., selective serotonin reuptake inhibitors [SSRIs] or nonsteroidal anti-inflammatory drugs [NSAIDs]).
There were 16 studies that used matched published and unpublished sources. Of these, 8 studies did not describe how they matched unpublished and published articles of the same study [
A total of 14 studies compared unmatched published and unpublished sources (2 studies also used matched published and unpublished sources). Although some studies compared the characteristics of published and unpublished sources, only 1 study controlled for confounding factors (
There were 8 included studies (from 9 publications and representing 10 comparisons) that compared the number of studies reporting adverse events in the published and matched unpublished documents (
*Classified adverse events information as either “completely reported” versus “incompletely reported.” Incompletely reported adverse events could lack numerical data or include only selected adverse events, for example. Maund 2014a [
There were 2 studies that compared the number of studies reporting adverse events in unmatched published and unpublished sources (
*Classified adverse events information as either “completely reported” versus “incompletely reported.” Incompletely reported adverse events could lack numerical data or include only selected adverse events, for example. Hemminki 1980a [
There were 11 included studies that compared the actual numbers of adverse events, such as number of deaths or number of suicides [
In circumstances where multiple AEs were presented, the largest category is included (for example, all AEs or all SAEs). Pranić 2015 [
The overall percentage of adverse events that would have been missed had an analysis relied only on the published versions of studies varied between 43% and 100%, with a median of 64% (mean 62%) (
In circumstances where multiple AEs were presented, the largest category is included in Fig 5 (for example, all AEs or all SAEs). Pranić 2015 [
Some studies reported the number of adverse events of specific named outcomes, such as death, suicide, or infection, or the numbers of specific categories of adverse events, such as urogenital, cardiovascular, and respiratory adverse events. There were 24 such comparisons, and, in 18 cases, the numbers of adverse events reported in the unpublished documentation were higher than in the trial publications. In 3 instances, the numbers of adverse events were the same [
There were 2 studies that compared how many different types of specific named adverse events were in matched published and unpublished sources. Pang (2011) found that 67.6% of the serious and 93.3% of the fatal adverse events in the company trial reports were not fully listed in the published versions [
Only 1 study compared the numbers of adverse events from unmatched published and unpublished articles (in addition to matched articles) (
There were 11 included studies reporting 28 meta-analyses that looked at the magnitude of the pooled risk estimate—i.e., the odds ratio or risk ratio for adverse events—calculated with and without unpublished data (
Of the 11 studies, 2 looked at a variety of conditions and treatments, 3 looked at antidepressants, and 2 at cardiovascular treatments; the other drugs examined were for conditions such as diabetes, eye problems, pain, and menopausal symptoms.
Eyding 2010a [
Eyding 2010a [
Adverse events are often rare, and we found that meta-analyses of published studies alone tended to yield effect estimates with very broad 95% confidence intervals. In
There were also 2 systematic reviews identified in Potthast 2014 [
In
The 28 studies included in our review give an indication as to the amount of data on adverse events that would have been missed if unpublished data were not included in assessments. This is in terms of the number of adverse events, the types of adverse events, and the risk ratios of adverse events. We identified serious concerns about the substantial amount of unpublished adverse events data that may be difficult to access or “hidden” from health care workers and members of the public. Incomplete reporting of adverse events within published studies was a consistent finding across all the methodological evaluations that we reviewed. This was true whether the evaluations were focused on availability of data on a specific named adverse event, or whether the evaluations aimed to assess general categories of adverse events potentially associated with an intervention. Our findings suggest that it will not be possible to develop a complete understanding of the harms of an intervention unless urgent steps are taken to facilitate access to unpublished data.
The extent of “hidden” data has important implications for clinicians and patients who may have to rely on (incomplete) published data when making evidence-based decisions on benefit versus harms. Our findings suggest that poor reporting of harms data, selective outcome reporting, and publication bias are very serious threats to the validity of systematic reviews and meta-analyses of harms. In support of this, there are case examples of systematic reviews that arrived at a very different conclusion once unpublished data were incorporated into the analysis. These examples include the Cochrane review on a neuraminidase inhibitor, oseltamivir (Tamiflu) [
Our review demonstrates the urgent need to progress towards full disclosure and unrestricted access to information from clinical trials. This is in line with campaigns such as AllTrials, which are calling for all trials to be registered and the full methods and results to be reported. We are starting to see major improvements, however, in the availability of unpublished data based on initiatives of the European Medicines Agency (EMA) (since 2015, EMA policy on publication of clinical data has meant the agency has been releasing clinical trial reports on request, and proactive publication is expected by September 2016), European law (the Clinical Trial Regulation), the FDA Amendments Act of 2007—which requires submission of trial results to registries—and pressure from Cochrane to fully implement such policies. Although improvements have been made in the accessibility of data, there are still major barriers and issues to contend with. Researchers are still failing to obtain company data, and even when data are released, which can be after a lengthy process, they can be incomplete [
The discrepant findings in numbers of adverse events reported in the unpublished reports and the corresponding published articles are also of great concern. Systematic reviewers may not know which source contains the most accurate account of results and may be making choices based on inadequate or faulty information. Many studies stated that they were unclear as to why the discrepancies exist, whilst others referred to incomplete reporting or nonreporting, changing the prespecified outcomes analysed (post hoc analysis), or different iterations of the process of aggregating raw data. It is not clear whether the differences stemmed from lapses in attention and errors, or whether the peer review process led to changes in the data analysis. Journal editors and readers of systematic reviews should be aware that a tendency to overestimate benefit and underestimate harm in published papers can potentially result in misleading conclusions and recommendations [
One major limitation that we are aware of is the difficulty of demonstrating (even with meta-analytic techniques) that inclusion of unpublished data leads to changes in direction of effect or statistical significance in pooled estimates of rare adverse events. The meta-analysis case examples here represent only a small number amongst thousands of meta-analyses of harms, and it would be entirely inappropriate to draw conclusions regarding any potential lack of impact on statistical significance of the pooled estimate from inclusion of unpublished data.
Nevertheless, the available examples clearly demonstrate that availability of unpublished data leads to a substantially larger number of trials and participants in the meta-analyses. We also found that the inclusion of unpublished data often leads to more precise risk estimates (with narrower 95% confidence intervals), thus representing higher evidence strength according to the GRADE (Grades of Recommendation, Assessment, Development and Evaluation) classification, in which strength of evidence is downgraded if there is imprecision [
There are a number of other limitations to this review. The included studies in our review were difficult to identify from the literature, so relevant studies may have been missed. Searches for methodological studies are notoriously challenging because of a lack of standardized terminology and poor indexing. In addition, systematic reviews with some form of relevant analysis embedded in the full text may not indicate this in the title, abstract, or database indexing of the review. Also, those studies identified may suffer from publication bias, whereby substantial differences between published and unpublished data are more likely to be published.
Few studies compared published and unpublished data for nonpharmacological interventions. Yet publication bias for adverse events of nondrug interventions may be just as important as for drug interventions. Unpublished adverse events data for nondrug interventions may differ from unpublished adverse events data for drugs because of aspects such as regulatory requirements and industry research. To examine the generalisability of our findings, more studies covering a range of intervention types are required.
In conclusion, therefore, there is strong evidence that substantially more information on adverse events is available from unpublished than from published data sources and that higher numbers of adverse events are reported in the unpublished than the published version of the same studies. The extent of “hidden” or “missing” data prevents researchers, clinicians, and patients from gaining a full understanding of harm, and this may lead to incomplete or erroneous judgements on the perceived benefit to harm profile of an intervention.
Authors of systematic reviews of adverse events should attempt to include unpublished data to gain a more complete picture of the adverse events, particularly in the case of rare adverse events. In turn, we call for urgent policy action to make all adverse events data readily accessible to the public in a full, unrestricted manner.
We would like to thank the authors of the included studies, Alex Hodkinson, Tom Jefferson, Michael Köhler, Emma Maund, Regine Potthast, Mark Rodgers, and Beate Wieseler, for checking the data extraction for their papers. We are also grateful to Lesley Stewart for comments on an earlier draft.
The views expressed in this publication are those of the authors and not necessarily those of the National Health Service, the National Institute for Health Research, or the Department of Health.
Abbreviations: AE, adverse event; CSR, clinical study report; DARE, Database of Abstracts of Reviews of Effects; EMA, European Medicines Agency; GRADE, Grades of Recommendation, Assessment, Development and Evaluation; HTA, Health Technology Assessment; NSAID, nonsteroidal anti-inflammatory drug; OR, odds ratio; RR, risk ratio; SSRI, selective serotonin reuptake inhibitor