
The authors have declared that no competing interests exist.

Conceived and designed the experiments: MM MEF. Performed the experiments: MM VGA MEF. Analyzed the data: MM VGA KZV MEF. Contributed reagents/materials/analysis tools: MM VGA KZV MEF. Wrote the paper: MM VGA KZV MEF.

Biomedical literature is increasingly enriched with literature reviews and meta-analyses. We sought to assess researchers' understanding of the statistical terms routinely used in such studies.

An online survey posing 4 clinically-oriented multiple-choice questions was conducted in an international sample of randomly selected corresponding authors of articles indexed by PubMed.

A total of 315 unique complete forms were analyzed (participation rate 39.4%), mostly from Europe (48%), North America (31%), and Asia/Pacific (17%). Only 10.5% of the participants answered all 4 “interpretation” questions correctly, while 9.2% answered all questions incorrectly. Regarding each question, 51.1%, 71.4%, and 40.6% of the participants correctly interpreted the statistical significance of a given odds ratio, risk ratio, and weighted mean difference with 95% confidence intervals, respectively, while 43.5% correctly replied that no statistical model can adjust for clinical heterogeneity. Clinicians had more correct answers than non-clinicians (mean score ± standard deviation: 2.27±1.06

A considerable proportion of researchers, randomly selected from a diverse international sample of biomedical scientists, misinterpreted statistical terms commonly reported in meta-analyses. Authors could be prompted to explicitly interpret their findings to prevent misunderstandings and readers are encouraged to keep up with basic biostatistics.

Literature reviews, including systematic reviews and meta-analyses, are critical components of evidence-based medicine. Such studies are commonly regarded as valuable sources of evidence and influence both clinical practice and public health policy

Statistical terms commonly used in meta-analyses, as well as in original research studies, include effect estimate measures such as the odds ratio (OR), risk ratio (RR), and weighted mean difference (WMD). Another important component of evidence synthesis studies is heterogeneity, which can be classified as clinical or statistical heterogeneity. Previous studies have implied a suboptimal understanding of such statistical terms among readers and/or researchers, but to our knowledge no study has assessed the understanding of plain effect estimates presented in a commonly encountered clinical context. In this regard, we sought to investigate the current level of comprehension of statistical terms commonly used in meta-analyses.
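To make the effect estimates named above concrete, the following minimal sketch computes an OR and an RR from a 2x2 table and checks whether a 95% confidence interval excludes the null value (1 for ratio measures such as the OR and RR, 0 for difference measures such as the WMD). The counts, the function names, and the choice of the log (Woolf) method for the OR interval are illustrative assumptions; they are not taken from the survey or from any meta-analysis discussed here.

```python
import math

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: a/b = events/non-events (new drug),
    c/d = events/non-events (control)."""
    return (a * d) / (b * c)

def risk_ratio(a, b, c, d):
    """RR: risk of the event with the new drug divided by risk with control."""
    return (a / (a + b)) / (c / (c + d))

def or_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% CI for the OR on the log scale (Woolf method, an assumption)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

def excludes_null(lower, upper, null_value):
    """True if the interval does not contain the null value,
    i.e. the estimate is statistically significant at the matching level."""
    return not (lower <= null_value <= upper)

# Hypothetical counts: 15/100 events with the new drug, 30/100 with control.
a, b, c, d = 15, 85, 30, 70
or_value = odds_ratio(a, b, c, d)
rr_value = risk_ratio(a, b, c, d)
ci_low, ci_high = or_confidence_interval(a, b, c, d)

# Ratio measures are compared against 1, difference measures against 0.
or_significant = excludes_null(ci_low, ci_high, 1.0)
wmd_significant = excludes_null(-0.5, 0.2, 0.0)  # hypothetical WMD interval
```

In this hypothetical table the OR interval lies entirely below 1, so the result would be read as statistically significant, whereas the hypothetical WMD interval crosses 0 and would not be. The OR is also further from 1 than the RR for the same counts, which is one reason the two measures should not be read interchangeably.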

An online survey was conducted from December 2011 to January 2012, based on the methodology of electronic surveys previously published

Participants were informed about the aims of the study, the length of time of the survey, and the primary investigator (MEF). The questionnaire was a structured, web-based, multiple-choice form comprising 5 single-answer questions. Four mandatory questions evaluated the understanding of simple statistical terms commonly used in meta-analyses (OR, RR, WMD, and heterogeneity) in a clinical context, and the last, optional question inquired about the specialty of the respondent (

1) A meta-analysis of randomized controlled trials (RCTs) compared a new drug

2) A meta-analysis of RCTs compared a new drug

3) A meta-analysis of RCTs compared a new drug

4) A meta-analysis was conducted, pooling studies with clinical heterogeneity but without substantial statistical heterogeneity (p>0.1, I^{2} = 30%). Which of the following statistical models would be appropriate for this meta-analysis? a) The fixed effect model. b) The random effects model. c) Another model.

5) Your specialty is: a) Medical (including psychiatry) b) Surgical (including anesthesiology) c) Clinical laboratory (including radiology) d) None of the above
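Question 4 quotes an I^{2} value as its measure of statistical heterogeneity. As a minimal sketch, I^{2} can be derived from Cochran's Q statistic and its degrees of freedom (number of studies minus 1) using the standard Higgins formula. The function name and the input values below are illustrative assumptions chosen to reproduce the 30% in the question; they do not come from a real meta-analysis.

```python
def i_squared(q, df):
    """Higgins' I^2 (as a percentage) from Cochran's Q and its degrees of freedom.

    I^2 = 100 * (Q - df) / Q, truncated at 0 when Q is smaller than df.
    """
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)

# Hypothetical meta-analysis of 8 studies (df = 7) with Q = 10.
example_i2 = i_squared(10.0, 7)

# When Q is below its degrees of freedom, I^2 is reported as 0%.
no_heterogeneity = i_squared(5.0, 7)
```

Note that I^{2} describes statistical heterogeneity only. It carries no information about clinical heterogeneity, which is the distinction question 4 is built on.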

Respondents' answers were pooled and graphically presented. A score was calculated for each participant, representing the number of correct answers (1 point was awarded for each correct answer). Univariate comparisons were performed to examine the potential effect of respondents' specialty, region, and questionnaire completion time on their score. We used Pearson correlation, Student's t-test, and analysis of variance for normally distributed variables, and Spearman correlation, the Mann-Whitney test, and the Kruskal-Wallis test for non-normally distributed variables, as appropriate. The normality of the distribution of the variables was assessed with the Shapiro-Wilk test. All analyses were performed with the STATA 11.2 statistical software package (Stata Corp., College Station, TX, USA). A p value <0.05 was considered to denote statistical significance.
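The scoring rule described above (1 point per correct answer across the 4 mandatory questions) can be sketched as follows. The answer key, the question identifiers, and the three example respondents are hypothetical placeholders; the actual key is the one marked in the questionnaire, and the analyses themselves were run in STATA rather than Python.

```python
from statistics import mean, stdev

# HYPOTHETICAL answer key, for illustration only (not the survey's real key).
ANSWER_KEY = {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}

def score(responses):
    """Number of correct answers (0 to 4) for one respondent."""
    return sum(
        1 for question, correct in ANSWER_KEY.items()
        if responses.get(question) == correct
    )

# Hypothetical respondents.
respondents = [
    {"q1": "a", "q2": "b", "q3": "c", "q4": "d"},  # all correct
    {"q1": "a", "q2": "b", "q3": "a", "q4": "a"},  # two correct
    {"q1": "b", "q2": "a", "q3": "a", "q4": "a"},  # none correct
]

scores = [score(r) for r in respondents]
mean_score = mean(scores)
sd_score = stdev(scores)  # sample standard deviation
```

Scores computed this way are the outcome variable for the group comparisons; whether a parametric or a non-parametric test applies would then depend on the normality check.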

The online questionnaire was accessed 800 times; after exclusion of 1 duplicate report, a total of 315 complete forms were analyzed (participation rate 39.4%). The median questionnaire completion time was 202 seconds (interquartile range: 143 to 362 seconds). Most participants completed the questionnaire from Europe (151/315, 48%) and North America (99, 31%), and fewer from Asia/Pacific (52, 17%) and Central & South America or Africa (13, 4%). Most of the participating physicians (n = 169; 16/315 respondents did not provide relevant data) had a medical specialty (69%, 116/169; including psychiatry), while 25% (43/169) had a surgical specialty (including anesthesiology) and few (6%, 10/169) had a clinical laboratory specialty (including radiology). A total of 130 respondents were non-clinicians (non-physicians or physicians without specialty).
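The counts and percentages reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in this paragraph; the variable names are illustrative.

```python
accessed, analyzed = 800, 315
participation_rate = round(100 * analyzed / accessed, 1)

regions = {
    "Europe": 151,
    "North America": 99,
    "Asia/Pacific": 52,
    "Central & South America or Africa": 13,
}
# Percentages of the 315 analyzed forms, rounded to whole numbers.
region_pct = {name: round(100 * n / analyzed) for name, n in regions.items()}

specialties = {"medical": 116, "surgical": 43, "clinical laboratory": 10}
clinicians = sum(specialties.values())
# Percentages of the 169 clinicians, rounded to whole numbers.
specialty_pct = {name: round(100 * n / clinicians) for name, n in specialties.items()}

non_clinicians, missing_specialty = 130, 16
total_accounted_for = clinicians + non_clinicians + missing_specialty
```

The regional counts sum to 315, and the 169 clinicians, 130 non-clinicians, and 16 respondents without specialty data also add up to the 315 analyzed forms.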

Responses to our questions are presented in

Correct answers are marked with an asterisk; the questionnaire is presented in

The percentages of correct responses to each question among the respondents' groups are presented in

Clinicians had more correct answers than non-clinicians (mean score ± standard deviation: 2.27±1.06

The main finding of our survey is that, even among researchers, there is incomplete understanding of statistical terms commonly reported in meta-analyses. This finding was more pronounced in non-clinicians; among clinicians, those with a medical specialty tended to have a slightly better understanding of statistical terms than the others. Although the questions were clinically oriented and of a type commonly encountered in the biomedical literature, overall, almost half (48.3%) were answered incorrectly; 10.5% of the respondents answered all questions correctly, while 9.2% answered all questions incorrectly.
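The overall proportion of incorrect answers follows from the per-question results reported in the abstract (51.1%, 71.4%, 40.6%, and 43.5% correct). The sketch below assumes that every one of the 315 analyzed respondents answered all 4 mandatory questions, so that the four percentages can be averaged with equal weight; the implied mean score is a derived quantity, not a figure reported in the text.

```python
# Percentage of respondents answering each question correctly (from the abstract).
correct_pct = {
    "odds ratio": 51.1,
    "risk ratio": 71.4,
    "weighted mean difference": 40.6,
    "heterogeneity": 43.5,
}

# Equal weighting is valid only if every respondent answered every question.
mean_correct_pct = sum(correct_pct.values()) / len(correct_pct)
mean_incorrect_pct = 100 - mean_correct_pct

# Expected number of correct answers per respondent, out of 4.
implied_mean_score = len(correct_pct) * mean_correct_pct / 100
```

Under that assumption the average is about 48.35% incorrect, which agrees with the reported 48.3% to within the rounding of the published percentages.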

Few studies have addressed the level of comprehension of commonly used statistical terms among the providers and the recipients of biomedical research (authors and readers). Previous studies noted an incomplete understanding of the difference between odds ratio and risk ratio, in terms of both calculation

Our findings suggest a better understanding of the tested statistical terms among clinicians, compared with non-clinicians. Clinicians with a medical specialty tended to score higher than the rest. Interestingly, the groups that tended to score higher were the ones that were mostly represented in our analysis (169 clinicians

Our study has significant implications. It has already been argued that a large part of published biomedical research is inaccurate

Although this study can neither identify the source of the problem nor suggest a practical solution, the first step in solving a problem is to define and identify it. Our study also serves as a call for careful consideration of published research by journal editors, article authors, and readers. In this era of rapidly evolving evidence-based medicine, physicians would be better served by the ability to properly interpret current research findings than by memorizing a large amount of potentially outdated information.

One might argue that our findings should not be generalized to the majority of physicians or biomedical scientists. However, the participants in our survey were corresponding authors of articles indexed by PubMed, who in general are expected to be more statistically knowledgeable than ordinary readers; in addition, the participants represented a random, international sample of scientists and physicians of various specialties. Another potential explanation for our findings would be that the participants did not pay adequate attention to the questions. This is unlikely, considering that those not interested in our survey would not have completed and submitted it (only complete responses were assessed), and that the median completion time was around 3 minutes (for 4 “interpretation” questions). It should be acknowledged, however, that the participation rate was relatively low (39.4%), although this is not unusual for this type of research. Last, our study suffers from the inherent limitations of online surveys, including self-selection bias and concerns about the accuracy and reproducibility of the responses

In conclusion, a large proportion of biomedical researchers misinterpreted simple effect estimates commonly used in meta-analyses. Journal editors and article authors may embrace a more comprehensive interpretation of each study's findings, while readers are encouraged to keep up with basic biostatistics.