‘Spin’ in published biomedical literature: A methodological systematic review

In the scientific literature, spin refers to reporting practices that distort the interpretation of results and mislead readers so that results are viewed in a more favourable light. The presence of spin in biomedical research can negatively impact the development of further studies, clinical practice, and health policies. This systematic review aims to explore the nature and prevalence of spin in the biomedical literature. We searched MEDLINE, PreMEDLINE, Embase, Scopus, and hand searched reference lists for all reports that included the measurement of spin in the biomedical literature for at least 1 outcome. Two independent coders extracted data on the characteristics of reports and their included studies and all spin-related outcomes. Results were grouped inductively into themes by spin-related outcome and are presented as a narrative synthesis. We used meta-analyses to analyse the association of spin with industry sponsorship of research. We included 35 reports, which investigated spin in clinical trials, observational studies, diagnostic accuracy studies, systematic reviews, and meta-analyses. The nature of spin varied according to study design. Trials showed the highest prevalence of spin, but also the greatest variability in its prevalence. Some of the common practices used to spin results included detracting from statistically nonsignificant results and inappropriately using causal language. Source of funding was hypothesised by a few authors to be a factor associated with spin; however, results were inconclusive, possibly due to the heterogeneity of the included papers. Further research is needed to assess the impact of spin on readers’ decision-making. Editors and peer reviewers should be familiar with the prevalence and manifestations of spin in their area of research in order to ensure accurate interpretation and dissemination of research.


Author summary
In the scientific literature, spin refers to reporting practices that distort the interpretation of results and mislead readers so that results are viewed in a more favourable light. The presence of spin in biomedical research can negatively impact the development of further studies, clinical practice, and health policies. We conducted a systematic review to explore the nature and prevalence of spin in the biomedical literature. We included 35 reports, which investigated spin in clinical trials, observational studies, diagnostic accuracy studies, systematic reviews, and meta-analyses. The nature of spin varied according to study design.

Introduction
Spin, commonly associated with propaganda, public relations, and the media, is broadly understood as a biased presentation, intended to ensure that audiences view matters favourably. Spin also occurs in published biomedical research, sometimes known as 'science hype', where scientific findings are inappropriately overstated [1]. In the scientific literature, spin refers to specific reporting practices that distort the interpretation of results and mislead readers so that results are viewed in a more favourable light [2].
Accurate reporting and interpretation of research results is essential for knowledge translation and has implications for the development of further studies, policies, and clinical practice. Examples of spin include misinterpreting statistically nonsignificant results as 'showing an effect' or selectively interpreting results to emphasise significant secondary outcomes and minimise nonsignificant primary outcomes [2]. These tactics could lead to subsequent research on clinical interventions for which there is a lack of supporting evidence. This, in turn, could lead to skewed systematic reviews and misinformed clinical practice guidelines or health policies. In addition, 'promising' scientific discoveries that are based upon conclusions with spin rather than data could stimulate financial investments in medical interventions that are later found to be ineffective or even harmful [1].
Spin is an enduring topic in research [3]; however, there has been recent interest in spin in the reporting and interpretation of results in published biomedical research. Boutron et al. [2] defined spin as 'specific reporting strategies, whatever their motive, to highlight that the experimental treatment is beneficial, despite a statistically non-significant difference of the primary outcome, or to distract the reader from statistically non-significant results.' This definition has served as a basis for other researchers investigating spin in published studies in particular clinical fields [4][5][6][7][8]. However, to date, there has been no systematic review or meta-analysis of the nature or prevalence of spin in biomedical literature in general or across study designs. Thus, neither the extent of spin nor its implications are well understood.
The objectives of this methodological systematic review were to examine the nature, prevalence and implications of spin in published biomedical literature across disciplines and clinical areas. The research questions included: How has spin been studied in the biomedical literature? How does spin manifest and what is its prevalence? What factors are associated with the presence of spin? Although we defined the population of interest (empirical biomedical publications) and exposure (spin) a priori, we included all spin-related outcomes reported in the identified sample of reports. As a number of studies hypothesised that funding source was a factor associated with spin, we tested this hypothesis in our review.

Characteristics of included reports
A total of 4,471 reports were identified, with 4,450 acquired through searching the electronic databases and 21 through hand-searching the reference lists of included reports. A flowchart of the screening process is summarised in Fig 1, and Table 1 shows the characteristics of the included reports. Of the 35 included reports, 22 (63%) were published in the last 5 years (since 2012), and 34 (97%) were published in the last 10 years (since 2007). The majority of reports (31/35, 89%) were reviews of published literature designed to assess the occurrence of spin in published biomedical literature; other designs included a survey (1/35, 3%), a randomised controlled trial (RCT) (1/35, 3%) designed to assess the effects of spin, and examination of data sources such as regulatory or company documents (2/35, 6%). The majority of reports (18/35, 51%) received funding from public or not-for-profit sources; 10 reports (10/35, 29%) did not disclose their funding source. Sixteen reports (16/35, 46%) declared that authors had no conflicts of interest; 6 of the reports (6/35, 17%) did not make a disclosure statement. The majority of the reports (23/35, 66%) investigated spin in trials. The fields of research of the included studies varied, and reports were largely focused on biomedical interventions. Eight papers (8/35, 23%) did not restrict the inclusion of studies to a clinical discipline, 5 (5/35, 14%) examined studies in oncology, and 4 (4/35, 11%) examined studies in surgery. All of the included studies were conducted with human participants.

Methods for assessing spin
Defining spin. The majority of reports (30/35, 86%) defined spin a priori and then sought to assess its frequency, severity, or characteristics. There was considerable variation in how researchers defined spin. We inductively classified the ways that spin was defined into 1 of 4 categories (Table 2): (1) reporting practices that distort the interpretation of results and create misleading conclusions, suggesting a more favourable result; (2) discordance between results and their interpretation, with the interpretation being more favourable than the results; (3) attribution of causality when study design does not allow for it; and (4) overinterpretation or inappropriate extrapolation of results. Spin was defined as the inappropriate use of causal language exclusively in the context of observational or nonrandomised studies.
Outcomes measured. Investigators of included reports assessed several different outcomes related to spin. These included the prevalence of spin (31/35, 89%), the level or severity of spin (8/35, 23%), practices used to spin results (19/35, 54%), factors associated with spin (19/35, 54%), and the impact of spin on a reader's interpretation (3/35, 9%).
Instruments for assessing spin. Of the reports which assessed spin in published articles (n = 34; 1 included report was an RCT), 32 used a prespecified, standardized data collection instrument (94%). Nine (9/34, 26%) used or adapted the instrument developed by Boutron et al. [2], which was originally developed for the assessment of spin in RCTs with nonsignificant primary outcomes, though it was applied to intervention studies more broadly. Reports assessing the level/severity of spin exclusively used the Boutron instrument [2], which was implemented in the context of RCTs with nonsignificant primary outcomes. Twenty-three reports (23/34, 68%) used an author-generated data collection instrument, though only 11 (11/34, 32%) were subject to pilot or reliability testing. One report relied on a previously published rating scale by Ridker and Torres [38], designed to assess the significance and magnitude of the intervention effect, as a means to rate discordance.
Only 4 reports (4/34, 12%) used inductive methods to assess the nature of spin, including the seminal report by Boutron et al. [2] upon which 8 other reports relied. Two others also developed instruments specifically for the assessment of spin in nonrandomised studies [4] and systematic reviews [8], though neither has yet been replicated to our knowledge. This meant that reports generally assessed spin practices that were prespecified; few conducted exploratory assessments of the nature of spin.
Assessing spin. Consistent with review methods, the majority of the reports (27/34, 79%; 1 report was an RCT and this did not apply) used multiple independent data extractors to assess spin, which was acknowledged to be subjective. Reports included additional methods to reduce interpretation bias, including resolving any discrepancies through discussion until consensus was reached (22/34, 65%), review of discrepancies by a third investigator (10/34, 29%), or, less commonly, blinding data extractors to the author, funding source, or journal (2/34, 6%).
Half of the reports (17/34, 50%) that assessed spin in published literature assessed spin in both the abstract and main text, 4 of which specifically compared the main text results to the abstract and/or main text conclusions as a measure of discordance. Suggesting that the consequences of spin in the abstract were more severe given that many clinicians rely on abstracts alone, 7 reports (7/34, 21%) assessed spin in the abstract only. Nine reports (9/34, 26%) assessed spin only in the main text of the article. Three reports (3/34, 9%) additionally assessed spin in the articles' titles.

Table 2. Definitions of spin provided by the included reports (n = 35).

Definition | n (%) | Example
Reporting practices that distort the interpretation of results and create misleading conclusions, suggesting a more favourable result | 20 (57%) | 'Specific reporting strategies, whatever their motive, to highlight that the experimental treatment is beneficial, despite a statistically nonsignificant difference for the primary outcome, or to distract the reader from statistically nonsignificant results'. [2]; 'We considered spin to exist when we observed an explicit description of spinning study findings in the internal company documents or a description in the main publication that appeared to re-frame the study results in order to explain away unfavorable findings or to emphasize favorable findings'. [33]
Discordance between results and their interpretation, with the interpretation more favourable than the results | 9 (26%) | '. . .whether data presented in the study supported the author's conclusions. . .' [21]
Attribution of causality when study design does not support it | 3 (9%) | 'Inappropriate use of causal language in the abstracts and titles of almost one third of human observational obesity or nutrition related study reports. . .' [15]
Overinterpretation or inappropriate extrapolation of results | 3 (9%) | 'We defined overinterpretation as reporting of diagnostic accuracy studies that makes tests look more favorable than the results justify'. [6]

Prevalence of spin
Thirty-one reports (31/35, 89%) measured the prevalence of spin. Table 3 shows the prevalence of spin (median and range) in the different types of studies assessed in the reports. The highest prevalence of spin was measured in the main texts of a sample of 10 implantable cardioverter defibrillator trials, which all (100%) used at least 1 rhetorical practice resulting in spin [35]. The lowest was measured in the abstracts of a sample of RCTs of systemic therapy in lung cancer, where 9.7% presented discordant conclusions from study results [10]. In general, trials showed the greatest variability in the prevalence of spin. Though small sample sizes prevented statistical comparison between groups, trials with nonsignificant primary outcomes and with higher risk of bias (i.e., nonrandomized) appeared to have a higher prevalence of spin.

Level of spin
Nine reports (9/35, 26%) examined the level or severity of spin; 8 did so in the conclusions of trials with nonsignificant or inconclusive results. These 8 reports used a measure developed by Boutron et al. [2], which defined a 'high' level of spin in study conclusions as: no uncertainty in the framing of conclusions, no recommendations for further trials, no acknowledgment of the statistically nonsignificant primary outcomes, and/or making recommendations to use the intervention in clinical practice. On average, the abstracts of 30% (141/474) and main text of 22% (75/346) of trials with nonsignificant results had 'high' levels of spin in their conclusions.
One study sought to assess the perceived severity of spin in the context of systematic reviews. Yavchitz et al. [8] invited members of the Cochrane Collaboration to rank a sample of statements from systematic reviews and meta-analyses that included spin according to their severity using a Q-sort survey. The types of spin perceived to be most severe in the context of systematic reviews were: concluding recommendations for clinical practice when not supported by the results; titles claiming the treatment is beneficial when not supported by the results; and selective reporting of or overemphasis on analysis favouring the beneficial effect of the intervention [8].

Practices used to spin results
Nineteen reports (19/35, 54%) investigated the practices that researchers used to spin results. We inductively grouped spin practices identified across study designs in order to demonstrate the range and diversity of spin practices but also to draw generalisations about the nature of spin across study designs and clinical areas. Spin practices measured in the included studies were thematically grouped into the following 4 categories: (1) inappropriate claims; (2) inappropriate extrapolations or recommendations for clinical practice; (3) selective reporting; and (4) more robust or favourable data presentation.
Inappropriate interpretation given study design. Spin manifested as claims that were inappropriate or unwarranted given the study design. For example, several reports examining spin in the context of trials with nonsignificant results found that the most common spin practice was to interpret the nonsignificant results as meaning the 2 treatments were equally good when the trial was designed to show the superiority of 1 arm [11,13,22,23,28,29,37]. The use of causal language was identified as a specific and the most prevalent spin practice in nonrandomised or observational studies, as study designs do not permit this type of inference [4]. For example, in a sample of 128 abstracts of nonrandomised studies evaluating an intervention, the most prevalent spin practice (53% of studies) was the use of causal language, including the use of statements that suggested the outcome was a result of the intervention (e.g., 'X increases Y' or 'X facilitates the rapid recovery of Y') or tone inferring a strong result (e.g., 'this study shows that' or 'the results demonstrate') [4].
Inappropriate extrapolations or recommendations for clinical practice. In studies that investigated the use of particular clinical tests or treatment options, spin may present as an inappropriate extrapolation or recommendations for clinical practice when not supported by study results. Additionally, this can include expressing confidence in the test or treatment without suggesting the need for further confirmatory studies. For example, in a sample of observational studies, 56% endorsed a recommendation for clinical practice, of which 86% failed to state that an RCT should be first performed [7].
Selective reporting. Researchers can spin their results through selectively and strategically reporting outcomes in various places in the report. This differs from outcome reporting bias, where all of the outcomes identified in a study protocol are not reported in the study report [39]. Selective reporting resulting in spin can include the omission of nonsignificant endpoints in the conclusion or abstract that were presented in the methods and results sections or discussing only significant secondary results to distract the reader from nonsignificant or unfavourable ones [2]. For example, in a sample of wound care trials with no clear primary outcome identified in the methods section, 'cherry picking' of statistically significant results was commonplace, particularly between the main text and corresponding abstract: while 74% (32/43) of reports included at least 1 statistically nonsignificant outcome in the main text, only 28% (12/43) of abstracts contained at least 1 statistically nonsignificant result [5]. Similarly, in a sample of inconclusive noninferiority trials of antiretroviral therapies, authors focused on statistically significant secondary outcomes, subgroup analyses, or modified population analyses [20]. Selective reporting could also encompass the selective citation of results from external research to support the authors' interpretation of their data [14].
More robust or favourable data presentation. Researchers used a variety of general spin practices to present study results as being more favourable than data warranted. In a study that examined internal pharmaceutical company documents for evidence of spin, investigators found company emails that contained explicit descriptions of attempts to spin study findings in this manner: 1 email with the subject line 'spinning Serpell' (Serpell was the lead study investigator) stated, 'If Pfizer wants to use, present, and publish this comparative data analysis in which 2 of 5 studies compared make the overall picture look bad, how to (sic) we make it sound better than it looks on the graphs' [33].
This category of spin included writing an overly optimistic abstract; employing an extensive rationale to explain away nonsignificance (for example, describing nonsignificant results as 'trends'); misleadingly describing the study design (to present it as more robust); and underreporting or ruling out adverse events. For example, in a sample of diagnostic accuracy studies, one study concluded, 'Detection of antigen in BAL using the MVista antigen appears to be a useful method. Additional studies are needed in patients with pulmonary histoplasmosis', whereas the abstract concluded, 'Detection of antigen in BAL fluid complements antigen detection in serum and urine as an objective test for histoplasmosis' [6]. A variety of rhetorical practices were used in the reporting of trials of implantable cardioverter defibrillators, including failure to discuss complications (9/10, 90%), compare the risks and benefits (10/10, 100%), or mention that benefits are likely to be less in clinical practice than in the clinical trial (10/10, 100%) [35].

Factors associated with spin
Authors of 19 reports (19/35, 54%) assessed whether particular factors were associated with the presence of spin, including (1) conflicts of interest and study funding; (2) author characteristics; (3) journal characteristics; and (4) study design and/or quality. However, the studies were largely too heterogeneous and sample sizes too small in most instances to draw conclusions.
None of the included studies consistently found any factors to be significantly associated with spin. The only factor that was significantly and positively associated with spin across several studies was having a nonsignificant primary endpoint, though we could not conduct a quantitative meta-analysis of these data due to the heterogeneity of included studies [15,27,34]. This finding supports researchers' focus on assessing spin in studies with nonsignificant results described above.
Conflicts of interest and funding source. Nine reports (26%) investigated the association between funding source and the presence of spin. We were able to include 7 of these (including 1,110 studies) in a meta-analysis examining the association between funding source and the presence of spin and found that industry studies were no more likely to have spin than nonindustry sponsored studies (risk ratio [RR]: 1.08; 95% confidence interval [CI]: 0.87, 1.34; I² = 40%) (Fig 2).
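As a rough check on how the pooled estimate reads, the reported RR and 95% CI can be converted back to a standard error on the log scale, and from there to an approximate z statistic and two-sided p-value. The sketch below is illustrative only: it assumes the interval is a Wald interval that is symmetric on the log scale, and it assumes I² was computed as (Q − df)/Q with df = k − 1 and k = 7 pooled reports. The function names are hypothetical, and the original analysis may have used different software and settings.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_log_ratio(lower, upper, z_crit=Z_95):
    """Standard error of log(RR) implied by a CI given on the ratio scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Pooled estimate as reported in the text.
rr, ci_lower, ci_upper = 1.08, 0.87, 1.34

se = se_log_ratio(ci_lower, ci_upper)
z = math.log(rr) / se
p = two_sided_p(z)

# Assumption: I^2 = (Q - df) / Q with df = k - 1 and k = 7 pooled reports.
i_squared, k = 0.40, 7
implied_q = (k - 1) / (1 - i_squared)

print(f"SE(log RR) = {se:.3f}, z = {z:.2f}, p = {p:.2f}, implied Q = {implied_q:.1f}")
```

Under these assumptions the p-value is far above 0.05, which agrees with a confidence interval that spans 1 and with the conclusion that industry sponsorship was not significantly associated with spin.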

Effects of spin on readers' interpretation
Two reports (2/35, 6%) sought to examine the effect of spin on readers' interpretation, though only 1 retrospectively assessed the effect on actual decision-making.
Boutron et al. [12] conducted an RCT with clinical oncology researchers to assess the effect of spin in trial abstracts on interpretation. When abstracts contained spin, readers judged the experimental treatment as more beneficial (mean difference, 0.71; 95% CI, 0.07 to 1.35; P = 0.030) and the trial as less rigorous (mean difference, −0.59; 95% CI, −1.13 to −0.05; P = 0.034) yet still were more interested in reading the full text (mean difference, 0.77; 95% CI, 0.08 to 1.47; P = 0.029).
Only 1 study noted an effect of spin on decision-making. Roest et al. [31] compared published articles on second-generation antidepressants for anxiety with their corresponding United States Food and Drug Administration (FDA) reviews and found that, for the not-positive trials containing spin (3/16, 19%), the FDA judged these to be questionable or negative.

Discussion
This systematic review describes how spin has been explored in 35 reports, which were largely reviews of trials and observational studies with human subjects, across clinical areas. These reports documented various aspects related to the nature of spin in the included studies, which was also commonly referred to as 'discordance between study results and conclusions' or 'overextrapolation'. In general, spin is prevalent in the biomedical literature, though this varies by study design, with the highest rates found in clinical trials. However, prevalence also appeared to vary according to the trial's risk of bias and significance of primary outcomes. Spin manifests in diverse ways, which challenged investigators attempting to systematically identify and document instances of spin.
Spin was variably defined by investigators examining different bodies of biomedical research. As trials are designed to determine if an intervention is effective, authors may be motivated to interpret statistically nonsignificant findings in ways that still portray the intervention in a favourable light. In observational studies, study designs do not allow investigators to establish a causal relationship. Spin in these studies instead manifests as implying cause and effect to suggest a positive sequential relationship between an exposure and an outcome and to increase the perceived importance of the findings [40].
Spin is perhaps best understood in the context of RCTs with nonsignificant primary outcomes due to the development of a valid and reliable instrument by Boutron et al. [2], which has been applied across clinical areas. We identified 3 other valid instruments specifically for assessing spin in nonrandomised intervention studies [4], diagnostic accuracy studies [6], and systematic reviews [8]. However, researchers largely took an approach in which the nature of spin was prespecified and thus may not have fully explicated the full range of spin practices across study designs or clinical areas. This field could benefit from inductive approaches that aim to rigorously assess the diversity of spin practices, as well as evaluations of the effect of spin on those who rely upon biomedical evidence.
Our analysis identified several themes under which spin practices that occur across study designs and clinical areas can be grouped. These categories (inappropriate claims, inappropriate extrapolations or recommendations for clinical practice, selective reporting, and more robust or favourable data presentation) may be useful in educating researchers, peer reviewers, and editors about the various manifestations of spin, regardless of study type. These categories could also underpin instrument development focused on the assessment of spin that can be generalised beyond study design, which may be more useful to peer reviewers and editors of biomedical journals than tools specifically designed for clinical trials, for example.
Although investigators have hypothesised that a plethora of factors are related to the prevalence of spin, ranging from author characteristics to aspects of study design, there is very little evidence to suggest that any of these are related to the presence of spin. Industry sponsorship, which was the most common factor examined, was also not significantly associated with spin. Widening the investigation of factors contributing to spin from characteristics of individual authors or studies to the cultures and structures of research, which may incentivise or deincentivise spin, would be instructive in developing strategies to mitigate the occurrence of spin in biomedical research.
To our knowledge, this is the first methodological systematic review investigating spin in published biomedical literature across a variety of fields. Thus, the aims were exploratory, and due to the heterogeneity of studies meeting the inclusion criteria, we were not able to fully answer questions related to the nature, prevalence, or implications of spin. Other methodological systematic reviews have been conducted with regards to publication bias [41], outcome reporting bias [39], funding bias [42], and selective reporting and inclusion of results [43]. Although the concept of spin draws on features of selective reporting of results, such as giving outcome data different prominence throughout different sections of a report [43], spin involves the additional aspect of interpretation bias. This systematic review highlights that further work is needed in the area of developing instruments and standards for assessing the occurrence of spin across different study designs. Little is known about the contextual factors that contribute to spin, and even less is known about the impact of spin on research, clinical practice, or policy environment.
Despite the lack of tools to assist with the identification of spin, there are a number of safeguards that can prevent spin. First, as routinely occurs, peer reviewers and journal editors check that abstract and manuscript conclusions are consistent with the study results, for inappropriate use of causal language, and for overgeneralisation. Second, clinical practice and public health guidelines should be developed based on systematic reviews to ensure that recommendations are founded on rigorous data and not misleading conclusions. Third, promoting fully open data or inviting published interpretation of published data from multiple researchers could mitigate the occurrence of spin. Finally, structural reforms within academia are needed to change research incentives and reward structures that emphasise 'positive' conclusions, including the pressure to publish and media attention.
Our review had a few key limitations. First, there are no predefined terms for spin, resulting in difficulty with formulating a comprehensive but specific search strategy. Our search strategy involved identifying possible words and phrases that could encompass the concept of spin in scientific research and exploring how potentially included papers were indexed in MEDLINE and Embase. Additionally, we hand-searched the reference lists of included reports to identify other potential articles. However, despite these measures, it is possible that reports were missed. Second, the included reports were heterogeneous; spin was investigated in numerous different ways across multiple study designs. As a result, it was only possible to descriptively analyse the characteristics of spin that were measured in most instances. Third, it is possible that some of the included reports that focused on the same area of research may have included the same studies. However, examination of the search strategies and included studies of the included reports (where provided) suggests that overlap is unlikely.
Despite these limitations, we conducted a comprehensive search for all studies investigating spin in the biomedical literature. We did not discover any reports that investigated spin in animal studies. As these studies often lay the groundwork for future interventions to be tested with human subjects, the presence of spin could contribute to the failure to translate scientific findings into clinical trials or human applications when results do not live up to their 'hype'.
The reports included in our review noted some key limitations relevant to the investigation of spin, including the need to develop robust interpretive methodologies, as the assessment of spin is inherently open to interpretation and the thresholds for things like 'significance' are arbitrary and contextual. Future studies should consider more inductive and exploratory approaches, particularly when assessing spin in diverse study designs, as spin can manifest in variable ways. Research that contributes to understanding how spin affects scientific, clinical, and policy decision-making, as well as the development of tools for scientists, peer reviewers, and editors, is also needed.

Conclusions
Spin in biomedical research is prevalent across a range of study designs, including trials, observational studies, diagnostic accuracy studies, and systematic reviews. Included reports examined and assessed spin in a variety of ways, and the definitions and spin practices identified may vary according to the study type investigated. Further research is required to develop more comprehensive and reproducible measures of spin across research fields. Further investigation of factors contributing to spin, particularly at the cultural and structural levels of research, is needed to develop ways of reducing spin. Editors and peer reviewers should be made aware of the widespread prevalence of spin and ways to avoid it in order to ensure accurate research interpretation and dissemination.

Materials and methods
We conducted a methodological systematic review according to the PRISMA guidelines (S1 Text) [44].

Inclusion and exclusion criteria
We broadly defined 'spin' as any reporting practices that distort the interpretation of results and mislead readers so that results are viewed in a more favourable light [2]. We searched for reports that included the measurement of spin in any of its forms as at least 1 stated outcome and provided quantitative data measuring spin. We included reviews, cross-sectional studies, cohort studies, and other empirical studies. We excluded editorials, perspectives, commentaries, and papers that examined spin in publications other than published biomedical literature, such as press releases or media reports. There were no limits placed on language or date of publication.

Data sources and searching
The MEDLINE, PreMEDLINE, Embase, and Scopus (fields of Life Sciences and Health Sciences) databases were searched for articles published from 1946 (MEDLINE), 1974 (Embase), and 1960 (Scopus) through 24 November 2016. The search strategy for MEDLINE and Embase included combining (1) words and phrases that encompassed the concept of spin in biomedical research; and (2) the indexed term for research as a topic, which captured reports that investigated spin in published studies (S2 Text).

Study screening and selection
One author (KC) performed the search and screened for relevant titles and abstracts for obvious exclusions (for example, 'spin' particles in physics articles). Both KC and QG independently assessed the 127 full texts for inclusion, with LB reviewing any discrepancies and disagreements. KC and QG independently searched the reference lists of included articles for additional papers during the process of duplicate data extraction.

Data extraction
Two authors (KC and QG) independently extracted data into a collection form generated using REDCap electronic data capture tools hosted at The University of Sydney [45]. Data were collected on the following characteristics for each report: year of publication; journal name; funding source; author conflicts of interest; study design; and sample size. Data were also collected on the following characteristics regarding the included studies: field of research; time frame; definition of spin; location of spin; method of measuring spin; and all spin-related outcomes. We included all spin-related findings, whether or not they were explicitly presented as such, and extracted these findings verbatim. For example, not every report explicitly referred to spin (e.g., some reports measured 'discordance between study results and conclusions'). Any discrepancies in data extraction were reviewed and discussed until consensus was reached.

Assessing risk of bias
We categorised included reports by study design. Assessing risk of bias was not possible due to the heterogeneity in the study designs of our included reports and in the outcomes measured to assess spin. Furthermore, we did not wish to exclude any reports of low quality, due to the exploratory nature of this review.

Synthesis of results
We calculated frequencies where possible for report and study characteristics. For unstructured data, we conducted a descriptive and thematic analysis with the goal of presenting the full range of findings.
We grouped the reports' findings inductively according to spin-related outcome measures. This meant extracting all spin-related data reported in each of the included reports into an Excel spreadsheet as 'Findings'. Then, we grouped these extracted data into categories based on shared characteristics; for example, all the frequency measures were grouped as 'prevalence' and any measure of the association between the occurrence of spin and an author, study, or reporting characteristic as 'factors associated with spin'. The final categories included: how spin was defined, prevalence of spin, level of spin, practices used to spin results, and factors associated with spin. These categories were not predetermined but were expanded and added until all spin-related findings were accounted for.
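The grouping and frequency steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used (the extraction was done in REDCap and Excel): the report identifiers, measure descriptions, and the helper fmt_frequency are hypothetical, and the category labels are assigned by hand, as they would be in an inductive analysis.

```python
from collections import defaultdict


def fmt_frequency(n, total):
    """Format a count as 'n/total, pct%' with a whole-number percentage."""
    return f"{n}/{total}, {round(100 * n / total)}%"


# Hypothetical extracted findings: (report id, measure as reported, assigned category).
findings = [
    ("R01", "proportion of abstracts with spin", "prevalence of spin"),
    ("R01", "association of spin with funding source", "factors associated with spin"),
    ("R02", "proportion of conclusions with high spin", "level of spin"),
    ("R03", "proportion of main texts with spin", "prevalence of spin"),
]

# Collect the set of reports contributing at least one finding to each category.
reports_by_category = defaultdict(set)
for report_id, _measure, category in findings:
    reports_by_category[category].add(report_id)

total_reports = len({report_id for report_id, _, _ in findings})
for category, reports in sorted(reports_by_category.items()):
    print(f"{category}: {fmt_frequency(len(reports), total_reports)}")

# The same formatting applied to a count stated in the Results (31 of 35 reports).
print(fmt_frequency(31, 35))
```

Counting reports rather than individual findings per category matches how the Results express frequencies, with the number of included reports as the denominator.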