Epidemiology and Reporting Characteristics of Systematic Reviews of Biomedical Research: A Cross-Sectional Study

Background
Systematic reviews (SRs) can help decision makers interpret the deluge of published biomedical literature. However, a SR may be of limited use if the methods used to conduct the SR are flawed, and reporting of the SR is incomplete. To our knowledge, since 2004 there has been no cross-sectional study of the prevalence, focus, and completeness of reporting of SRs across different specialties. Therefore, the aim of our study was to investigate the epidemiological and reporting characteristics of a more recent cross-section of SRs.

Methods and Findings
We searched MEDLINE to identify potentially eligible SRs indexed during the month of February 2014. Citations were screened using prespecified eligibility criteria. Epidemiological and reporting characteristics of a random sample of 300 SRs were extracted by one reviewer, with a 10% sample extracted in duplicate. We compared characteristics of Cochrane versus non-Cochrane reviews, and the 2014 sample of SRs versus a 2004 sample of SRs. We identified 682 SRs, suggesting that more than 8,000 SRs are being indexed in MEDLINE annually, corresponding to a 3-fold increase over the last decade. The majority of SRs addressed a therapeutic question and were conducted by authors based in China, the UK, or the US; they included a median of 15 studies involving 2,072 participants. Meta-analysis was performed in 63% of SRs, mostly using standard pairwise methods. Study risk of bias/quality assessment was performed in 70% of SRs but was rarely incorporated into the analysis (16%). Few SRs (7%) searched sources of unpublished data, and the risk of publication bias was considered in less than half of SRs. Reporting quality was highly variable; at least a third of SRs did not report use of a SR protocol, eligibility criteria relating to publication status, years of coverage of the search, a full Boolean search logic for at least one database, methods for data extraction, methods for study risk of bias assessment, a primary outcome, an abstract conclusion that incorporated study limitations, or the funding source of the SR. Cochrane SRs, which accounted for 15% of the sample, had more complete reporting than all other types of SRs. Reporting has generally improved since 2004, but remains suboptimal for many characteristics.

Conclusions
An increasing number of SRs are being published, and many are poorly conducted and reported. Strategies are needed to help reduce this avoidable waste in research.

Why Was This Study Done?
• Systematic reviews, which explicitly use methods to identify, select, critically appraise, and synthesize the results of all existing studies of a given question, are considered the highest level of evidence for decision makers.
• We wanted to know how many systematic reviews of biomedical research are being published, what questions they are addressing, and how well the methods are reported, since information of this sort has not been collected since 2004.
What Did the Researchers Do and Find?
• We looked for all systematic reviews added to the main bibliographic database for biomedical literature during one month (February 2014), and recorded the characteristics of these reviews.
• We found 682 systematic reviews (a 3-fold increase over the last decade) that addressed a wide range of topics.
• In many cases, important aspects of the methods used were not reported (for example, at least a third of the reviews did not report how they searched for studies or how they assessed the quality of the included studies), unpublished data were rarely sought, and at least a third of the reviews used statistical methods discouraged by leading organizations that have developed guidance for systematic reviews (for example, Cochrane and the Institute of Medicine).

Introduction
Biomedical and public health research is conducted on a massive scale; for instance, more than 750,000 publications were indexed in MEDLINE in 2014 [1]. Systematic reviews (SRs) of research studies can help users make sense of this vast literature, by synthesizing the results of studies that address a particular question, in a rigorous and replicable way. By examining the accumulated body of evidence rather than the results of single studies, SRs can provide more reliable results for a range of health care enquiries (e.g., what are the benefits and harms of therapeutic interventions, what is the accuracy of diagnostic tests) [2,3]. SRs can also identify gaps in knowledge and inform future research agendas. However, a SR may be of limited use to decision makers if the methods used to conduct the SR are flawed, and reporting of the SR is incomplete [4,5].

Moher et al. previously investigated the prevalence of SRs in the biomedical literature and their quality of reporting [6]. A search of MEDLINE in November 2004 identified 300 SRs indexed in that month, which corresponded to an annual publication prevalence of 2,500 SRs. The majority of SRs (71%) focused on a therapeutic question (as opposed to a diagnostic, prognostic, or epidemiological question), and 20% were Cochrane SRs. The reporting quality varied, with only 66% reporting the years of their search, 69% assessing study risk of bias/quality, 50% using the term "systematic review" or "meta-analysis" in the title or abstract, 23% formally assessing evidence for publication bias, and 60% reporting the funding source of the SR.
The publication landscape for SRs has changed considerably in the subsequent decade. Major events include the publication of the PRISMA reporting guidelines for SRs [7,8] and SR abstracts [9] and their subsequent endorsement in top journals, the launch of the Institute of Medicine's standards for SRs of comparative effectiveness research [10], methodological developments such as a new tool to assess the risk of bias in randomized trials included in SRs [11], and the proliferation of open-access journals to disseminate health and medical research findings, in particular Systematic Reviews, a journal specifically for completed SRs, their protocols, and associated research [12]. Other studies have examined in more recent samples either the prevalence of SRs (e.g., [13]) or reporting characteristics of SRs in specific fields (e.g., physical therapy [14], complementary and alternative medicine [15], and radiology [16]). However, to our knowledge, since the 2004 sample, there has been no cross-sectional study of the characteristics of SRs across different specialties. Therefore, we considered it timely to explore the prevalence and focus of a more recent cross-section of SRs, and to assess whether reporting quality has improved over time.
The primary objective of this study was to investigate the epidemiological and reporting characteristics of SRs indexed in MEDLINE during the month of February 2014. Secondary objectives were to explore (1) how the characteristics of different types of reviews (e.g., therapeutic, epidemiology, diagnosis) vary; (2) whether the reporting quality of therapeutic SRs is associated with whether a SR was a Cochrane review and with self-reported use of the PRISMA Statement; and (3) how the current sample of SRs differs from the sample of SRs in 2004.

Study Protocol
We prespecified our methods in a study protocol (S1 Protocol).

Eligibility Criteria
We included articles that we considered to meet the PRISMA-P definition of a SR [17,18], that is, articles that explicitly stated methods to identify studies (i.e., a search strategy), explicitly stated methods of study selection (e.g., eligibility criteria and selection process), and explicitly described methods of synthesis (or other type of summary). We did not exclude SRs based on the type of methods they used (e.g., an assessment of the validity of findings of included studies could be reported using a structured tool or informally in the limitations section of the Discussion). Also, we did not exclude SRs based on the level of detail they reported about their methods (e.g., authors could present a line-by-line Boolean search strategy or just list the key words they used in the search). Further, we included articles regardless of the SR question (e.g., therapeutic, diagnostic, etiology) and the types of studies included (e.g., quantitative or qualitative). We included only published SRs that were written in English, to be consistent with the previous study [6].
We used the PRISMA-P definition of SRs because it is in accordance with the definition reported in the PRISMA Statement [7] and with that used by Cochrane [19], by the Agency for Healthcare Research and Quality's Evidence-based Practice Centers Program [20], and in the 2011 guidance from the Institute of Medicine [10]. Further, the SR definition used by Moher et al. for the 2004 sample ("the authors' stated objective was to summarize evidence from multiple studies and the article described explicit methods, regardless of the details provided") [6] ignores the evolution of SR terminology over time.
We excluded the following types of articles: narrative/non-systematic literature reviews; non-systematic literature reviews with meta-analysis or meta-synthesis, where the authors conducted a meta-analysis or meta-synthesis of studies but did not use SR methods to identify and select the studies; articles described by the authors as "rapid reviews" or literature reviews produced using accelerated or abbreviated SR methods; overviews of reviews (or umbrella reviews); scoping reviews; methodology studies that included a systematic search for studies to evaluate some aspect of conduct/reporting (e.g., assessments of the extent to which all trials published in 2012 adhered to the CONSORT Statement); and protocols or summaries of SRs.
Our search strategy retrieved records that were indexed, rather than published, in February 2014. An information specialist (M. S.) ran a modified search strategy to retrieve records in each of the 3 mo prior to and following February 2014, which showed that the number of records entered into MEDLINE during February 2014 was representative of these other months.

Screening
Screening was undertaken using online review software, DistillerSR. A form for screening of titles and abstracts (see S1 Forms) was used after being piloted on three records. Subsequently, three reviewers (M. J. P., L. S., and L. L.) screened all titles and abstracts using the method of liberal acceleration, whereby two reviewers needed to independently exclude a record for it to be excluded, while only one reviewer needed to include a record for it to be included. We retrieved the full text article for any citation meeting our eligibility criteria or for which eligibility remained unclear. A form for screening full text articles (see S1 Forms) was also piloted on three articles. Subsequently, two authors (of M. J. P., L. S., L. L., R. S.-O., E. K. R., and J. T.) independently screened each full text article. Any discrepancies in screening of titles/abstracts and full text articles were resolved via discussion, with adjudication by a third reviewer if necessary. In both these rounds of screening, articles were considered a SR if they met the Moher et al. [6] definition of a SR. Each full text article marked as eligible for inclusion was then screened a final time by one of two authors (M. J. P. or L. S.) to confirm that the article was consistent with the PRISMA-P 2015 definition of a SR (using the screening form in S1 Forms).
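The liberal accelerated screening rule described above can be sketched as a small decision function (an illustrative sketch only; DistillerSR handles this workflow internally, and the function name and vote representation are hypothetical):

```python
def liberal_accelerated_decision(votes):
    """Decide a record's fate under liberal accelerated screening.

    votes: list of 'include'/'exclude' judgments from reviewers.
    One 'include' vote is enough to advance a record, whereas
    exclusion requires two independent 'exclude' votes.
    Returns 'include', 'exclude', or 'pending' (needs another vote).
    """
    if "include" in votes:
        return "include"            # a single include advances the record
    if votes.count("exclude") >= 2:
        return "exclude"            # two independent exclusions required
    return "pending"                # one exclude alone is not decisive
```

This asymmetry speeds up screening while biasing decisions toward inclusion, so borderline records proceed to full text review.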

Data Extraction and Verification
We performed data extraction on a random sample of 300 of the included SRs, which were selected using the random number generator in Microsoft Excel. We selected 300 SRs to match the number used in the 2004 sample. Sampling was stratified so that the proportion of Cochrane reviews in the selected sample equaled that in the total sample. Data were collected in DistillerSR using a standardized data extraction form including 88 items (see S1 Forms). The items were based on data collected in two previous studies [6,15] and included additional items to capture some issues not previously examined. All data extractors piloted the form on three SRs to ensure consistency in interpretation of data items. Subsequently, data from each SR were extracted by one of five reviewers (M. J. P., L. S., F. C.-L., R. S.-O., or E. K. R.). Data were extracted from both the article and any web-based appendices available.
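The stratified sampling step can be sketched as follows (a hypothetical reconstruction in Python; the study actually used the random number generator in Microsoft Excel, and the record structure shown is invented for illustration):

```python
import random

def stratified_sample(records, total=300, seed=2014):
    """Draw a random sample of `total` records while preserving the
    proportion of Cochrane reviews observed in the full set.

    records: list of dicts, each with a boolean 'cochrane' key.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    cochrane = [r for r in records if r["cochrane"]]
    other = [r for r in records if not r["cochrane"]]
    n_cochrane = round(total * len(cochrane) / len(records))
    return (rng.sample(cochrane, n_cochrane)
            + rng.sample(other, total - n_cochrane))
```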
At the end of data extraction, a 10% random sample of SRs (n = 30 SRs) was extracted independently in duplicate. Comparison of the data extracted revealed 42 items where a discrepancy existed between two reviewers on at least one occasion (items marked in S1 Forms). All discrepancies were resolved via discussion. To minimize errors in the remaining sample of SRs, one author (M. J. P.) verified the data for these 42 items in all SRs. Also, one author (M. J. P.) reviewed the free text responses of all items with an "Other (please specify)" option. Responses were modified if it was judged that one of the forced-choice options was a more appropriate selection.

Data Analysis
All analyses were performed using Stata version 13 software [21]. Data were summarized as frequency and percentage for categorical items and median and interquartile range (IQR) for continuous items. We analyzed characteristics of all SRs and SRs categorized as Cochrane therapeutic (treatment/prevention), non-Cochrane therapeutic, epidemiology (e.g., prevalence, association between exposure and outcome), diagnosis/prognosis (e.g., diagnostic test accuracy, clinical prediction rules), or other (education, psychometric properties of scales, cost of illness). We anticipated that these different types of SRs would differ in the types of studies included (e.g., therapeutic SRs would more likely include randomized trials than epidemiology SRs). However, we considered nearly all of the other epidemiological and reporting characteristics as equally applicable to all types of SRs (i.e., all SRs, regardless of focus, should describe the methods of study identification, selection, appraisal, and synthesis). We have indicated in tables when a characteristic was not applicable (e.g., reporting of harms of interventions was only considered in therapeutic SRs). We also present, for all characteristics measured in the Moher et al. [6] sample, the percentage of SRs with each characteristic in 2004 compared with in 2014.
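For the continuous items, the summary statistics are straightforward; a minimal sketch follows (Python's "inclusive" quantile method interpolates like most statistics packages, though Stata's defaults may differ slightly at the margins):

```python
import statistics

def summarize_continuous(values):
    """Return the median and the interquartile range bounds (Q1, Q3)
    for a continuous item, e.g., number of included studies per SR."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)
```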
We explored whether the reporting of 26 characteristics of therapeutic SRs was associated with a SR being a Cochrane review and with self-reported use of the PRISMA Statement to guide conduct/reporting. The 26 characteristics were selected because they focused on whether a characteristic was reported or not (e.g., "Were eligible study designs stated?") rather than on the detail provided (e.g., "Which study designs were eligible?"). We also explored whether the reporting of 15 characteristics of all SRs differed between the 2004 and 2014 samples (only 15 of the 26 characteristics were measured in both samples). Associations were quantified using the risk ratio, with 95% confidence intervals. The risk ratio was calculated because it is generally more interpretable than the odds ratio [22]. For the analysis of the association of reporting of characteristics with PRISMA use, we included only therapeutic SRs because PRISMA was designed primarily for this type of SR, and we excluded Cochrane SRs because they are supported by software that promotes good reporting.
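The risk ratio and its confidence interval follow the standard textbook method of computing a Wald interval on the log scale; the sketch below is illustrative and not the exact Stata routine used in the study:

```python
import math

def risk_ratio(a, n1, b, n2):
    """Risk ratio comparing event proportions a/n1 versus b/n2,
    with a Wald-type 95% confidence interval computed on the log scale."""
    rr = (a / n1) / (b / n2)
    # standard error of log(RR) for two independent proportions
    se_log = math.sqrt(1/a - 1/n1 + 1/b - 1/n2)
    lower = math.exp(math.log(rr) - 1.96 * se_log)
    upper = math.exp(math.log(rr) + 1.96 * se_log)
    return rr, lower, upper
```

For example, comparing a characteristic reported by 40/45 Cochrane SRs versus 64/119 non-Cochrane therapeutic SRs gives a risk ratio of about 1.65.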
The analyses described above were all prespecified before analyzing any data. The association between reporting and self-reported use of the PRISMA Statement was not included in our study protocol; it was planned following the completion of data extraction and prior to analysis. We planned to explore whether reporting characteristics of non-Cochrane SRs that were registered differed from non-Cochrane SRs that were unregistered, and whether reporting differed in SRs with a protocol compared to SRs without a protocol. However, there were too few SRs with a registration record (n = 12) or a SR protocol (n = 5) to permit reliable comparisons.
We performed a post hoc sensitivity analysis to see if the estimated prevalence of SRs was influenced by including articles that met the definition of a SR used by Moher et al. [6] but not the more explicit PRISMA-P 2015 definition.

Search Results
There were 2,337 records identified by the search (Fig 1). Screening of titles and abstracts led to the exclusion of 738 records. Of the 1,599 full text articles retrieved, 917 were excluded; most articles were not SRs but rather another type of knowledge synthesis (e.g., narrative review, scoping review, overview of reviews).
A further 87 articles were not included in the final sample of 682 SRs but would have met the less explicit definition of a SR used by Moher et al. [6] for the 2004 sample. In a sensitivity analysis, adding these to the final sample raised the SR prevalence to 769 SRs indexed in the month of February 2014, which is equivalent to more than 9,000 SRs per year and 25 SRs per day being indexed in MEDLINE.

Of the 300 randomly sampled SRs, 164/300 (55%) were classified as therapeutic, 74/300 (25%) as epidemiology, 33/300 (11%) as diagnosis/prognosis, and 29/300 (10%) as other. All Cochrane SRs focused on a therapeutic question. There was wide diversity in the clinical conditions investigated; 20 ICD-10 codes were recorded across the SRs, with neoplasms, infections and parasitic diseases, and diseases of the circulatory system the most common (each investigated in >10% of SRs). All SRs were written in English.
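The prevalence extrapolations follow directly from the monthly counts; as a quick check of the arithmetic:

```python
# 682 SRs indexed in one month, extrapolated to a year
primary_annual = 682 * 12           # 8,184 -> "more than 8,000 per year"
# sensitivity analysis: 769 SRs per month under the broader definition
sensitivity_annual = 769 * 12       # 9,228 -> "more than 9,000 per year"
per_day = sensitivity_annual / 365  # roughly 25 SRs indexed per day
```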
Most SRs (263/300 [88%]) were published in specialty journals (Table 2). Only 31/300 (10%) were updates of a previous SR. The majority of these were Cochrane SRs (25/31 [81%]); only one diagnosis/prognosis SR and no epidemiology SRs were described as an update. Of the therapeutic SRs, 76/164 (46%) investigated a pharmacological intervention, 75/164 (46%) investigated a non-pharmacological intervention, and 13/164 (8%) investigated both types of intervention. Overall, the SRs included a median of 15 studies involving 2,072 participants, but the number of studies and participants varied according to the type of SR. Cochrane SRs included fewer studies (median 9 versus 14 in non-Cochrane therapeutic SRs), and epidemiology SRs included a larger number of participants (median 8,154 versus 1,449 in non-epidemiology SRs). Only 4/300 (1%) SRs were "empty reviews" (i.e., identified no eligible studies). Meta-analysis was performed in 189/300 (63%) SRs, with a median of 9 (IQR 6-17) studies included in the largest meta-analysis in each SR that included one or more meta-analyses. Harms of interventions were collected (or planned to be collected) in 113/164 (69%) therapeutic SRs. Few SRs (23/172 [13%]) considered costs associated with interventions or illness.

Reporting Characteristics of SRs
Here we summarize a subset of the characteristics of the SRs about which data were collected in this study (Table 3; data for all items are presented in S1 Results). Many SRs (254/300 [85%]) included the term "systematic review" or "meta-analysis" in the title or abstract. This percentage increased to 94% (239/255) when Cochrane SRs, which generally do not include these terms in their title, were omitted. A few SRs (12/300 [4%]) had been prospectively registered (e.g., in PROSPERO). In rather more SRs, authors mentioned working from a review protocol (77/300 [26%]), but a publicly accessible protocol was cited in only 49/300 (16%). These figures were driven almost entirely by Cochrane SRs; a publicly accessible protocol was cited in only 5/119 (4%) non-Cochrane therapeutic SRs and in no epidemiology, diagnosis/prognosis, or other SRs. Authors reported using a reporting guideline (e.g., PRISMA [7], MOOSE [23]) in 87/300 (29%) SRs. The purpose of these guidelines was frequently misinterpreted; in 45/87 (52%) of these SRs, it was stated that the reporting guideline was used to guide the conduct, not the reporting, of the SR (S1 Results). In 93/255 (36%) non-Cochrane SRs, authors reported using Cochrane methods (e.g., cited the Cochrane Handbook for Systematic Reviews of Interventions [19]).

Reporting of Study Eligibility Criteria
At least one eligibility criterion was reported in the majority of SRs, but there was wide variation in the content and quality of reporting. In 116/300 (39%) SRs, authors specified that both published and unpublished studies were eligible for inclusion, while a quarter restricted inclusion to published studies (80/300 [27%]). However, in 103/300 (34%) SRs, publication status criteria were not reported. Language criteria were reported in 252/300 (84%) SRs, with more SRs considering all languages (129/300 [43%]) than considering English only (92/300 [31%]). Study design inclusion criteria were stated in 237/300 (79%) SRs. Nearly all Cochrane SRs (40/45 [89%]) restricted inclusion to randomized or quasi-randomized controlled trials, whereas only 64/119 (54%) non-Cochrane therapeutic SRs did so. Epidemiology, diagnosis/prognosis, and other SRs included a range of study designs, mostly observational (e.g., cohort, case-control). Risk of bias information was incorporated into the analysis (e.g., via subgroup or sensitivity analyses) in only 31/189 (16%) SRs with meta-analysis.

Reporting of Review Outcomes
Authors collected data on a median of four (IQR 2-6) outcomes; however, no outcomes were specified in the methods section of 66/300 (22%) SRs. An outcome was described in the SR as "primary" in less than half of the SRs (136/288 [47%]). Given that we did not seek SR protocols, it is unclear how often the primary outcome was selected a priori. Most primary outcomes were dichotomous (91/136 [67%]). Of the 99 therapeutic SRs with a primary outcome, a p-value or 95% confidence interval was reported for 88/99 (89%) of the primary outcome intervention effect estimates. Of these estimates, 53/88 (60%) were statistically significant and favourable to the intervention, while none were statistically significantly unfavourable to the intervention.

Influence of Cochrane Status and Self-Reported Use of PRISMA on Reporting Characteristics
Nearly all of the 26 reporting characteristics of therapeutic SRs that we analyzed according to Cochrane status and self-reported use of the PRISMA Statement were reported more often in SRs that were produced by Cochrane (Fig 2) or that reported that they used the PRISMA Statement (Fig 3). The differences were larger and more often statistically significant in the Cochrane versus non-Cochrane comparison.

Comparison with 2004 Sample of SRs
The SRs we examined differed in several ways from the November 2004 sample. In 2014, review author teams were larger, and many more SRs were produced by Chinese authors (up from <3% of all SRs in 2004 to 21% in 2014) (Table 4). The proportion of therapeutic SRs decreased (from 71% to 55% of all SRs), coupled with a rise in epidemiological SRs over the decade (from 13% to 25%). Five ICD-10 categories (neoplasms, infections, diseases of the circulatory system, diseases of the digestive system, and mental and behavioural disorders) were the most common conditions in both samples, while two other categories were common in 2014: diseases of the musculoskeletal system and endocrine, nutritional, and metabolic diseases. The proportion of SRs that were Cochrane reviews slightly decreased (from 20% to 15% of all SRs). There were more updates of previous Cochrane therapeutic SRs in the 2014 sample (38% in 2004 versus 56% in 2014) (Table 5). Also, more SRs included meta-analysis (52% in 2004 versus 63% in 2014). In contrast, there was a reduction in the median number of included studies in SRs in 2014, and in the percentage of therapeutic SRs considering harms (75% in 2004 versus 69% in 2014) or cost (24% versus 13%).
Many characteristics were reported more often in 2014 than in 2004 (Fig 4); however, the extent of improvement varied depending on the item and the type of SR (Table 6). The following were reported much more often regardless of the type of SR: eligible language criteria (55% of all SRs in 2004 versus 84% in 2014), review flow (42% versus 78% of all SRs), and reasons for exclusion of full text articles (48% versus 70% of all SRs). Risk of bias/quality assessment was also reported more often in 2014 in non-Cochrane therapeutic SRs (49% in 2004 versus 74% in 2014). Despite the large improvements for some SR types, many of the 2014 percentages are less than ideal. Further, there was little change or a slight worsening in the reporting of several features, including the mention of a SR protocol in non-Cochrane SRs (14% in 2004 versus 13% in 2014), presentation of a full Boolean search strategy for at least one database (42% versus 45% of all SRs), and reporting of funding source of the SR (59% versus 64% of all SRs).

Discussion
We estimate that more than 8,000 SRs are being indexed in MEDLINE annually, corresponding to a 3-fold increase over the last decade. The majority of SRs indexed in February 2014 addressed a therapeutic question and were conducted by authors based in China, the UK, or the US; they included a median of 15 studies involving 2,072 participants. Meta-analysis was performed in 63% of SRs, mostly using standard pairwise methods. Study risk of bias/quality assessment was performed in 70% of SRs, but rarely incorporated into the analysis (16%). Few SRs (7%) searched sources of unpublished data, and the risk of publication bias was considered in less than half of SRs. Reporting quality was highly variable; at least a third of SRs did not mention using a SR protocol or did not report eligibility criteria relating to publication status, years of coverage of the search, a full Boolean search logic for at least one database, methods for data extraction, methods for study risk of bias assessment, a primary outcome, an abstract conclusion that incorporated study limitations, or the funding source of the SR. Cochrane SRs, which accounted for 15% of the sample, had more complete reporting than all other types of SRs. Reporting has generally improved since 2004, but remains suboptimal for many characteristics.

Explanation of Results and Implications
The increase in SR production from 2004 to 2014 may be explained by several positive changes over the decade. The scientific community and health care practitioners may have increasingly recognized that the deluge of published research over the decade requires integration, and that a synthesis of the literature is more reliable than relying on the results of single studies. Some funding agencies (e.g., the UK National Institute for Health Research and the Canadian Institutes of Health Research) now require applicants to justify their applications for research funding with reference to a SR, which the applicants themselves must perform if one does not exist [26]. Further, some countries, particularly China, have developed a research culture that places a strong emphasis on the production of SRs [27]. Also, the development of free software to perform meta-analyses (e.g., RevMan [28], MetaXL [29], R [30,31]) has likely contributed to its increased use. There are also some unsavory reasons for the proliferation of SRs. In recent years, some countries have initiated financial incentives to increase publication rates (e.g., more funding for institutions that publish more articles or cash bonuses to individuals per article published) [32]. Further, appointment and promotion committees often place great emphasis on the number of publications an investigator has, rather than on the rigor, transparency, and reproducibility of the research [4]. Coupled with the growing recognition of the value of SRs, investigators may be strongly motivated to publish a large number of SRs, regardless of whether they have the necessary skills to perform them well. In addition, the proliferation of new journals over the decade has made it more likely that authors can successfully submit a SR for publication regardless of whether one on the same topic has been published elsewhere. This has resulted in a large number of overlapping SRs (one estimate suggests 67% of meta-analyses have at least one overlapping meta-analysis within a 3-y period) [33]. Such overlap of SR questions is not possible in the Cochrane Database of Systematic Reviews, which may explain why the proportion of Cochrane SRs within the broader SR landscape has diminished.

The conduct of SRs was good in some respects, but not others. Examples of good conduct are that nearly all SRs searched more than one bibliographic database, and the majority performed dual-author screening, data extraction, and risk of bias assessment. However, few SRs searched sources of unpublished data (e.g., trial registries, regulatory databases), despite their ability to reduce the impact of reporting biases [34,35]. Also, an appreciable proportion of SRs (particularly epidemiology and diagnosis/prognosis SRs) did not assess the risk of bias/quality of the included studies. In addition, the choice of meta-analysis model in many SRs was guided by heterogeneity statistics (e.g., I²), a practice strongly discouraged by leading SR organizations because of the low reliability of these statistics [19,20]. It is therefore possible that some systematic reviewers inappropriately generated summary estimates, by ignoring clinical heterogeneity when statistical heterogeneity was perceived to be low. Further, a suboptimal number of therapeutic SRs considered the harms of interventions. It is possible that review authors did not comment on harms when none were identified in the included studies. However, reporting of both zero and non-zero harm events is necessary so that patients and clinicians can determine the risk-benefit profile of an intervention [36]. To reduce the avoidable waste associated with these examples of poor conduct of SRs, strategies such as formal training of biomedical researchers in research design and analysis and the involvement of statisticians and methodologists in SRs are warranted [4].

Cochrane SRs continue to differ from their non-Cochrane counterparts. Completeness of reporting is superior in Cochrane SRs, possibly due to the use of strategies in the editorial process that promote good reporting (such as use of the Methodological Expectations of Cochrane Intervention Reviews [MECIR] standards [37]). Also, word limits or unavailability of online appendices in some non-Cochrane journals may lead to less detailed reporting. Cochrane SRs tend to include fewer studies, which may be partly due to the reviews more often restricting inclusion to randomized trials only. However, fewer studies being included could also result from having a narrower review question (in terms of the patients, interventions, and outcomes that are addressed). Further research should explore the extent to which Cochrane and non-Cochrane SRs differ in scope, and hence applicability to clinical practice.
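As context for the heterogeneity statistics discussed above, a minimal sketch of how I² is derived from Cochran's Q under a fixed-effect, inverse-variance meta-analysis (the textbook formula, shown for illustration only):

```python
def i_squared(effects, variances):
    """I^2 heterogeneity statistic (as a percentage) from Cochran's Q.

    effects: per-study effect estimates; variances: their variances.
    """
    weights = [1.0 / v for v in variances]   # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:                              # includes the q == 0 case
        return 0.0
    return 100.0 * (q - df) / q
```

Because I² is highly imprecise when a meta-analysis contains few studies, leading SR organizations advise against using it to choose between fixed-effect and random-effects models.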
It is notable that reporting of only a few characteristics improved substantially over the decade. For example, the SRs in the 2014 sample were much more likely to present a review flow and reasons for excluded studies than SRs in the 2004 sample. This was most often done using a PRISMA flow diagram [7], suggesting that this component of the PRISMA Statement has been successfully adopted by the SR community. However, 2014 SRs were slightly less likely than their 2004 counterparts to identify an outcome as "primary" and to report both the start and end years of the search, and the number of SRs reporting the source of funding increased only marginally. We do not believe that the smaller changes in reporting of some characteristics are due to their receiving less emphasis in the original paper by Moher et al. [6] or the PRISMA Statement [7], because neither emphasized any characteristic over others. Therefore, more research is needed to determine which characteristics authors think are less important to report in a SR, and why.
Mention of the PRISMA Statement [7], perhaps a surrogate for actual use, appears to be associated with more complete reporting. However, reporting of many SRs remains poor despite the availability of the PRISMA Statement since 2009. There are several possible reasons for this. Some authors may still be unaware of PRISMA or assume that they already know how to report a SR completely. The extent to which journals endorse PRISMA is highly variable, with some explicitly requiring authors to submit a completed checklist at the time of manuscript submission, others only recommending its use in the instructions to authors, and many not referring to it at all [38,39]. Also, some PRISMA items include multiple elements (e.g., item 7 asks authors to describe the databases searched, whether authors were contacted to identify additional trials, the years of coverage of the databases searched, and the date of the last search). Some authors may assume that they have adequately addressed an item if they report at least one element. Also, authors may consider PRISMA only after spending hours drafting and refining their manuscript with co-authors, a point when they may be less likely to make the required changes [40].
Our findings suggest that strategies other than the passive dissemination of reporting guidelines are needed to address the poor reporting of SRs. One strategy is to develop software that facilitates the completeness of SR reporting [41]. For example, Barnes and colleagues recently developed an online writing tool based on the CONSORT Statement [40]. The tool is meant to be used by authors when writing the first draft of a randomized trial report and consists of bullet points detailing all the key elements of the corresponding CONSORT item(s) to be reported, with examples of good practice. Medical students randomly assigned to use the tool over a 4-h period reported trial methods more completely [40]; thus, a similar tool based on the PRISMA Statement is worth exploring. Also, journal editors could receive certified training in how to endorse and implement PRISMA and facilitate its use by peer reviewers [42]. Further, collaboration between key stakeholders (funders, journals, academic institutions) is needed to address poor reporting [26].
More research is needed on the risk of bias in SRs that is associated with particular methods. We observed that a considerable proportion of therapeutic SRs (40%) had potentially misleading conclusions because the limitations of the evidence on which the conclusions were based were not taken into consideration. This is a problem because some users of SRs only have access to the abstract and may be influenced by the misleading conclusions to implement interventions that are either ineffective or harmful [9]. We have not explored in this study the extent to which the results of the SRs we examined were biased. Such bias can occur for several reasons, including use of inappropriate eligibility criteria, failure to use methods that minimize error in data collection, selective inclusion of the most favourable results from study reports, inability to access unpublished studies, and inappropriate synthesis of clinically heterogeneous studies [43]. Determining how often the results of SRs are biased is important because major users of SRs such as clinical practice guideline developers tend to rely on the results (e.g., intervention effect estimates) rather than conclusions when formulating recommendations [44]. We only recorded whether methodological characteristics were reported or not, rather than evaluating how optimal each method was. Further, exploring whether non-reporting of a method is associated with biased results is problematic, because non-compliance with reporting guidelines is not necessarily an indicator of a SR's methodological quality. That is, some review authors may use optimal methods but fail to clearly specify those methods for non-bias-related reasons (e.g., word limits). In future, investigators could apply to the SRs in our sample a tool such as the ROBIS tool [45], which guides appraisers to make judgements about the risk of bias at the SR level (rather than the study level) due to several aspects of SR conduct and reporting.

Strengths and Limitations
There are several strengths of our methods. We used a validated search filter to identify SRs, and screened each full text article twice to confirm that it met the eligibility criteria. Screening each article provides a more reliable estimate of SR prevalence than relying on the search filters for SRs, which we found retrieved many non-systematic reviews and other knowledge syntheses. We did not restrict inclusion based on the focus of the SR and, thus, unlike previous studies [14,15], were able to collect data on a broader cross-section of SRs.
There are also some limitations to our study. Our results reflect what was reported in the articles, and it is possible that some SRs were conducted more rigorously than was specified in the report, and vice versa. Our findings may not generalize to SRs indexed outside of MEDLINE or published in a language other than English. Two authors independently and in duplicate extracted data on only a 10% random sample of SRs. We attempted to minimize data extraction errors by independently verifying data for 42/88 "problematic items" (i.e., those where there was at least one discrepancy between two authors in the 10% random sample). We cannot exclude the possibility of errors in the non-verified data items, although we consider the risk to be low given that the error rate for these items was 0% in the random sample. Also, our results concerning some types of SRs (e.g., diagnosis/prognosis, other) were based on small samples, so should be interpreted with caution. Further, searching for articles indexed in MEDLINE, rather than published, during the specified time frame means that we examined a small number of SRs (8/300 [3%]) with more than a year's delay in indexing after publication. However, inclusion of these few articles is unlikely to have affected our findings.
Some terminology contained within the PRISMA-P definition of a SR may be interpreted differently by different readers (e.g., "systematic search" and "explicit, reproducible methodology"). Hence, it is possible that others applying the PRISMA-P definition may have reached a slightly different estimate of SR prevalence than we did. We tried to address this by also reporting a SR prevalence that included articles consistent with the less explicit definition used by Moher et al. [6]. Also, any observed improvements in reporting since 2004 may partly be attributed to our use of a more stringent definition of SRs in 2014, which required articles to meet a greater number of minimum reporting requirements. Hence, we may have slightly overestimated the improvements in reporting from 2004 to 2014 and underestimated the true scale of poor reporting in SRs.

Conclusion
An increasing number of SRs are being published, and many are poorly conducted and reported. This is wasteful for several reasons. Poor conduct can lead to SRs with misleading results, while poor reporting prevents users from being able to determine the validity of the methods used. Strategies are needed to increase the value of SRs to patients, health care practitioners, and policy makers.