Methodological quality of systematic reviews referenced in clinical practice guidelines for the treatment of opioid use disorder

Introduction With efforts to combat opioid use disorder, there is an increased interest in clinical practice guidelines (CPGs) for opioid use disorder treatments. No literature exists examining the quality of systematic reviews used in opioid use disorder CPGs. This study aims to describe the methodological quality and reporting clarity of systematic reviews (SRs) used to create CPGs for opioid use disorder. Methods From June to July 2016, guideline clearinghouses and medical literature databases were searched for relevant CPGs used in the treatment of opioid use disorder. Included CPGs must have been recognized by a national organization. SRs from the reference section of each CPG were scored using the AMSTAR (A Measurement Tool to Assess Systematic Reviews) tool and the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist. Results Seventeen CPGs from 2006–2016 were included in the review. From these, 57 unique SRs were extracted. SRs comprised 0.28% to 17.92% of all references found in the CPGs. All SRs obtained moderate or high methodological quality scores on the AMSTAR tool. All reviews met at least 70% of PRISMA criteria. In PRISMA, underperforming areas included accurate title labeling, protocol registration, and risk of bias. Underperforming areas in AMSTAR included conflicts of interest, funding, and publication bias. A positive correlation was found between AMSTAR and PRISMA scores (r = .79). Conclusion Although the SRs in the CPGs were of good quality, there are still areas for improvement. Systematic reviewers should consult PRISMA and AMSTAR when conducting and reporting reviews. It is important for CPG developers to consider methodological quality as a factor when developing CPG recommendations, recognizing that the quality of systematic reviews underpinning guidelines does not necessarily correspond to the quality of the guideline itself.

quality or quality of the underlying evidence) [18][19][20][21][22][23]. As such, researchers must be able to evaluate the quality of evidence underlying CPGs because any translation of industry bias into patient care could prove detrimental.
This study aims to (1) identify the methodological quality and clarity of reporting in SRs underlying CPGs for opioid use disorder, (2) describe the variation in SR quality in CPGs published by different professional medical associations, and (3) outline the variation in SR quality of opioid use disorder CPGs between the United States and other countries with opioid use disorder treatment guidelines.

Protocol development and registration
Search strategies, eligibility criteria, and data abstraction were pre-specified in the research protocol developed and piloted a priori. This study did not meet the regulatory definition of human subject research as defined in 45 CFR 46.102(d) and (f) of the Department of Health and Human Services' Code of Federal Regulations ("45 CFR 46", 2016), and it was not subject to Institutional Review Board oversight. To ensure best practices in data abstraction and management, we consulted Li et al 2015 [24], the Cochrane Handbook for Systematic Reviews of Interventions [25], and the National Academies of Science, Engineering and Medicine's (previously the Institute of Medicine) Standards for Systematic Reviews [26]. Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [27] for systematic review and Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines [28] for descriptive statistics were applied when relevant. Before initiating the study, we registered it with the University hospital Medical Information Network Clinical Trial Registry (UMIN-CTR, UMIN000023126), and study data are publicly available on figshare (https://dx.doi.org/10.6084/m9.figshare.3496781). This search string was based on a Cochrane systematic review search strategy designed to identify studies for opioid dependence [29]. We modified the Canadian Agency for Drugs and Technologies in Health's Information Services Filters Working Group's search hedge for locating clinical practice guidelines in PubMed [30].

Identification of eligible clinical practice guidelines
After identifying relevant CPGs from these searches, A.R. reviewed their reference sections to identify additional CPGs that were not previously located. We included CPGs published between January 1, 2006, and June 1, 2016. We defined the term "clinical practice guideline" a priori as "statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options," using the National Academies of Science, Engineering and Medicine's definition [31]. To be eligible, CPGs had to have been recognized by a national, governmental, or professional organization. For CPGs with multiple versions, we used the most recent version. If CPGs published addendums, the addendum was also included. To reduce errors through translation, we opted a priori to only include guidelines published in English; however, no guidelines were ultimately excluded based on this criterion.

Identification of eligible systematic reviews
Two authors (A.R. and J.R.) searched the reference section of each guideline with keyword searches (e.g., "systematic", "meta-", "rev", "Cochrane") to identify SRs or meta-analyses. No disagreements occurred between the authors, so no third-party adjudication was needed. A priori, we used the National Academies of Science, Engineering and Medicine's definition for an SR: "a scientific investigation that focuses on a specific question and uses explicit, pre-specified scientific methods to identify, select, assess, and summarize the findings of similar but separate studies. It may include a quantitative synthesis (meta-analysis), depending on the available data" [31]. This definition was selected to be as inclusive as possible. To avoid conflict with our assessment tools, we did not use definitions that set standards for the number of databases searched or that required a meta-analysis. To be eligible, an SR had to have been referenced in an eligible guideline.
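The keyword screen described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which appears to have been a manual search; the keyword list contains only the examples quoted in the text, and the function name and reference strings are hypothetical.

```python
import re

# Keyword stems taken from the examples quoted in the text. The complete list
# used in the study is not reported, so this set is illustrative only.
KEYWORDS = ("systematic", "meta-", "rev", "cochrane")


def flag_candidate_reviews(references):
    """Return the references that contain any keyword, ignoring case."""
    pattern = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)
    return [ref for ref in references if pattern.search(ref)]


# Hypothetical reference strings, invented for illustration.
sample_refs = [
    "Smith A. Methadone maintenance: a systematic review. 2010.",
    "Jones B. A randomized trial of buprenorphine. 2012.",
    "Lee C. Opioid agonist therapy: meta-analysis of trials. 2014.",
]
candidates = flag_candidate_reviews(sample_refs)
```

A short stem such as "rev" also matches unrelated words (for example, "prevention"), so flagged references are only candidates and would still need title, abstract, or full-text screening against the SR definition.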

Data abstraction and scoring
Prior to abstraction and scoring, all authors were trained using video modules and detailed tutorials developed by P.S. that outlined the process. Authors then completed a piloted practice exercise to become acquainted with scoring and abstraction. Four authors (A.R., J.R., R.R., and J.H.) independently completed abstraction and scoring on a subset of SRs, using piloted abstraction forms. Following scoring, each score was verified by a second author. Disagreements were resolved by consensus between the pair. A third-party adjudication process was established in the protocol but not needed. This scoring and verification process was followed throughout. For each SR, authors abstracted the following study characteristics: year of publication, participant population, intervention, number of primary studies, sample size of primary studies, and research design of primary studies. Authors then independently scored each SR using the PRISMA checklist and the AMSTAR tool, described in the following sections.

AMSTAR Tool
AMSTAR (A Measurement Tool to Assess Systematic Reviews) is an 11-item measure used to determine the quality of SRs [32]. AMSTAR has been acknowledged as a valid and reliable tool with high interrater reliability, construct validity, and feasibility [33]. We used AMSTAR, instead of R-AMSTAR (the revised version), because AMSTAR is more easily applied. R-AMSTAR has also been criticized for inherent subjectivity and repetitiveness [34]. We applied recommended revisions made by Burda et al 2016 [35] to AMSTAR. These changes focus on improving validity, reliability, and usability in assessing methodological quality, and they include changes in order of items, wording of items and instructions, and modifications to the focus of original items 7, 8, and 11. These recommendations also address aspects noted to be problematic in numerous studies and improve specificity to methodological quality over quality of reporting or risk of bias [35,36]. However, the additional item described by Burda et al. 2016 [35] was not included, since subgroup analyses are not applicable to all SRs and meta-analyses and adding the item complicates scoring of the tool. Additional instructions were provided to reviewers if modified instructions were unclear. Each item was answered with "criteria met," "criteria not met," "criteria partially met," or "not applicable." The answer "not applicable" was only available on item 10 (concerning small study effects), and it was selected if the SR included fewer than 10 primary studies. This modification was made since funnel plot methods lack power to detect true asymmetry when the number of primary studies is fewer than 10 [37]. Points were then awarded for each answer as follows: 1 point for criteria met and 0 points for other answers. The total score was then assigned to one of three categories: Low (0-3), Moderate (4-7), or High (8-11) [38].
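As an illustration of the scoring rule just described, the following Python sketch awards 1 point per item answered "criteria met" and maps the total to the three categories. The function names, answer labels, and example answers are hypothetical; only the point values and cut-offs come from the text.

```python
N_AMSTAR_ITEMS = 11


def amstar_total(answers):
    """Sum AMSTAR points: 1 for 'met', 0 for any other answer."""
    if len(answers) != N_AMSTAR_ITEMS:
        raise ValueError("AMSTAR requires exactly 11 item answers")
    return sum(1 for answer in answers if answer == "met")


def amstar_category(total):
    """Map a total score to Low (0-3), Moderate (4-7), or High (8-11)."""
    if not 0 <= total <= N_AMSTAR_ITEMS:
        raise ValueError("total must be between 0 and 11")
    if total <= 3:
        return "Low"
    if total <= 7:
        return "Moderate"
    return "High"


# Hypothetical answers for one SR, invented for illustration. The
# 'not applicable' answer stands for item 10 in an SR with fewer than
# 10 primary studies; like 'partially met', it earns 0 points.
example_answers = ["met"] * 8 + ["partially met", "not applicable", "not met"]
example_total = amstar_total(example_answers)
example_rating = amstar_category(example_total)
```

Under this rule, an SR with fewer than 10 primary studies cannot earn the item 10 point, so its maximum attainable total is 10 rather than 11.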

PRISMA checklist
We assessed the clarity of reporting in eligible SRs using the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist [39]. It has been acknowledged for its usefulness in critically appraising SRs and meta-analyses even though it was originally developed for authors to improve the quality of their reviews [40]. However, the quality of reporting does not necessarily equate to methodological quality in SRs, necessitating use of tools that independently assess both qualities [39,41]. The checklist contains 27 items designed to evaluate reporting quality. Each checklist item was answered with "criteria met," "criteria partially met," or "criteria not met" based on the completeness of reporting. Unlike with AMSTAR, we allowed partial credit on PRISMA items, since completeness of reporting was thought to be more adequately accounted for using this method. For example, Item 7 of the PRISMA checklist states, "Did the systematic review describe all information in the search and date last searched?" In this case, a systematic review was assigned a partial score of one if it described all information sources in the search but did not state the date last searched. Points were then awarded as follows: 2 points for criteria met, 1 point for criteria partially met, and 0 points for criteria not met.
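The partial-credit rule can be sketched in Python as follows. The point values and the 27-item count come from the text; expressing the result as a fraction of the 54 available points is an assumption made here for illustration, and the function name and example answers are hypothetical.

```python
PRISMA_POINTS = {"met": 2, "partially met": 1, "not met": 0}
N_PRISMA_ITEMS = 27


def prisma_score(answers):
    """Return (points, fraction of the maximum 54 points) for one SR.

    Normalizing by the maximum is an assumption; the text reports only
    the per-item point values.
    """
    if len(answers) != N_PRISMA_ITEMS:
        raise ValueError("PRISMA requires exactly 27 item answers")
    points = sum(PRISMA_POINTS[answer] for answer in answers)
    return points, points / (2 * N_PRISMA_ITEMS)


# Hypothetical answers for one SR, invented for illustration:
# 20 items met, 4 partially met, 3 not met.
example_answers = ["met"] * 20 + ["partially met"] * 4 + ["not met"] * 3
example_points, example_fraction = prisma_score(example_answers)
```

In this example the SR earns 44 of 54 points, so a partially reported item such as Item 7 contributes half of its possible credit instead of none.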

Search results
Our guideline search initially yielded 25 guidelines (Fig 1). Eight were excluded for the following reasons: three did not reference any SRs, three were published prior to 2006, one was already included in another guideline, and one was an outdated version of a guideline already included. A total of 17 guidelines were included in this study. These 17 guidelines contained 5,459 references. After screening the references by title and, in some cases, abstract, 5,361 references were excluded. Of the 98 studies that proceeded to full-text review, 41 studies were excluded. Reasons for exclusion are listed in Fig 1. Ultimately, 57 unique SRs were included in this study, of which 22 were included in more than one guideline (Table 1). Characteristics of included guidelines are presented in Table 2.

Scoring results
Scores for both PRISMA and AMSTAR were averaged for all SRs in each guideline. For the PRISMA checklist, each item was averaged to give a percentage of items met (Table 3). For the AMSTAR tool, each number was added for each guideline for a total score out of 11, and a quality rating was assigned (Table 4). For both tools, the average number of SRs addressing each item across guidelines was calculated to identify problematic items. The following results are based solely on the SRs referenced in the CPGs and are not evaluations of the guidelines themselves.
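The two aggregation steps described above (a mean score per guideline and a mean per checklist item) can be sketched in Python as follows. This is an illustrative reading of the description, not the authors' analysis code; the function names, guideline labels, and scores are hypothetical.

```python
from statistics import mean


def guideline_means(scores_by_guideline):
    """Average the SR scores within each guideline (empty guidelines skipped)."""
    return {g: mean(scores) for g, scores in scores_by_guideline.items() if scores}


def item_adherence(sr_item_points):
    """Average the points earned on each item across SRs.

    Each row holds one SR's item points, and all rows share the same length.
    """
    n_items = len(sr_item_points[0])
    return [mean(row[i] for row in sr_item_points) for i in range(n_items)]


# Hypothetical AMSTAR totals for the SRs cited by two guidelines.
example_scores = {"Guideline A": [10, 8, 9], "Guideline B": [6, 7]}
example_means = guideline_means(example_scores)

# Hypothetical item points for two SRs on a two-item checklist.
example_items = item_adherence([[1, 0], [1, 1]])
```

Because an SR cited by several guidelines contributes to each of their averages, guideline-level means computed this way are not independent of one another.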
No guideline received a low methodological quality rating on the AMSTAR tool for its SRs (Table 4), and none scored below 70% on the PRISMA criteria (Table 3). Five guidelines received a moderate quality rating on the AMSTAR tool. Of the five guidelines, MPG [42] had the highest adherence to PRISMA criteria (86%) and NZG [43] had the lowest (69%). Twelve guidelines received a high methodological quality rating on the AMSTAR tool. [48] only contained one SR. Items 3, 4, and 6 (rationale for review, objective statement, and eligibility criteria, respectively) of the PRISMA criteria showed remarkable adherence (100%) by each guideline (Table 2). For the AMSTAR tool, items 10 (publication bias, x̄ = 0.50) and 11 (conflict […] [42] fully adhered to item 10 (publication bias, x̄ = 1) but did not adhere to item 11 (conflict of interest, x̄ = 0). Items 1 and 2 (a priori design and comprehensive literature search, respectively) of the AMSTAR tool had the highest adherence across all guidelines (x̄ = 0.99 and x̄ = 0.95, respectively). Despite this high adherence, RCGP [51] showed a lower adherence (x̄ = 0.6) to item 2 (a priori design) (Table 3).
Overall, the PRISMA and AMSTAR scores were highly correlated (r = .79) (Fig 2). For example, VaDoD [45] had an average of 0.9 on the PRISMA criteria and a total score of 10/11 on the AMSTAR tool. Likewise, SMG [50] had the lowest score on the PRISMA criteria (0.7) and was associated with a low score on the AMSTAR tool (6.5/11, 59%).
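For readers who wish to reproduce a coefficient of this kind from paired scores, the following Python sketch computes a Pearson product-moment correlation. The text reports r without naming the method, so the choice of Pearson's r is an assumption; the function name and paired values are hypothetical and are not the study data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired scores (PRISMA fraction, AMSTAR total) for four
# guidelines, invented for illustration.
prisma_fractions = [0.70, 0.80, 0.85, 0.90]
amstar_totals = [6.5, 8.0, 9.0, 10.0]
r = pearson_r(prisma_fractions, amstar_totals)
```

A value near 1 indicates that guidelines whose SRs report more completely also tend to cite SRs with higher methodological quality scores; it does not show that one property causes the other.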
Fig 3 depicts the AMSTAR and PRISMA scores by the number of times the systematic reviews were included in guidelines. In general, systematic reviews referenced in multiple guidelines had consistently higher AMSTAR and PRISMA scores. Greater variability was found in systematic reviews cited in only 1 or 2 guidelines.

Discussion
Our study found that the overall methodological and reporting quality of SRs included in guidelines for treatment of opioid use disorder was moderate to high. There are, however, still areas in which SRs could be improved. Results from our study suggest that disclosure of study funding, assessment of publication bias, and reporting of a registered protocol were the most problematic areas. Each of them plays a known role in SR outcomes.
Funding of the primary studies was underreported; however, evidence suggests that funding plays a role in the magnitude of clinical effect sizes. One recent study of funding in cardiovascular trials found that half of the industry-funded studies reported positive outcomes compared with only one-fifth of the non-industry-funded studies [57]. Possible explanations for favorable results in industry-sponsored research include selective funding of superior drugs, poor quality research, not selecting an appropriate comparator, and publication bias [58]. Unknown funding sources of primary studies within an SR can thus affect the summary effect size of the SR if funding source, rather than the intervention, contributes to bias in the outcomes of these trials.
Publication bias is a second contributor to inflated summary effects of SRs. Reliance on statistically significant outcomes from the published literature likely exaggerates the treatment effects across studies since nonsignificant findings are smaller in magnitude, less often published, and less often included in SRs. A substantial body of evidence has demonstrated the negative effects of publication bias on clinical outcomes [59][60][61][62][63]. Conducting a publication bias evaluation is not always recommended. In some cases, too few studies are available to conduct these evaluations. For funnel plot-based tests, at least 10 primary studies are needed for sufficient power to detect true asymmetry [25,37]. Furthermore, in cases of reviews of diagnostic test accuracy, diagnostic odds ratios typically diverge substantially from 1, and funnel plot methods are not recommended. In cases like these, systematic reviewers should acknowledge the potential for publication bias and provide a rationale for omitting the assessments.
A final area of weakness among SRs was the lack of a pre-established protocol or registration. These mechanisms serve to limit arbitrary decision making by reviewers, to allow for investigation of selective reporting bias (between registry/protocol and the published review), to foster collaborations, and to reduce research waste [64]. There are currently limited options for registering SRs. Perhaps the most widely used registry is PROSPERO, developed by the Centre for Reviews and Dissemination and funded by the National Institute for Health Research. PROSPERO catalogues prospectively registered SRs of health-related outcomes in the fields of health and social care, welfare, public health, education, crime, justice, and international development. Features of the registration are maintained as a permanent record to limit selective reporting bias [65]. Some clinical trial registries also permit prospective SR registration. For example, Japan's University hospital Medical Information Network (UMIN) trial registry permits these registrations. SR protocols are also being published in academic journals as another measure of accountability and transparency. The journals Systematic Reviews and BMJ Open frequently publish these protocols. In the event that the review is registered in PROSPERO, the protocol only receives editorial review by the handling editor of Systematic Reviews [66]. The PRISMA-P checklist was recently published as a means to guide reviewers on the completeness of reporting information for SR protocols. Hopefully, these mechanisms will promote greater use of prospective registration and protocol development.
We found that systematic reviews cited more frequently in guidelines had higher PRISMA and AMSTAR scores, suggesting greater rigor when conducting and reporting these systematic reviews. This finding is important, given that clinicians use these guidelines to inform patient care. These findings also have implications for future research, as there is limited evidence on the quality of systematic reviews referenced in guidelines and whether those referenced more frequently are of higher (or lower) methodological quality than systematic reviews referenced by fewer guidelines or not referenced at all.
There have recently been calls for more extensive partnerships between SR teams and guideline development bodies to align activities [31,67]. These partnerships would facilitate improved use of SRs in developing guideline recommendations because guideline developers would be more aware of the existence of SRs. In turn, systematic reviewers would have a greater sense of the clinical questions to address in their reviews. It has also been suggested that systematic reviewers should participate on guideline development teams to enhance application of research findings and bridge the gap between research and practice [67]. PROSPERO and GIN are currently working together to foster these important collaborations.
Our study contained limitations. One limitation was that only SRs included in CPGs were scored. If an SR was not clearly identified by title as an SR or meta-analysis, it may have been missed during the screening process. The reviews were also identified in the reference section of each CPG and were not tied to specific recommendations, since many CPGs failed to associate their practice recommendations with particular references. Furthermore, although SRs are important in development of CPGs, there are many other types of research that contribute to CPG development. Evaluating the quality of the SRs is not an indication of the quality of the CPG as a whole. Although the AMSTAR tool is a validated tool to assess methodological quality, our use of modified AMSTAR items based on recent recommendations has yet to be empirically validated [35]. The recommended changes theoretically improve specificity to methodological quality and address known issues present in the original AMSTAR tool, but the degree to which this tool, when modified, measures methodological quality is as yet unknown. Furthermore, we assumed equal weighting of items for both AMSTAR and PRISMA, and it is likely that particular items making up these measures have more relevance to guideline development panels than others. Last, as some systematic reviews were referenced in multiple guidelines, there is the possibility for bias in guideline scores.
Closer examination of CPGs for opioid addiction is timely and important given recent movement toward solutions for opioid addiction. The Comprehensive Addiction and Recovery Act of 2016 became law on July 22, 2016. This law was enacted to address improper use of prescription opioids and illicit opioid substances, like heroin, and to improve access to treatment and recovery options [68]. This act will likely affect clinical treatment recommendations; thus, a review of the underlying SR evidence in CPGs for opioid use disorder will provide greater confidence in current recommendations. The need has never been greater for appropriate production of CPGs based on sound evidence from adequately structured SRs.