Systematic Review of Clinical Practice Guidelines Related to Multiple Sclerosis

Background: High-quality clinical practice guidelines (CPGs) can provide clinicians with explicit recommendations on how to manage health conditions and bridge the gap between research and clinical practice. Unfortunately, the quality of CPGs for multiple sclerosis (MS) has not been evaluated.
Objective: To evaluate the methodological quality of CPGs on MS using the AGREE II instrument.
Methods: According to the inclusion and exclusion criteria, we searched four databases and two websites related to CPGs: the Cochrane Library, PubMed, EMBASE, DynaMed, the National Guideline Clearinghouse (NGC), and the Chinese Biomedical Literature database (CBM). The searches were performed on September 20, 2013. All CPGs on MS were evaluated with the AGREE II instrument. The software used for analysis was SPSS 17.0.
Results: A total of 27 CPGs on MS met the inclusion criteria. The overall agreement among reviewers was good or substantial (ICC above 0.70). The mean scores (mean ± SD) for the six domains were as follows: scope and purpose, 59.05±16.13; stakeholder involvement, 29.53±17.67; rigor of development, 31.52±21.50; clarity of presentation, 60.39±13.73; applicability, 27.08±17.66; and editorial independence, 28.70±22.03.
Conclusions: The methodological quality of CPGs for MS was acceptable for scope and purpose and for clarity of presentation. The developers of CPGs need to pay more attention to editorial independence, applicability, rigor of development and stakeholder involvement during the development process. The AGREE II instrument should be adopted by guideline developers.


Introduction
Multiple sclerosis (MS) is a chronic disease that attacks the central nervous system, i.e., the brain, spinal cord and optic nerves. It is characterized by the destruction of the myelin sheath that surrounds neurons, resulting in the formation of plaques. The cause of MS is unknown. One widely supported hypothesis is that MS occurs in patients with genetic susceptibility and is triggered by certain environmental factors. Recent data show that over 350,000 people in the USA have MS, and a report from the Cleveland Clinic indicates that MS-related health care costs are thought to exceed $10 billion per year in the United States alone. Symptoms usually first appear between 15 and 45 years of age. Women are presently twice as likely to get MS as men [1].
In the past, the decisions for diagnosis and treatment in any disease, including MS, were primarily based on a physician's experience rather than on evidence. The resultant variability in clinical practice was recognized by medical organizations and consensus meetings were conducted to develop recommendations [2].
The intention of clinical practice guidelines (CPGs) is to provide clinicians with explicit recommendations on how to manage health conditions and bridge the gap between research and clinical practice [3]. Unfortunately, it is difficult to gauge how a guideline is applied and performs in clinical practice [4]. Of the CPGs used in 235 studies assessing the effectiveness and efficiency of dissemination and implementation strategies, only 3% of the guidelines used were based on good evidence [5]. A "good" guideline should be scientifically valid, usable, reliable, and should improve the outcome of patients [4]. Standards are needed to promote the rigorous development of such guidelines, which should also be internationally recognized and feasible [6].
The Appraisal of Guidelines, Research, and Evaluation (AGREE) instrument evaluates the process of CPG development and reporting quality based on theoretical assumptions [7]. The AGREE instrument was initially developed in 2003 and updated to AGREE II in 2010; AGREE II consists of 23 key items organized into 6 domains [8]. AGREE II was last updated in September 2013.
To our knowledge, there has been no critical evaluation performed regarding guidelines or consensus on management of MS. We have, therefore, evaluated the methodological quality with the AGREE II instrument. In addition, we compared the quality of CPGs according to different stratified factors including year of publication, country/region, level of development, number of authors, topics covered, type of CPGs, etc.

Eligibility criteria
We included guidelines and consensuses that provided recommendations on the diagnosis, treatment, and management of MS. To be included in our study, CPGs were required to (1) be published in English or Chinese; (2) explicitly identify themselves as a "guideline" or "consensus"; and (3) be published by the cutoff date of September 2013. When more than one set of guidelines was produced by the same working group or covered the same topic, only the most recently issued was considered. We excluded guidelines that (1) were Chinese versions of foreign CPGs and consensuses or adapted versions of CPGs from other countries; (2) were duplicates; or (3) were explanations or evaluations of CPGs.

Search
A systematic and comprehensive search was performed by two reviewers. The search strategy for PubMed is presented in Appendix S1.

Study selection
According to the inclusion and exclusion criteria, all retrieved records were classified using the reference management software Endnote 63 (The Thomson Reuters, Britain), and duplicate studies were discarded. Next, we read all the abstracts to identify both potentially eligible articles and any articles for which a determination could not be made from the abstract alone. We then obtained the full text of these articles to determine whether they were eligible. Study selection was performed independently by two reviewers, and disagreements between reviewers were resolved through consensus or by consulting a third expert adjudicator.

Data collection process and data items
A data extraction form was developed, piloted and modified as necessary. Two reviewers independently extracted the data, and disagreements were resolved by discussion or the involvement of a third arbitrator. The extracted data included CPG characteristics (title, year of publication, organizations or countries of publication, number of authors, number of organizations, updated/period, development methods, number of references, topics covered, number of pages) and the 23 items of AGREE II.

Quality evaluation
A training exercise was conducted prior to commencing the quality evaluations, using a random sample of 5 CPGs. After discussion of the disagreements, two trained reviewers independently evaluated the validity of each CPG using the AGREE II instrument. The instrument consists of 23 items organized in six domains: scope and purpose, stakeholder involvement, rigor of development, clarity of presentation, applicability, and editorial independence [8]. Each item was scored from 1 (strongly disagree) to 7 (strongly agree). The score for each domain was obtained by summing the scores of the individual items in the domain and then standardizing as follows: (obtained score - minimum possible score) / (maximum possible score - minimum possible score), expressed as a percentage. The minimum standardized score for each domain was 0% and the maximum was 100%. A guideline was "strongly recommended" if the majority of items (above 4 items) scored above 50%, "recommended" if 3 main items scored above 50%, and "not recommended" if all items scored below 50%.
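The standardization described above can be sketched in a few lines of Python. This is only an illustration, not the analysis code used in the study (which relied on SPSS 17.0); the function name `standardized_domain_score` and the example ratings are hypothetical. As in the AGREE II user manual, the obtained, minimum and maximum possible scores are summed over all reviewers as well as over all items in the domain.

```python
from typing import Sequence


def standardized_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Return the standardized AGREE II domain score as a percentage.

    ratings[r][i] is the 1-7 rating given by reviewer r to item i of a
    single domain (hypothetical input layout chosen for this sketch).
    """
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    for row in ratings:
        if len(row) != n_items:
            raise ValueError("every reviewer must rate every item")
        if any(not 1 <= value <= 7 for value in row):
            raise ValueError("AGREE II items are scored from 1 to 7")

    obtained = sum(sum(row) for row in ratings)
    min_possible = 1 * n_items * n_reviewers  # every item rated 1
    max_possible = 7 * n_items * n_reviewers  # every item rated 7
    return 100.0 * (obtained - min_possible) / (max_possible - min_possible)


# Hypothetical example: two reviewers rating a three-item domain.
example = [[5, 6, 6], [6, 6, 7]]
print(round(standardized_domain_score(example), 2))  # 83.33
```

With these made-up ratings the obtained score is 36, the minimum possible score is 6 and the maximum is 42, so the standardized score is (36 - 6) / (42 - 6) = 83.33%.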

Synthesis of results
A descriptive statistical analysis for each domain was performed. Descriptive values include percentage, mean, and standard deviation (SD). Inter-rater reliability within each domain was determined by the intraclass correlation coefficient (ICC) with a 95% CI. The degree of agreement was classified according to the following scale proposed by Landis and Koch: poor (<0.00), slight (between 0.00 and 0.20), fair (from 0.21 to 0.40), moderate (from 0.41 to 0.60), substantial (from 0.61 to 0.80) and very good or almost perfect (from 0.81 to 1.00) [9]. Statistical significance was set at P<0.05. The software used for analysis was SPSS 17.0.
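As a rough illustration of the reliability analysis, the sketch below computes an ICC from a table of ratings and maps it onto the Landis and Koch scale. The text does not state which ICC model was selected in SPSS, so the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is an assumption made here; the function names and the ratings are hypothetical, and no confidence interval is computed.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[i][j] is the score given to subject i (e.g. one CPG) by rater j.
    The choice of this ICC form is an assumption of the sketch.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def landis_koch_label(value: float) -> str:
    """Classify an agreement coefficient on the Landis and Koch scale."""
    if value < 0.0:
        return "poor"
    # Upper bounds are inclusive, so values between bands (e.g. 0.205)
    # fall into the next band up.
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings: 6 subjects (rows) scored by 4 raters (columns).
ratings = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc = icc_2_1(ratings)
print(round(icc, 2), landis_koch_label(icc))  # 0.29 fair
```

In the setting of this review, each row would be one CPG and each column one of the two reviewers, with one such table per AGREE II domain.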
In addition, the overall domain scores were compared according to type of CPG, date of publication, performers, country/region, number of authors, updates, topics covered and whether it is a guideline or consensus.

Study selection
A total of 885 citations were identified through a comprehensive database search, and 77 records were identified on websites related to CPGs. Of these, 905 were excluded based on the eligibility criteria previously outlined, 57 were considered for full-text screening and 27 were included in the review (Figure 1).
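The selection counts reported above can be checked with simple arithmetic. The short sketch below uses only the numbers given in the text; the variable names are hypothetical, and the number of records excluded at the full-text stage is derived by subtraction rather than reported in the text.

```python
# Counts reported in the text.
database_records = 885
website_records = 77
excluded_on_screening = 905
included_cpgs = 27

total_records = database_records + website_records
full_text_screened = total_records - excluded_on_screening
# Derived value, not stated in the text: records dropped after full-text review.
excluded_at_full_text = full_text_screened - included_cpgs

print(total_records, full_text_screened, excluded_at_full_text)  # 962 57 30
```

The 57 records carried forward to full-text screening match the figure reported in the text, so the counts are internally consistent.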
Overall, the CPGs received the lowest scores for applicability across all six AGREE II domains (mean score: 27.08%±17.66%, range: 4.17%-66.67%), whereas they scored highest on clarity of presentation (mean score: 60.39%±13.73%, range: 33.33%-83.33%). CPGs developed by regional independent bodies received the highest scores for clarity of presentation (Domain 4, 54.94±11.18) and the lowest scores for applicability (Domain 5, 20.60±7.47). Updated CPGs received higher scores than ones that were not updated. Three updated CPGs were strongly recommended because the mean scores for all six domains were above 50%; they scored highest on scope and purpose (Domain 1, 87.96±4.24) and lowest on applicability (Domain 5, 53.47±4.34). The topics for the 27 CPGs that were included covered diagnosis, treatment, and management. The stratified results showed that CPGs related to treatment received higher scores for all domains. Of the 27 CPGs assessed, 12 were evidence-based (EB) CPGs. The other 15 were considered non-EB CPGs. Table 2 shows that EB CPGs have higher quality scores for all of the AGREE domains. However, 4 of 6 domains scored below 50%, and the lowest scores appeared in applicability (Domain 5, 38.37±20.21).

Discussion
This is the first study to systematically evaluate the methodological quality of CPGs on the diagnosis, treatment, and management of MS published in English and Chinese. For the most part, the quality scores for scope and purpose (59.05%) and clarity of presentation (60.39%) are acceptable. However, the methodological quality of the CPGs in the study had some flaws, including the representation of all stakeholders (consumers, all relevant professional groups, target users; 29.53%), developing guidelines with scientific rigor (31.52%), supporting implementation of the recommendations (27.08%), and declaring editorial independence (28.70%). Our results are similar to those of the study by Alonso-Coello et al., which assessed a total of 626 CPGs on different topics and showed that the mean quality scores were moderate (43% for rigor of development) to low (35% for stakeholder involvement, 30% for editorial independence, and 20% for applicability) [21]. In our study, 22.22% of the CPGs were strongly recommended because the majority of the items (above 4 items) scored above 50%, and 14.81% of the CPGs were not recommended because all of the items scored below 50%. The results of the stratified analysis show that all domain scores of CPGs published in or after 2010 appear slightly higher, except for editorial independence. The mean scores of all six domains are higher for CPGs developed by American organizations and AAN, CPGs with more than ten authors, updated CPGs, EB CPGs, and guidelines rather than consensuses.
There were serious methodological reporting flaws in the included CPGs in the domains of stakeholder involvement, rigor of development, applicability and editorial independence. Most CPGs lacked explicit statements on the views and preferences of the target population (e.g., patients, public, etc.) (item 5), but the target users of the guidelines were well defined (item 6). Rigor of development is considered to be the most important domain, and more attention should be paid to whether external reviews are performed before CPGs are published (item 13) and whether updating mechanisms for the guidelines are provided (item 14).
However, the quality of the "applicability" domain also plays a critical role in the implementation of a guideline. An effective guideline should provide advice on how the recommendations can be implemented, discuss the potential impact of the recommendations on resources, and present clearly defined criteria derived from the key recommendations [8]. Unfortunately, flaws in the CPGs were found in two items: whether the guidelines describe facilitators and barriers to their application (item 18) and whether the potential resource implications of applying the recommendations have been considered (item 20). The AGREE II instrument is used to assess the rigor and transparency of CPG development and to suggest how to improve existing CPGs [8], and it requires developers of guidelines to report potential conflicts of interest. Our results show that there are serious reporting flaws regarding potential conflicts of interest of the members of the guideline development group (item 23).
Our study has several strengths. First, the latest instrument for guideline assessment (AGREE II) was used to assess the methodological quality of CPGs related to MS. Second, we performed a stratified analysis and identified the factors that potentially had the greatest influence on CPG quality. Third, we conducted a systematic and comprehensive literature search, including three main English academic databases (PubMed, EMBASE, Cochrane Library), two web-based sources related to CPGs (NGC and DynaMed), and one Chinese database (CBM). Lastly, the inter-reviewer agreement was high (ICC above 0.70), which supports the reliability of our conclusions.
On the other hand, some limitations are noted in our study. First, although the processes of searching, study selection, data extraction and quality assessment were conducted independently by two reviewers, some subjectivity remains because the two reviewers may have understood the AGREE II instrument differently. Second, we only included CPGs in English and Chinese, so CPGs in other languages were not considered. Third, this review only assessed the reporting of the different items and not the content validity of the recommendations. Finally, other instruments such as the four-item Global Rating Scale (GRS), which plays an important role in guideline evaluation, should be considered [22]. Although the GRS is less sensitive than AGREE II in detecting differences in guideline quality, its items did predict outcome measures related to guideline adoption [23].
Overall, the quality of CPGs on MS was acceptable for scope and purpose and clarity of presentation. The developers of CPGs need to pay more attention to editorial independence, applicability, rigor of development, and stakeholder involvement during the development process. The AGREE II instrument should be adopted by guideline developers.

Supporting Information
Appendix S1 Search algorithms for PubMed.