Evaluation of the Quality of Guidelines for Myasthenia Gravis with the AGREE II Instrument

Background Clinical practice guidelines (CPGs) are systematically developed statements to assist practitioners in making decisions about appropriate healthcare in specific clinical circumstances. The methodological quality of CPGs for myasthenia gravis (MG) are unclear. Objective To critically evaluate the methodological quality of CPGs for MG using AGREE II instrument. Method A systematical search strategy on PubMed, EMBASE, DynaMed, the National Guideline Clearinghouse (NGC) and the Chinese Biomedical Literature database (CBM) was performed on September 20th 2013. All guidelines related to MG were evaluated with AGREE II. The software used for analysis was SPSS 17.0. Results A total of 15 CPGs for MG met the inclusion criteria (12 CPGs in English, 3 CPGs in Chinese). The overall agreement among reviews was moderate or high (ICC >0.70). The mean scores (mean ± SD) for al six domains were presented as follows: scope and purpose (60.93% ±16.62%), stakeholder involvement (40.93% ±20.04%), rigor of development (37.22% ±30.46%), clarity of presentation (64.26% ±16.36%), applicability (28.19% ±20.56%) and editorial independence (27.78% ±28.28%). Compared with non-evidence-based CPGs, evidence-based CPGs had statistically significant higher quality scores for all AGREE II domains (P<0.05). All domain scores appear slightly higher for CPGs published after AGREE II instrument development and validation (P>0.05). The quality scores of CPGs developed by NGC/AAN were higher than the quality scores of CPGs developed by other organizations for all domains. The difference was statistically significant for all domains with the exception of clarity of presentation (P = 0.07). Conclusions The qualities of CPGs on MG were generally acceptable with several flaws. The AGREE II instrument should be adopted by guideline developers, particularly in China.


Instruction
Myasthenia gravis (MG) is caused by antibody-mediated autoimmunity against the nicotinic acetylcholine (ACh) receptor (AChR) at the neuromuscular junction (NMJ). The prevalence of MG in the United States is estimated to be approximately 20/ 100,000 population and it occurs in all genders, ethnicities and ages [1]. MG is characterized by varying degrees of weakness and rapid fatigue of skeletal and voluntary muscle groups. The ranging of the clinical course of MG is from remission in an early stage to acute exacerbation and even death [2]. In the past, diagnosis, treatment and prognosis have been remarkably improved due to the development of successful surgical and pharmacological treatments. However, there remains a notable challenge regarding spontaneous remission [3,4]. Currently, several guidelines for MG are available. However, in common with many other fields and medical disciplines, only a minority of physicians fully comply with guidelines and recommendations are slow to make their way into everyday practice [5][6][7][8].
Clinical Practice Guidelines (CPGs) are defined as systematically developed statements to assist practitioners and patient decisions about appropriate health care for specific circumstances [9]. CPGs are a major tool for improving the quality of healthcare [10]. To guarantee that CPGs can be an effective tool in healthcare to improve outcome for patients, internationally recognized standards should be developed to assess the quality of CPGs and to promote the rigorous development of CPGs. These standards should be valid, reliable and feasible [11].
The Appraisal of Guidelines for Research and Evaluation (AGREE) instrument was initially developed in 2003, and updated to AGREE II in 2010. Consisting of 6 domains covering 23 key items [12], it is an appraisal tool and validated instrument that has been endorsed by leading producers, raters and compilers of international CPGs to provide a framework for assessing their quality [13]. Previous studies have demonstrated that the quality of CPGs in different clinical areas was modest or variable [13][14][15][16]. The current study aims to assess the quality of CPGs for MG by the AGREE II instrument, and to stratify the quality according to type of CPGs, performers, and pubtime.

Inclusion and exclusion criteria
We included CPGs that were concerned with all areas of MG and that were published in both journals and the internet, including comprehensive CPGs and others which only concentrated on the management of MG. The following studies were excluded: Chinese versions of foreign CPGs and consensuses and adapted versions of CPGs from other countries; Duplication; Explanation, guidance, evaluation on literature of CPGs, and abstract.

Data extraction and quality assessment
We established a standard table on Microsoft Excel 2003 (Microsoft Corp, Redmond, WA, www.microsoft.com). In addition to the items of AGREEã À , the following data were also extracted for each study: title of guidelines, year of publication, organizations or countries of publication, number of authors, number of organizations, updated/period, developed methods, number of references, topics covered and number of guideline pages.
The AGREEã À instrument was used to assess the methodological quality of included CPGs. 23 key items of 6 domains were scored on a scale of 1-7, with 1 being strongly disagree and 7 being strongly agree. The score for each domain is obtained by summing all the scores of the individual items in a domain and then standardizing as follows: (obtained score -minimal possible score)/(maximal possible score -minimal possible score). A guideline is ''strongly recommended'' if the majority of items (above 4 items) scores are above 50%. A guideline is ''recommended'' if 3 main items scores are above 50%. A guideline is ''weakly recommended'' if 1-2 items score above 50%. A guideline is ''not recommended'' if all items score below 50%.
The data extraction and quality assessment were performed independently by two trained reviewers. Disagreement between reviewers was resolved through consensus or by consulting an independent expert adjudicator.

Statistical analysis
A descriptive statistic analysis was used for the total score by each reviewer and score per domain. In order to assess inter-rater reliability within each domain, the value of intraclass correlation coefficients (ICCs) was calculated [17]. Statistical significance was set at P,0.05 and the software used for analysis was SPSS 17.0.
In order to assess the quality according to type of CPGs, the date of publication, and performers, quality scores of evidencebased (EB) and non-evidence-based (non-EB) CPGs, scores of CPGs published before and after AGREE II instrument and scores of CPGs performed by AAN, NGC or other organizations were compared by a t-test.
The demographic characteristics for each of the included guidelines are presented in Table 1. The data extraction was performed by two reviewers, with 3 CPGs discussed due to the disagreement of topics covered and development methods. All guidelines were developed between 1999 and 2012, with 52.94% of guidelines developed from 2009 onwards. Most of the CPGs (58.82%) were developed by US-based organizations and 2 CPGs (11.76%) were developed in China. 11 CPGs (64.71%) were evidence-based guidelines. The majority of CPGs (88.24%) reported the number of authors and 8 CPGs (47.06%) had more than 10 authors. 4 CPGs (23.53%) recorded the details of any CPG guideline changes and updates. The average total number of pages of the CPGs was 10.47 (range: 2-51).
Overall quality Assessment of guidelines 6 CPGs [25][26][27][28][29]31] were strongly recommended as a result of the majority of item (above 4 items) scores being above 50%. These CPGs were produced mainly by the National Guideline Clearinghouse (NGC). One guideline [22] is recommended due to 3 main item scores being above 50%. One CPG [24] was not recommended due to all item scores being below 50% (see Table 2).

Scope and Purpose
This domain includes the overall objectives of the guidelines, the health questions covered by the guidelines, and the population for whom the guidelines are intended [12]. The range and mean 6 SD of the overall quality score for this domain were 27.78%-80.56% and 61.76% 615.29% respectively. Only 3 CPGs scored below 50% for this domain. 82.35% of the criteria of this domain were satisfied, although Chinese CPGs had poor reporting for this domain. The ICCs showed moderate agreement between reviewers (ICC = 0.738, 95% Confidence Interval [CI] = 0.277-0.905).

Stakeholder involvement
This domain assesses whether or not the compositions of the working group were represented, the patients' views and preferences on the development of guideline have been sought, and whether target users and pretesting among end users have been correctly defined [12]. The overall score in this domain was poor with a mean of 45.1% 619.65% (range: 13.89%-77.78%). 10 out of 17 CPGs (58.82%) scored below 50%. The ICCs showed moderate agreement between reviewers (ICC = 0.802, 95%CI: 0.454-0.928).

Rigor of development
This domain is the core of the guidelines methodology, involving eight items. It relates to ''the process for synthesizing and gathering the evidence, and the methods used for formulating the recommendations and to update them''. The mean score 6 SD for this domain was 41.79% 627.56%, and the range of overall quality score was 4.17%-83.33%. Only 35.3% of guidelines scored above 50%. The ICCs were high (ICC = 0.959, 95%CI: 0.888-0.985).

Clarity of presentation
This domain focuses on whether or not recommendations are specific and unambiguous, different options for management of the condition or health issue are clearly presented, and key recommendations are easily identifiable. 82.35% of CPGs scored above 50%. The range and mean 6 SD of the overall quality score for included CPGs were 41.67%-88.89% and 63.73% 615.42%, respectively. The ICCs were 0.704 (95% CI: 0.181-0.953).

Editorial independence
This domain evaluates how funding bodies influence the guideline content and whether the interests of all CPG contributing members have been recorded and addressed. The assessment result shows that 23.53% of CPGs scored above 50%. The range and mean 6 SD of overall quality score for included CPGs were 0-66.67% and 29.90% 627.37% respectively. The ICCs showed high agreement between reviewers (ICC = 0.884, 95% CI = 0.681-0.958).

Statistical analysis according to type of CPGs, performers, and pubtime
Of the 17 CPGs assessed, 11 were EB CPGs. The other 6 were considered non-EB CPGs. Table 3 showed that EB CPGs had higher quality scores for all of the AGREE domains when compared with non-EB CPGs, and the difference was statistically significant for all domains (P,0.05). All domain scores appear slightly higher for CPGs published after the development and validation of the AGREE II instrument (2010). However, the   Table 3) Table 3).

Discussion
According to our data, the overall quality of CPGs for MG is acceptable for two AGREE II domains: scope and purpose, and clarity of presentation (mean score above 50%). The remaining four domain scores are between 27.78% and 40.93%. 6 of 17 CPGs were strongly recommended. Compared with non-EB CPGs of MG, the EB CPGs had statistically significant higher quality scores for all of the AGREE domains. There were no differences for the quality between pre-AGREE II and the post-AGREE II instrument. This would suggest that for some reason the uptake of AGREE-II has not been that good in the MG community. We found that the best performers were CPGs published and endorsed by the AAN, or registered in the NGC. As such, it is recommended that the editors should use the AGREE-II as a compulsory checklist for publishing guidelines to improve on this.
Compared with the world average [35], the mean scores of CPGs for MG were higher for some domains (stakeholder involvement 45.10% vs. 35.0%, clarity of presentation 63.73% vs. 60.0%, and applicability 31.12% vs. 22.0%). The other domains had a slightly lower score (scope and purpose 61.76% vs. 64.0%, rigor of development 41.79% vs. 43.0%, and editorial independence 29.90% vs. 30.0%). We hypothesize that these inconsistencies could be related to the inclusion of Chinese CPGs and non-EB CPGs in our study.
Our study found that the domain with the lowest score was editorial independence (29.90%). Only 2 CPGs reported the detailed information on potential conflicts of interest, with most CPGs (88.24%) reporting as the information as ''not stated'' or ''the authors report no conflicts of interest''. This may be avoided by making it mandatory for authors to provide information that addresses the detailed process listed in the documents. It is noteworthy that the mean score for this domain was ''0'' regarding the two Chinese CPGs that were included.
The results of the AGREE-II assessment also showed the substandard methodological quality of CPGs in terms of ''applicability''. The main flaws presented were lack of advice and/or tools on how the recommendations could be put into practice (64.70% of CPGs did not have any information for this item), and potential resource implications and barriers to their application (70.59% of CPGs did not have any information for this item). It has been widely shown in other diseases that it is hard for clinicians to change their practice after guidelines become available. As such, it is suggested that a pilot test for the applicability of new guidelines should be performed before publication to ensure their feasibility in clinical practice. Moreover, the journal should ask the guideline developer to provide the results of any such pilot test before accepting for publication. 2 of the CPGs [26,31] were strongly recommended (scoring above 50%) as an example to develop future CPGs for this domain in our study.
The rigor of CPG development is considered to be the most important domains in the assessment of all guidelines, with a list of key items focusing on the methodology employed by the developers, starting from the literature search through to the updating procedure. The mean score for this domain was low. The quality may be improved by involving search experts and methodologists in the guideline development process, as well as clarifying the methods of guideline development [36]. The guidelines should be updated every 3 years [37]. In our survey, only 4 guidelines described updating procedures, and the stated updating periods were 1 time/12 years [20], 3 times/10 years [25], 1 time/4 years [27], and 1 time/15 years [28]. The mean score for stakeholder involvement was 45.1%. Most CPGs did not report any information related to the views and preferences of the target population and target users of the guidelines remained generally undefined. For this, an explicit mechanism for involving the target population should be conducted before guideline development. For example, the views of patients involved in guideline development can be achieved through actively contacting patient groups and patient representatives [36].
The mean quality scores of scope and purpose (61.76%) and clarity of presentation (63.73%) were better than all other domains for this study. Most CPGs described the overall objective and their specific and focused clinical questions. The accurate reporting of the target population was flawed due to incomplete reporting information on data such as age range, clinical details and gender. The overall quality scores of scope and purpose were above 50% for almost all foreign CPGs. Unfortunately, the overall quality scores of the Chinese CPGs were low at 33.33% [23] and 27.78% [24] respectively. The most significant difference between CPGs is the process by which developers used the same evidence to produce different guidelines [38]. We assessed the CPGs related to MG by applying the AGREE II instrument, and found that the scores of CPGs developed by the AAN or registered in the NGC were statistically higher than other organizations or individual. A variety of studies concluded that there were better scores for EB CPGs when compared with non-EB CPGs [10,13]. Our study showed that the quality of EB CPGs was statistically significantly higher than non-EB CPGs for all domains. The purpose of the AGREE II instrument is to provide a framework for assessing the quality of CPGs and assist CPGs developers to improve the quality and applicability of CPGs. Table 3 showed that the quality of CPGs for MG improved after publication of the AGREE II instrument, the difference was no statistically significant.
Our study has several strengths. Firstly, the latest instrument for guidelines assessment (AGREE II) was used to assess the methodological quality of CPGs related to MG. Secondly, we performed a subgroup analysis and found the potential elements influenced CPG quality. Thirdly, we performed a systematic and complete literature search, included prominent academic databases (PubMed and EMBASE), two websites specifically related to CPGs (NGC and DynaMed) and one Chinese database (CBM). Furthermore, agreement between reviewers was strong (above 70%), ensuring our conclusions were valid and reliable. The review also had its limitations as we only included English and Chinese CPGs, giving no consideration to reports published in other languages. Also, the review only assessed the reporting of the different items and not the content validity of the recommendations.
Overall, we found the quality of CPGs on MG to be acceptable but flawed. The developers of CPGs need to pay more attention to editorial independence, applicability, rigor of development and stakeholder involvement during the development process. The AGREE II instrument should be adopted by guideline developers, especially so in regards to Chinese guideline development.