Methodological quality of cohort study on rheumatic diseases in China: A systematic review

Objective To systematically evaluate the quality of cohort studies on rheumatic diseases in China.
Methods Relevant databases were searched for cohort studies on rheumatic diseases in China, and the basic information reported in each study was extracted and analyzed. The Chinese and English literature were then compared with regard to methodological quality, according to the Newcastle–Ottawa Scale (NOS).
Results In total, we included 46 cohort studies: 19 published in English and 27 published in Chinese. With regard to basic characteristics, 78.26% of the studies were published in the past four years; hyperuricemia was the most studied disease (16 studies), followed by systemic lupus erythematosus (eight studies). Sample sizes were smaller in the Chinese studies than in the English studies (P < 0.05). The English literature was superior to the Chinese literature in terms of informed consent, ethical review and selection of statistical analysis methods. The mean NOS scores of the English and Chinese studies were 5.59 ± 1.25 and 6.06 ± 1.11, respectively, and the difference was significant (P < 0.01). Scores for the “representativeness of the exposed group”, “demonstration that outcome of interest was not present at start of study”, and “adequacy of follow up of cohorts” items were relatively low in both Chinese and English studies. The score for the “was follow-up long enough for outcomes to occur” item was higher in the English studies than in the Chinese studies; however, the score for the “study controls for the most important factor” item was higher in the Chinese studies.
Conclusion Cohort studies of rheumatic diseases in China started late, with small sample sizes and few types of rheumatic disease covered. The quality of the Chinese studies was better than that of the English studies, but reporting was insufficient in both. In particular, “selecting exposed groups”, “controlling the outcomes before study implementation” and “adequacy of follow-up” needed improvement.


Introduction
The cohort study is the second level of evidence in evidence-based medicine. It has more accessible data sources and lower costs than randomized controlled trials, and its results are more in line with clinical practice [1]. Its range of application has expanded from identifying health-related influencing factors to assessing the effectiveness of medical control measures [2]. Since the mid-19th century, many classic cohort studies have been carried out internationally. For example, a multicenter AIDS cohort study conducted in 1984 concluded that even if patients were seropositive for human immunodeficiency virus type 1, those without AIDS would not develop Pneumocystis carinii pneumonia unless their CD4+ cells were depleted to 200 or fewer per cubic millimeter [3]. Another study, using data from 31,546 people in 40 countries, found that high-quality diets can reduce the incidence of cardiovascular events in people over 55 years of age [4]. Cohort studies therefore play an important role in the assessment of risk factors, outcomes, and preventive measures for disease occurrence.
Rheumatism is an ancient disease encompassing nearly 200 diseases in 10 categories. Rheumatology as a discipline was founded later than many others but has grown faster [5]. In recent years, the increasing incidence and prevalence of rheumatic diseases have attracted the attention of researchers all over the world. Cohort studies based on etiology and pathology, intended to guide clinical treatment, have become common in the field of rheumatology [6][7][8]. For example, the Framingham cohort study investigating risk factors for osteoarthritis (OA) showed no significant relationship between inflammatory markers and OA [9,10]. To observe the risk of gout in people consuming fructose-rich beverages, 78,906 women were followed up for 22 years, and the results showed that fructose-rich beverages increased the incidence of gout [11]. China is a large country with many rheumatism patients, and some cohort studies have been carried out in recent years [12]. However, until now, their research status and quality have not been reviewed and evaluated.
This paper collects the literature on rheumatism cohort research conducted in China, discusses the progress of research in recent years, and evaluates the quality of research in the Chinese and English literature. The purpose of this study is to provide a reference for the improvement of future cohort studies.

Search strategy
The following databases were searched from their inception to January 2019. The English databases included PubMed, Web of Science, Embase, and the Cochrane Library. The Chinese databases included the Chinese National Knowledge Infrastructure (CNKI), Wan Fang Database, the Chongqing Vip Information Database (VIP) and the Chinese Biomedical Database (SinoMed). For the English databases, the keyword searches used were "cohort studies", "rheumatic diseases", "autoimmune diseases" and "connective tissue diseases". The Medical Subject Headings (MeSH) searches used were "rheumatoid arthritis", "ankylosing spondylitis", researchers conducted a preliminary screening, and they subsequently read the full text of each paper. Two researchers independently judged whether the inclusion criteria were met, and disagreements were referred to a third investigator.

Inclusion criteria
Studies that met the following criteria were selected for further analysis: the study type was a cohort study; the study participants were patients with rheumatic diseases from China; the study provided complete research data; languages were limited to Chinese or English; and the study was published in an official journal.

Exclusion criteria
Studies that met any of the following criteria were excluded: duplicate publications; reviews; systematic reviews; clinical control studies; randomized controlled trials; animal experiments; and studies whose full text remained unavailable after contacting the author.

Data extraction
Two reviewers (H Zhang and GX Yi) screened all titles and abstracts of the studies independently. Full texts of potentially eligible studies were retrieved for further assessment, according to the above criteria. A data extraction form was created, including the author, the year of publication, journal name, case collection location, sample size, disease researched, etc. (Table 1). Diseases were classified according to the International Classification of Diseases (10th Edition). Disagreements were resolved by discussion and consultation with the other authors until a final decision was reached.

Assessment of method quality
Study quality was evaluated according to the Newcastle-Ottawa Scale (NOS). The NOS is divided into three domains: selection (4 points), comparability (2 points) and outcome (3 points). It comprises eight items across these three domains, and the final score ranges from 0 to 9 points. The higher the score, the higher the quality: scores of 0-4 points indicated low quality, and scores of 5-9 points indicated high quality. Supplementary items for the evaluation of methodological quality were as follows: design type of the study, calculation of sample size, informed consent and ethical review, statistical analysis methods, etc.
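The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `NosScore` and its fields are hypothetical, and the sketch works at the level of domain subtotals rather than the eight individual NOS items, which are not enumerated here.

```python
from dataclasses import dataclass

# Maximum points per NOS domain, as stated in the text.
MAX_POINTS = {"selection": 4, "comparability": 2, "outcome": 3}


@dataclass(frozen=True)
class NosScore:
    """Hypothetical container for one study's NOS domain subtotals."""
    selection: int
    comparability: int
    outcome: int

    def __post_init__(self):
        # Reject subtotals outside the range allowed for each domain.
        for domain, cap in MAX_POINTS.items():
            value = getattr(self, domain)
            if not 0 <= value <= cap:
                raise ValueError(f"{domain} must be between 0 and {cap}, got {value}")

    @property
    def total(self) -> int:
        # Final NOS score, 0-9 points.
        return self.selection + self.comparability + self.outcome

    @property
    def quality(self) -> str:
        # Cut-off used in this review: 0-4 low quality, 5-9 high quality.
        return "high" if self.total >= 5 else "low"
```

A study with subtotals of 3, 1 and 2 would receive a total of 6 and be classed as high quality under this cut-off.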

Statistical analysis
Statistical analysis was performed using SPSS version 23.0 (IBM Corp., Armonk, NY, USA). Data were presented as mean and standard deviation or as frequency and percentage. Comparisons were conducted between the two groups (English and Chinese studies). For normally distributed variables, means were compared using the t-test; categorical variables were analyzed using Pearson's chi-squared test. Two-tailed P values were used, with P < 0.05 considered statistically significant.
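For reference, the two test statistics named above have the following standard forms. The pooled-variance (Student) version of the t-test is assumed here, since the text does not say which variant was run in SPSS. The symbols are introduced only for this illustration: \bar{x}_k, s_k and n_k are the mean, standard deviation and size of group k; O_{ij} and E_{ij} are the observed and expected counts in cell (i, j); R_i and C_j are the row and column totals; and N is the grand total.

```latex
% Two-sample t statistic with pooled variance, n_1 + n_2 - 2 degrees of freedom
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}

% Pearson chi-squared statistic for an r x c table, (r - 1)(c - 1) degrees of freedom
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N}
```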

Search results
A total of 573 studies were retrieved. After applying the aforementioned screening criteria, with independent assessment by two reviewers, we excluded 151 duplicate documents by reading titles and abstracts, removed 330 documents that did not meet the inclusion criteria, ruled out 46 non-conforming documents by reading the full text, and finally included 46 articles.
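The screening counts above can be checked with a short tally. The sketch below is illustrative only: the function name `screening_flow` and the step labels are hypothetical, and the counts are the ones reported in the text.

```python
def screening_flow(retrieved, exclusions):
    """Return (step label, records remaining) after each exclusion step."""
    remaining = retrieved
    trail = []
    for label, excluded in exclusions:
        if excluded > remaining:
            raise ValueError(f"cannot exclude {excluded} records at step '{label}'")
        remaining -= excluded
        trail.append((label, remaining))
    return trail


# Counts as reported in the search results.
steps = [
    ("duplicates", 151),
    ("did not meet inclusion criteria", 330),
    ("excluded on full-text reading", 46),
]
trail = screening_flow(573, steps)
for label, remaining in trail:
    print(f"after removing {label}: {remaining} records remain")
```

The final entry of `trail` should equal the number of included articles, which gives a quick consistency check on the reported flow.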

Sample size.
A total of 313,564 subjects were included in the 46 studies, with a minimum of 49 cases and a maximum of 26,006 cases in the Chinese studies, and a minimum of 109 cases and a maximum of 101,510 cases in the English studies. The sample size was less than 500 cases in 26.32% of English studies and 55.56% of Chinese studies. This difference was statistically significant (P < 0.05), and the specific sample size distribution is shown in Fig 3B.

Informed consent and ethical review.
The informed consent and ethical review data are shown in Table 1. Informed consent was reported in 78.95% of English studies, but in only 29.63% of Chinese studies, and the difference between the two groups was significant (P < 0.01). Ethical review was reported in 84.21% of English studies, but in only 14.81% of Chinese studies; this difference was also statistically significant (P < 0.01).
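As an illustration of how such comparisons of proportions can be tested, the sketch below runs Pearson's chi-squared test on 2x2 tables. Several things here are assumptions rather than the authors' procedure: the study counts are back-calculated from the reported percentages using the 19 English and 27 Chinese studies, no continuity correction is applied, and the helper names `count_from_percent` and `pearson_chi2_2x2` are hypothetical. The analysis in the paper was run in SPSS, whose exact settings are not stated.

```python
import math


def count_from_percent(percent, total):
    """Back-calculate a study count from a reported percentage (assumption)."""
    return round(percent / 100 * total)


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-squared test, without continuity correction, for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p_value), 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


N_ENGLISH, N_CHINESE = 19, 27

# Studies with fewer than 500 cases (26.32% English, 55.56% Chinese).
small_en = count_from_percent(26.32, N_ENGLISH)
small_zh = count_from_percent(55.56, N_CHINESE)
size_stat, size_p = pearson_chi2_2x2(
    small_en, N_ENGLISH - small_en, small_zh, N_CHINESE - small_zh
)

# Studies reporting informed consent (78.95% English, 29.63% Chinese).
consent_en = count_from_percent(78.95, N_ENGLISH)
consent_zh = count_from_percent(29.63, N_CHINESE)
consent_stat, consent_p = pearson_chi2_2x2(
    consent_en, N_ENGLISH - consent_en, consent_zh, N_CHINESE - consent_zh
)

print(f"sample size < 500: chi2 = {size_stat:.2f}, p = {size_p:.4f}")
print(f"informed consent:  chi2 = {consent_stat:.2f}, p = {consent_p:.4f}")
```

With small cell counts, a continuity correction or Fisher's exact test may give a different P value, so this sketch should be read as a plausibility check rather than a reproduction of the reported analysis.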

Statistical methods.
In this study, 73.68% of English studies and 55.56% of Chinese studies analyzed the baseline data, and only one study reported data loss [35]. Three English and nine Chinese studies used only single-factor statistical analysis methods, such as one-way analysis of variance, t-tests, χ² tests, non-parametric tests, etc. One Chinese and one English study used Cox regression analysis to analyze effects over time; 15 English and 17 Chinese studies used logistic regression and Cox regression analysis to correct for the effects of confounding factors. The distribution of statistical analysis methods in Chinese and English studies is shown in Fig 4.

NOS methodological quality evaluation
The NOS [59] was used to evaluate the 46 articles included in the study. First, both Chinese and English studies had low scores for the "representativeness of the exposed cohort". Most of the studies were based on a specific group of people, such as a group of retired employees, company staff undergoing medical examinations, and so on. Second, the English papers scored worse than the Chinese papers on the "comparability between groups" items, because they did not control for important factors and confounding factors that could affect the results. Finally, the follow-up was described in only 13 English articles and eight Chinese articles. The Chinese literature scored lower than the English literature for follow-up time, and the follow-up time was generally insufficient. The scores for the "sufficient follow-up" items were low in both groups. The specific evaluation scores are shown in Table 3. According to the NOS evaluation criteria, the average score for the English literature was 5.59 ± 1.25, and the average score for the Chinese literature was 6.06 ± 1.11. The quality of the two groups was good, and the scores were mainly concentrated between 4 and 7. The average score of the Chinese studies was higher than that of the English studies, and the difference between the groups was significant (P < 0.01). Table 4 gives the specific scores.

PLOS ONE

Discussion
Internationally, cohort research on rheumatism is ongoing, and many large-sample multi-center clinical studies have been carried out. For example, the Framingham cohort study in the United States on the risk factors for osteoarthritis found that meniscal injury in the middle-aged and elderly populations increased with age and had no parallel relationship with clinical symptoms [60]. A retrospective cohort of 37,338 twins across Denmark showed that genes play only a minor role in the pathogenesis of rheumatoid arthritis [61]. Scholars at the University of St. Petersburg in Russia followed 498 women with systemic lupus erythematosus for 14 years. The results suggested that early death was most common in those with active lupus or concurrent infections, and late death was more common in patients with atherosclerotic coronary heart disease or acute myocardial infarction [62]. It is clear that cohort studies have played an important role in the prevention and treatment of rheumatism.
According to the analysis of the basic characteristics of the 46 studies evaluated herein, a total of 11 diseases were studied across all cohort studies. Of the literature, 78.26% was published in the past four years, indicating that cohort research on rheumatism in China developed late and has received increasing emphasis in recent years. Most reports described retrospective cohort studies, and 14 reports did not allow us to judge the type of research. Only one bidirectional cohort study was published, which may be related to the difficulty and high cost of implementing this design. The average sample size of studies in the Chinese literature was smaller than that in the English literature, and six Chinese studies had fewer than 100 participants. Studies with small sample sizes suffer from under-representation [63,64], which lowers the generalizability of Chinese cohort research. Large-scale reports with 500 or more cases accounted for 44.44% of the Chinese reports, compared with 73.68% of the English reports. Of the 46 studies, only one reported an estimate of the sample size. With regard to informed consent and ethical review, the English literature was significantly better than the Chinese literature. However, 26.32% of English reports and 44.44% of Chinese reports did not describe baseline characteristics, which will affect the interpretation of the results [65]. Nine of the Chinese studies used only univariate analyses to compare differences between groups; because no other factors were controlled for, the results may be biased. In summary, cohort research on rheumatism in China developed late, involves few disease types, has short follow-up times, and lacks multi-center large-sample studies, all of which limit its theoretical and applied value. Therefore, it is particularly important to improve the quantity and quality of rheumatism cohort studies [66].
The NOS is a commonly used tool for the evaluation of cohort studies [67]. In this paper, the 46 documents retrieved were evaluated using the NOS. First, the results show that the Chinese score was higher than the English score, although the overall scores were mainly between 4 and 7 points. This may have been related to the exclusion of non-standardized and low-quality papers during the process of selecting the literature, resulting in a higher overall score. Second, both the Chinese and English literature scored higher on items such as "selection of the non-exposed cohort", "ascertainment of exposure", and "assessment of outcome", but the score for "representativeness of the exposed group" was lower. The main reason for this is that the exposed group was generally a specific group, such as outpatients, inpatients or a group of employees, and cannot represent the entire population. In addition, both Chinese and English studies scored lower on "demonstration that outcome of interest was not present at start of study", because the papers did not explicitly explain or did not mention the relevant information. Third, 17 Chinese and nine English reports did not describe follow-up, and therefore did not score on the item "adequacy of follow up of cohorts". Finally, the Chinese studies were inferior to the English studies on "was follow-up long enough for outcomes to occur", because the Chinese studies did not specify or anticipate the required follow-up time and we were unable to determine whether the follow-up time was sufficient. However, Chinese studies scored better than English studies on the "study controls for the most important factor" item. The Chinese studies not only controlled for important factors such as time, but also controlled for other confounding factors such as age, gender, and duration of disease. Clearly, the Chinese and English reports each have their own strengths and weaknesses, and researchers should learn from each other.
Although rheumatism cohort studies in China have shown a great deal of progress, there are still many areas that need improvement. First, researchers should try to avoid choosing specific populations as exposure groups. Second, attention should be paid to the "demonstration that outcome of interest was not present at start of study" item, so as not to affect the observation of the results. Third, a sufficiently long follow-up time should be estimated from the expected time of occurrence of the outcomes under observation, and the follow-up process should be standardized and described in detail. Finally, researchers should also pay attention to rheumatic diseases that have not yet been studied, conduct multi-center large-sample cohort studies wherever possible, improve informed consent and ethical review, and select appropriate statistical methods. These recommendations can inform future rheumatism cohort studies and improve the standardization and quality of cohort research in China.

Conclusion
In this paper, we reviewed the progress and evaluated the quality of cohort studies on rheumatic diseases in China. On the one hand, the results showed that cohort studies of rheumatic diseases in China developed late, with small sample sizes and few types of rheumatic disease covered. On the other hand, the NOS quality assessment showed that the quality of the Chinese reports was better than that of the English reports. Some shortcomings were also identified. In particular, "selecting exposed groups", "controlling the outcomes before study implementation" and "following up groups" need to be improved. Therefore, well-designed, multicenter, large-scale cohort studies are needed to improve clinical studies of rheumatic diseases in China.