Find Duplicates among the PubMed, EMBASE, and Cochrane Library Databases in Systematic Review

  • Xingshun Qi ,

    Contributed equally to this work with: Xingshun Qi, Man Yang, Weirong Ren, Jia Jia

    xingshunqi@126.com (QX); guohhan@126.com (HG); fandaim@fmmu.edu.cn (FD)

    Affiliations Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi’an, China, Department of Gastroenterology, No. 463 Hospital of Chinese PLA, Shenyang, China

  • Man Yang ,

    Affiliation Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi’an, China

  • Weirong Ren ,

    Affiliation Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi’an, China

  • Jia Jia ,

    Affiliation Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi’an, China

  • Juan Wang,

    Affiliation Department of Gastroenterology, No. 463 Hospital of Chinese PLA, Shenyang, China

  • Guohong Han ,

    Affiliation Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi’an, China

  • Daiming Fan

    Affiliation Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Xi’an, China

Abstract

Background

Finding duplicates is an important phase of systematic review. However, no consensus regarding the methods to find duplicates has been provided. This study aims to describe a pragmatic strategy of combining auto- and hand-searching duplicates in systematic review and to evaluate the prevalence and characteristics of duplicates.

Methods and Findings

Literatures regarding portal vein thrombosis (PVT) and Budd-Chiari syndrome (BCS) were searched by the PubMed, EMBASE, and Cochrane library databases. Duplicates included one index paper and one or more redundant papers. They were divided into type-I (duplicates among different databases) and type-II (duplicate publications in different journals/issues) duplicates. For type-I duplicates, reference items were further compared between index and redundant papers. Of 10936 papers regarding PVT, 2399 and 1307 were identified as auto- and hand-searched duplicates, respectively. The prevalence of auto- and hand-searched redundant papers was 11.0% (1201/10936) and 6.1% (665/10936), respectively. They included 3431 type-I and 275 type-II duplicates. Of 11403 papers regarding BCS, 3275 and 2064 were identified as auto- and hand-searched duplicates, respectively. The prevalence of auto- and hand-searched redundant papers was 14.4% (1640/11403) and 9.1% (1039/11403), respectively. They included 5053 type-I and 286 type-II duplicates. Most of type-I duplicates were identified by auto-searching method (69.5%, 2385/3431 in PVT literatures; 64.6%, 3263/5053 in BCS literatures). Nearly all type-II duplicates were identified by hand-searching method (94.9%, 261/275 in PVT literatures; 95.8%, 274/286 in BCS literatures). Compared with those identified by auto-searching method, type-I duplicates identified by hand-searching method had a significantly higher prevalence of wrong items (47/2385 versus 498/1046, p<0.0001 in PVT literatures; 30/3263 versus 778/1790, p<0.0001 in BCS literatures). Most of wrong items originated from EMBASE database.

Conclusion

Given the inadequacy of a single strategy of auto-searching method, a combined strategy of auto- and hand-searching methods should be employed to find duplicates in systematic review.

Introduction

A systematic review is an explicitly formulated, reproducible, and up-to-date summary of the effects of health care interventions [1], [2]. It provides the top level of evidence for clinical decisions [3], [4]. More than 2500 new systematic reviews can be retrieved in PubMed every year [5]. Compared with the traditional narrative review, the most prominent feature of the systematic review is that the literature search is comprehensive and the literature selection is unbiased. Recently, the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) statement has recommended that a four-phase flow diagram should be employed for literature search and selection in systematic review [1]. The first phase is to identify all relevant literatures through databases and subsequently to remove the duplicates simultaneously recorded by different databases or published by different journals. Finding duplicates among databases is critical, because it allows researchers to avoid repeatedly evaluating data from the same study and allows readers to accurately understand the quantity of scientific publications in the field. Based on our previous systematic reviews [6], [7], [8], [9], a high prevalence of duplicates is frequently observed among different databases. More importantly, not all duplicates can be readily found, because wrong information is occasionally recorded. However, no consensus has yet been reached regarding the methods to find duplicates or the prevalence of duplicates among different databases.

Herein, we attempted to describe our methods to find duplicates among the PubMed, EMBASE, and Cochrane library databases in systematic review and to evaluate the prevalence and characteristics of duplicates.

Methods

Literature search

Literatures in two fields were retrieved to minimize the potential selection bias. They included “portal vein thrombosis” and “Budd-Chiari syndrome” literatures. The selection of the two fields was primarily attributed to our research interests in the two vascular disorders of the liver [6], [7], [8], [9], [10], [11]. QX searched the PubMed, EMBASE, and Cochrane library databases (from the database inception to November 12, 2012). Our search strategy aimed to maximize the quantity of literatures recorded by these databases. The search items were discussed by all review authors. For the literatures regarding portal vein thrombosis, the search items were: (portal vein thrombosis) OR (portal venous thrombosis) OR (portal vein obstruction) OR (portal venous obstruction). For the literatures regarding Budd-Chiari syndrome, the search items were: (budd chiari) OR (hepatic vein obstruction) OR (hepatic venous obstruction) OR (hepatic vein thrombosis) OR (hepatic venous thrombosis).

Definitions and classifications of duplicates

Duplicates were divided into type I (duplicates among databases) and type II (duplicate publications). A type I duplicate was defined as one paper recorded two or more times in a single database, or recorded in two or three databases (see examples in Table 1). A type II duplicate was defined as one study published in different journals or issues. According to the type of publication, type II duplicates were classified as Abstract-Abstract, Abstract-Full text, and Full text-Full text. The first two types were often permitted, but the last type was unethical in most cases [12] (see examples in Table 2).

Each set of duplicates consisted of one index paper and one or more redundant papers. For type I duplicates, the index paper was the record with the more accurate and/or more adequate reference information; for type II duplicates, the index paper was the one published earlier and/or with the larger sample size [13]. According to the number of redundant papers, duplicates were classified as double duplicates (one redundant paper), triple duplicates (two redundant papers), quadruple duplicates (three redundant papers), and so on. According to the origin of index and redundant papers, duplicates were classified as PubMed-PubMed, PubMed-EMBASE, PubMed-Cochrane, EMBASE-EMBASE, EMBASE-Cochrane, Cochrane-Cochrane, and PubMed-EMBASE-Cochrane.
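The two classifications above (by number of redundant papers and by origin) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the label used for groups larger than quadruple, as well as the handling of mixed groups such as two PubMed records plus one EMBASE record, are assumptions that the text does not specify.

```python
# Hypothetical helper names; the labels follow the definitions in the text.
MULTIPLICITY = {2: "double", 3: "triple", 4: "quadruple"}
DATABASE_ORDER = {"PubMed": 0, "EMBASE": 1, "Cochrane": 2}

def multiplicity_label(group_size):
    """Name a duplicate set by its size (one index paper plus redundant papers)."""
    if group_size < 2:
        raise ValueError("a duplicate set needs an index paper and at least one redundant paper")
    # Assumption: sets larger than quadruple get a generic "N-fold" label.
    return MULTIPLICITY.get(group_size, f"{group_size}-fold")

def origin_label(databases):
    """Join the source databases of a duplicate set in a fixed order."""
    ranked = sorted(databases, key=DATABASE_ORDER.__getitem__)
    return "-".join(ranked)

print(multiplicity_label(2), origin_label(["EMBASE", "PubMed"]))
print(multiplicity_label(3), origin_label(["Cochrane", "PubMed", "EMBASE"]))
```

Sorting by a fixed database order means that a record pair is labelled the same way regardless of which database was imported first.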

Auto-search duplicates

QX imported all literatures retrieved by the three databases into an Endnote library (ENDNOTE X3, Thomson Reuters, USA). All literatures were expressed in Vancouver reference type. In the Endnote library, QX used the “Find Duplicates” command on the “References” menu to identify the auto-searched duplicates among the three databases. Prior to this step, “Find Duplicates” preferences could be defined on the “Edit” menu. To maximize the quantity of auto-searched duplicates, our preference was consistent with the Endnote default setting. In this setting, duplicates were identified as references of the same reference type with matching “author”, “title”, and “publication date” items, but “journal’s name”, “volume”, “issue”, and “page” items were not compared. QX further verified the accuracy of auto-searched duplicates.
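The matching rule described above can be approximated as follows. This is a minimal sketch and not EndNote's own algorithm: the record layout and field names are hypothetical, the records are made up, and exact string equality on the compared fields is an assumption, since the text does not describe how EndNote compares field values internally.

```python
from collections import defaultdict

def match_key(record):
    # Fields compared under the default setting described in the text.
    # Journal name, volume, issue, and page are deliberately left out.
    return (record["ref_type"], record["author"], record["title"], record["date"])

def find_auto_duplicates(records):
    """Group records whose match keys are equal; return groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]

# Made-up records for illustration only.
records = [
    {"id": 1, "source": "PubMed", "ref_type": "Journal Article", "author": "Doe J",
     "title": "Portal vein thrombosis in cirrhosis", "date": "2010", "journal": "J Hepatol"},
    {"id": 2, "source": "EMBASE", "ref_type": "Journal Article", "author": "Doe J",
     "title": "Portal vein thrombosis in cirrhosis", "date": "2010", "journal": "Journal of Hepatology"},
    {"id": 3, "source": "PubMed", "ref_type": "Journal Article", "author": "Doe J",
     "title": "Budd-Chiari syndrome in pregnancy", "date": "2010", "journal": "J Hepatol"},
]

groups = find_auto_duplicates(records)
print([[r["id"] for r in group] for group in groups])  # [[1, 2]]
```

Records 1 and 2 are grouped even though the journal name is abbreviated in one and written in full in the other, because the journal field is not part of the key.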

Hand-search duplicates

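After auto-searched redundant papers were removed, the remaining literatures were alphabetically ordered according to the first authors’ names. Then, duplicates were identified among the literatures by the same first author. In detail, if two or more papers had the same first author, the review authors further read their titles, journals’ names, volumes, issues, and pages (Figure 3).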

Notably, if the first author’s name was wrongly spelt or missing or the authors’ order was reversed in some database, we would miss some duplicates. Accordingly, to minimize the quantity of missed duplicates, the literatures were also alphabetically ordered according to the titles. Then, duplicates were identified among the literatures with the same titles. YM and JJ were responsible for the literatures regarding portal vein thrombosis, and QX and RW for the literatures regarding Budd-Chiari syndrome. QX and YM were also responsible for rechecking the accuracy of their tasks. Disagreement would be resolved by discussion among the four review authors.
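The two sorting passes can be sketched as below. The sketch only surfaces candidate groups for a reviewer to inspect; the judgment itself was made by hand in the study. The field names and the records are hypothetical, and requiring the sort field to be exactly equal is a simplifying assumption.

```python
from itertools import groupby

def candidate_groups(records, field):
    """Sort records by one field and return runs of records sharing its value."""
    ordered = sorted(records, key=lambda r: r[field])
    runs = (list(run) for _, run in groupby(ordered, key=lambda r: r[field]))
    return [run for run in runs if len(run) > 1]

# Made-up records; record 3 has a misspelt first author.
records = [
    {"id": 1, "first_author": "Smith A", "title": "Anticoagulation for portal vein thrombosis"},
    {"id": 2, "first_author": "Smith A", "title": "Anticoagulation for portal vein thrombosis: a cohort study"},
    {"id": 3, "first_author": "Smyth A", "title": "Anticoagulation for portal vein thrombosis"},
    {"id": 4, "first_author": "Jones B", "title": "Budd-Chiari syndrome in pregnancy"},
]

by_author = candidate_groups(records, "first_author")
by_title = candidate_groups(records, "title")
print([[r["id"] for r in g] for g in by_author])  # [[1, 2]]
print([[r["id"] for r in g] for g in by_title])   # [[1, 3]]
```

The author pass misses record 3 because of the misspelt name, while the title pass pairs it with record 1, which is the reason the text gives for running both passes.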

Difference between index and redundant papers of type I duplicates

We compared the reference items between index and redundant papers only for type I duplicates, not for type II duplicates, because nearly all type II duplicates had a different journal’s name, volume, issue, and page between index and redundant papers. QX and YM extracted the detailed information of type I duplicates (i.e., author, title, journal’s name, publication date, volume, issue, and page) into an Excel table (Microsoft Office Excel 2003, Microsoft Corporation, USA). Then, QX and YM compared the reference items between index and redundant papers, and classified each duplicate as “acceptable” or “unacceptable” according to whether it contained wrong information.

Difference between index and redundant paper(s) would be considered acceptable to readers and reference reviewers, if the information was expressed in different styles. These different styles included: 1) punctuation, space, or case was different; 2) author’s middle name was omitted; 3) title of non-English language paper and non-English language journal’s name were translated into different words, but their meanings were identical; 4) journal’s name was expressed in full or abbreviated style; 5) publication date was expressed in “year” or “year month (day)” style; and 6) volume, issue, or page was expressed in different styles, but their meanings were identical (see examples in Table 1).

Difference between index and redundant paper(s) would be considered unacceptable to readers and reference reviewers, if the information was wrongly expressed. These wrong styles included: 1) author’s name and order, title of English language paper, and/or journal’s name was wrongly recorded, added, or missing; and 2) publication date, volume, issue, and/or page was wrong or missing (see examples in Table 1). QX further obtained the full-texts of the corresponding papers to identify the database which the wrong information originated from. In the cases where some full-text papers could not be obtained, we were uncertain about which database the wrong information originated from.
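Part of the acceptable/unacceptable distinction can be pre-screened mechanically. The sketch below covers only two of the acceptable styles listed above, namely differences in punctuation, space, or case, and a publication date given as “year” versus “year month (day)”. Everything else is returned for manual review, as in the study. The function names and the three outcome labels are hypothetical.

```python
import re

def normalize_text(value):
    """Lower-case, replace punctuation with spaces, and collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", " ", value.lower())
    return " ".join(no_punct.split())

def compare_text(index_value, redundant_value):
    """Compare one text item (e.g. a title) between index and redundant paper."""
    if index_value == redundant_value:
        return "identical"
    if normalize_text(index_value) == normalize_text(redundant_value):
        return "acceptable"  # differs only in punctuation, space, or case
    return "needs manual review"

def compare_date(index_date, redundant_date):
    """Accept 'year' versus 'year month (day)' when the shared parts agree."""
    if index_date == redundant_date:
        return "identical"
    a = normalize_text(index_date).split()
    b = normalize_text(redundant_date).split()
    shorter, longer = sorted((a, b), key=len)
    if shorter and longer[:len(shorter)] == shorter:
        return "acceptable"  # one date is a less detailed form of the other
    return "needs manual review"

print(compare_text("Portal vein thrombosis.", "portal  vein thrombosis"))
print(compare_date("2012", "2012 Nov 12"))
print(compare_date("2012 Nov", "2012 Dec"))
```

Deciding which database holds the wrong value still requires the full text, so a result of “needs manual review” is a prompt for a reviewer, not a verdict.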

Data analysis

The count data and/or percentage were reported in texts or bar charts. The prevalence of duplicates with 95% confidence intervals (CI) was calculated as follows:
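A minimal sketch of this calculation, assuming that the prevalence is the number of redundant papers divided by the total number of retrieved papers, and that the interval is the normal-approximation (Wald) interval with z = 1.96. Both are assumptions, but under them the four intervals reported in the Results are reproduced to one decimal place. The function name is hypothetical.

```python
import math

def prevalence_with_ci(redundant, total, z=1.96):
    """Return (prevalence, lower bound, upper bound) as proportions."""
    p = redundant / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# Counts of redundant papers and retrieved papers taken from the Results.
reported = [
    ("PVT, auto-searched", 1201, 10936),
    ("PVT, hand-searched", 665, 10936),
    ("BCS, auto-searched", 1640, 11403),
    ("BCS, hand-searched", 1039, 11403),
]

for label, redundant, total in reported:
    p, low, high = prevalence_with_ci(redundant, total)
    print(f"{label}: {100 * p:.1f}% (95%CI: {100 * low:.1f}%-{100 * high:.1f}%)")
```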

The proportion of type I and II duplicates was compared between auto-searching and hand-searching methods. The prevalence of different and wrong items in type I duplicates was compared between auto-searching and hand-searching methods. Two-tailed P values <0.05 were considered statistically significant. The statistical analyses were performed in SPSS 12.0 (SPSS Inc, Chicago, Ill).
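The text does not name the statistical test used for these comparisons. As an illustration only, the sketch below assumes a Pearson chi-square test on a 2×2 table without continuity correction and applies it to the wrong-item counts reported for portal vein thrombosis literatures (47/2385 auto-searched versus 498/1046 hand-searched). It is not a re-implementation of the SPSS analysis, and the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # Upper tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Rows: auto-searched, hand-searched; columns: wrong items, no wrong items.
chi2, p_value = chi_square_2x2(47, 2385 - 47, 498, 1046 - 498)
print(p_value < 0.0001)  # True
```

With proportions this far apart, other common tests for 2×2 tables would also give p<0.0001; for the comparisons with very small cell counts, the choice of test matters more, and the sketch should not be expected to match the reported p values there.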

Results

Portal vein thrombosis literatures

Overall, 10936 papers were identified via the three databases, including 6733 from PubMed database, 4002 from EMBASE database, and 201 from Cochrane library database (Figure 1A).

Figure 1. Study flowchart of finding duplicates in the literatures regarding portal vein thrombosis (panel A) and Budd-Chiari syndrome (panel B).

https://doi.org/10.1371/journal.pone.0071838.g001

Auto-searched duplicates.

Initially, 2401 papers were identified as auto-searched duplicates. Notably, 2 papers with the same author, title, and publication date were excluded from duplicates, because both of them reported different contexts in different issues. Thus, 2399 papers were auto-searched duplicates, including 1198 index papers and 1201 redundant papers (Table 3). The prevalence of auto-searched redundant papers was 11.0% (95%CI: 10.4%–11.6%).

Table 3. Characteristics of duplicates in literatures regarding portal vein thrombosis and Budd-Chiari syndrome.

https://doi.org/10.1371/journal.pone.0071838.t003

Of 2385 type I duplicates, 14 had the completely same items between index and redundant papers. The remaining 2371 duplicates had at least one different item between index and redundant papers. Publication date (92.8%, 2213/2385) was the most commonly different item, followed by journal’s name (88.7%, 2115/2385), title (31.2%, 744/2385), page (5.8%, 139/2385), issue (3.0%, 71/2385), volume (0.9%, 21/2385), and author (0.6%, 14/2385) (Table 4). Only 2.0% (47/2385) of duplicates were considered unacceptable due to wrong information. Page (1.8%, 43/2385) was the most commonly wrong item, followed by issue (1.0%, 23/2385), volume (0.9%, 21/2385), publication date (0%, 0/2385), journal’s name (0%, 0/2385), title (0%, 0/2385), and author (0%, 0/2385).

Table 4. Type I duplicates – difference between index and redundant papers.

https://doi.org/10.1371/journal.pone.0071838.t004

EMBASE database had the highest proportion of wrong information regarding page, issue, and volume items (Figure 2A).

Figure 2. Proportion of wrong information of auto-searched (panel A) and hand-searched (panel B) type I duplicates from the literatures regarding portal vein thrombosis and that of auto-searched (panel C) and hand-searched (panel D) type I duplicates from literatures regarding Budd-Chiari syndrome.

https://doi.org/10.1371/journal.pone.0071838.g002

Hand-searched duplicates.

After auto-searched redundant papers were removed, 1307 papers were further identified as hand-searched duplicates, including 642 index papers and 665 redundant papers (Table 3). The prevalence of hand-searched redundant papers was 6.1% (95%CI: 5.6%–6.5%).

Of 1046 type I duplicates, all had at least one different item between index and redundant papers. Journal’s name (87.1%, 909/1046) was the most commonly different item in hand-searched duplicates, followed by publication date (82.1%, 857/1046), author (53.2%, 555/1046), title (52.3%, 546/1046), page (22.1%, 231/1046), volume (15.4%, 161/1046), and issue (3.3%, 34/1046) (Table 4). 47.6% (498/1046) of duplicates were considered unacceptable due to wrong information. Author (27.4%, 286/1046) was the most commonly wrong item in hand-searched duplicates, followed by page (17.7%, 185/1046), volume (15.4%, 161/1046), title (2.8%, 29/1046), issue (2.1%, 22/1046), publication date (1.5%, 16/1046), and journal’s name (0.4%, 4/1046).

EMBASE database had the highest proportion of wrong information regarding author, title, journal, and publication date items. Cochrane library database had the highest proportion of wrong information regarding volume and page items. PubMed database had the highest proportion of wrong information regarding issue item (Figure 2B).

Comparison.

The number of duplicates identified by auto-searching methods was larger than that identified by hand-searching methods (2399 versus 1307). Most type I duplicates were identified by auto-searching methods (69.5%, 2385/3431). The proportion of type I duplicates among the auto-searched duplicates was significantly higher than that among the hand-searched duplicates (2385/2399 versus 1046/1307, p<0.0001). Nearly all type II duplicates were identified by hand-searching methods (94.9%, 261/275). The proportion of type II duplicates among the auto-searched duplicates was significantly lower than that among the hand-searched duplicates (14/2399 versus 261/1307, p<0.0001).

Compared with those identified by auto-searching methods, type I duplicates identified by hand-searching methods had a significantly higher prevalence of different and wrong items (different items: 2371/2385 versus 1046/1046, p = 0.008; wrong items: 47/2385 versus 498/1046, p<0.0001).

Budd-Chiari syndrome literatures

Overall, 11403 papers were identified via the three databases, including 5894 from PubMed database, 5278 from EMBASE database, and 231 from Cochrane library database (Figure 1B).

Auto-searched duplicates.

3275 papers were identified and verified as auto-searched duplicates, including 1635 index papers and 1640 redundant papers (Table 3). The prevalence of auto-searched redundant papers was 14.4% (95%CI: 13.7%–15.0%).

Of 3263 type I duplicates, 18 had completely identical items between index and redundant papers. The remaining 3245 duplicates had at least one different item between index and redundant papers. Publication date (94.7%, 3091/3263) was the most commonly different item, followed by journal’s name (87.3%, 2847/3263), title (31.0%, 1011/3263), page (4.7%, 154/3263), issue (4.1%, 133/3263), author (0.9%, 28/3263), and volume (0.6%, 20/3263) (Table 4). Only 0.9% (30/3263) of duplicates were considered unacceptable due to wrong information. Page (0.9%, 30/3263) was the most commonly wrong item, followed by issue (0.6%, 18/3263), volume (0.5%, 16/3263), publication date (0%, 0/3263), journal’s name (0%, 0/3263), title (0%, 0/3263), and author (0%, 0/3263).

EMBASE database had the highest proportion of wrong information regarding page, issue, and volume items (Figure 2C).

Hand-searched duplicates.

After 1640 auto-searched redundant papers were removed, 2064 papers were further identified as hand-searched duplicates, including 1025 index papers and 1039 redundant papers (Table 3). The prevalence of hand-searched redundant papers was 9.1% (95%CI: 8.6%–9.6%).

Of 1790 type I duplicates, all had at least one different item between index and redundant papers. Journal’s name (85.1%, 1523/1790) was the most commonly different item in hand-searched duplicates, followed by title (59.3%, 1062/1790), author (56.0%, 1002/1790), publication date (55.0%, 985/1790), page (8.9%, 160/1790), volume (5.4%, 97/1790), and issue (4.0%, 72/1790) (Table 4). 43.5% (778/1790) of duplicates were considered unacceptable due to wrong information. Author (36.0%, 644/1790) was the most commonly wrong item in hand-searched duplicates, followed by volume (5.1%, 92/1790), page (4.1%, 74/1790), issue (2.3%, 42/1790), title (2.1%, 37/1790), publication date (0.4%, 8/1790), and journal’s name (0.2%, 4/1790).

EMBASE database had the highest proportion of wrong information regarding author, title, journal’s name, and publication date items. Cochrane library database had the highest proportion of wrong information regarding volume, issue, and page items (Figure 2D).

Comparison.

The prevalence of duplicates identified by auto-searching methods was significantly higher than that identified by hand-searching methods (3275/11403 versus 2064/11403, p<0.0001). Most type I duplicates were identified by auto-searching methods (64.6%, 3263/5053). The proportion of type I duplicates among the auto-searched duplicates was significantly higher than that among the hand-searched duplicates (3263/3275 versus 1790/2064, p<0.0001). Nearly all type II duplicates were identified by hand-searching methods (95.8%, 274/286). The proportion of type II duplicates among the auto-searched duplicates was significantly lower than that among the hand-searched duplicates (12/3275 versus 274/2064, p<0.0001).

Compared with those identified by auto-searching methods, type I duplicates identified by hand-searching methods had a significantly higher prevalence of different and wrong items (different items: 3245/3263 versus 1790/1790, p = 0.001; wrong items: 30/3263 versus 778/1790, p<0.0001).
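The counts reported for the two fields can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the Results; the dictionary layout and function name are hypothetical. It checks that index plus redundant papers, and type I plus type II duplicates, both add up to the number of duplicates for each method, and then recomputes the shares quoted in the two Comparison paragraphs.

```python
counts = {
    "portal vein thrombosis": {
        "auto": {"duplicates": 2399, "index": 1198, "redundant": 1201, "type_i": 2385, "type_ii": 14},
        "hand": {"duplicates": 1307, "index": 642, "redundant": 665, "type_i": 1046, "type_ii": 261},
    },
    "Budd-Chiari syndrome": {
        "auto": {"duplicates": 3275, "index": 1635, "redundant": 1640, "type_i": 3263, "type_ii": 12},
        "hand": {"duplicates": 2064, "index": 1025, "redundant": 1039, "type_i": 1790, "type_ii": 274},
    },
}

def check_field(methods):
    """Verify that the counts are internally consistent and derive the shares."""
    for m in methods.values():
        assert m["index"] + m["redundant"] == m["duplicates"]
        assert m["type_i"] + m["type_ii"] == m["duplicates"]
    type_i_total = sum(m["type_i"] for m in methods.values())
    type_ii_total = sum(m["type_ii"] for m in methods.values())
    return {
        "type_i_total": type_i_total,
        "type_ii_total": type_ii_total,
        "auto_share_of_type_i": round(100 * methods["auto"]["type_i"] / type_i_total, 1),
        "hand_share_of_type_ii": round(100 * methods["hand"]["type_ii"] / type_ii_total, 1),
    }

for field, methods in counts.items():
    print(field, check_field(methods))
```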

Discussion

Finding duplicates among different databases is an indispensable and important phase of systematic review. According to our previous experience with systematic reviews [6], [7], [8], [9], this phase is not as easy as expected. However, little attention has been paid to it. To our knowledge, this study is the first systematic analysis of duplicates among the three databases commonly used in systematic reviews (i.e., PubMed, EMBASE, and Cochrane library database). We attempted to devise a scheme to identify duplicates in a systematic review (Figure 3). In this scheme, we employed two methods to find duplicates (i.e., auto-search and hand-search duplicates) and two approaches to find hand-searched duplicates (i.e., alphabetical order of literatures according to the first authors and titles). Indeed, the process of auto-searching duplicates can be easily accomplished by Endnote library software. By comparison, the process of hand-searching duplicates is time-consuming and demands care. Four review authors spent more than four weeks on finding hand-searched duplicates, and two of them spent another two weeks checking the accuracy of this work. Certainly, further studies should be designed to assess the practical utility of this method in systematic review.

Figure 3. Simplified scheme to identify duplicates in systematic review.

The scheme includes three main steps. First, all literatures retrieved from different databases are combined into one Endnote library. In this Endnote library, “Find Duplicates” preferences are defined on the “Edit” menu. Thus, duplicates can be automatically searched by the Endnote library. Subsequently, the review authors should check the accuracy and identify the type of duplicates. Finally, the redundant papers are excluded. Considering that a single strategy of auto-searching is inadequate, an additional search is necessary. Second, the remaining literatures are alphabetically ordered according to the first authors’ names in the Endnote library. If the first authors are the same between two or more articles, the review authors further read the titles, journals’ names, volumes, issues, and pages. Subsequently, if these articles have the same titles, journals’ names, and issues, they are attributed to type I duplicates. Notably, the review authors should identify whether the difference between index and redundant papers is acceptable or not. On the other hand, if these articles have the same or similar titles but different journals or issues, the review authors further read the abstracts and/or full-texts to judge whether or not they can be attributed to type II duplicates. Third, the remaining literatures are also alphabetically ordered according to the titles in the Endnote library. If the titles are the same between two or more articles, the review authors further read the journals’ names, volumes, issues, and pages. Subsequently, if these articles have the same journals’ names and issues, they are attributed to type I duplicates. Notably, the review authors should identify whether the difference between index and redundant papers is acceptable or not. On the other hand, if these articles have the same or similar titles but different journals or issues, the review authors further read the abstracts and/or full-texts to judge whether or not they can be attributed to type II duplicates. Finally, the review authors should check the accuracy.

https://doi.org/10.1371/journal.pone.0071838.g003
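The decision rule in the Figure 3 legend can be sketched as a small function applied to two records that already share a first author or a title. This is a simplification under stated assumptions: the field names, the outcome labels, and the default title comparison (case-insensitive equality) are hypothetical, field values are assumed to have been brought into a common style beforehand, and whether two titles are “similar” was a human judgment in the study, so the caller may pass its own `similar_title` function.

```python
def same_title_ignoring_case(a, b):
    # Assumption: a stand-in for the reviewers' judgment of "same or similar".
    return a["title"].lower() == b["title"].lower()

def classify_candidate(a, b, similar_title=same_title_ignoring_case):
    """Apply the Figure 3 rules to two candidate records."""
    same_title = a["title"] == b["title"]
    same_source = a["journal"] == b["journal"] and a["issue"] == b["issue"]
    if same_title and same_source:
        return "type I"
    if similar_title(a, b) and not same_source:
        return "possible type II: read abstracts and/or full texts"
    # Cases the legend does not cover are left to the reviewer.
    return "no duplicate flagged"

# Made-up records for illustration only.
r1 = {"title": "Budd-Chiari syndrome in pregnancy", "journal": "J Hepatol", "issue": "3"}
r2 = {"title": "Budd-Chiari syndrome in pregnancy", "journal": "J Hepatol", "issue": "3"}
r3 = {"title": "Budd-Chiari Syndrome in Pregnancy", "journal": "Hepatology", "issue": "5"}

print(classify_candidate(r1, r2))
print(classify_candidate(r1, r3))
```

A “possible type II” result only queues the pair for reading; the legend leaves the final attribution to the review authors.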

A major finding of our study was that a large number of duplicates could be found among the three databases in systematic review. Notably, about 10% of literatures remained duplicates among the three databases after auto-searching duplicates, which strongly suggested the necessity of hand-searching duplicates in systematic review.

We further compared the reference items between index and redundant papers of type I duplicates. Nearly all type I duplicates had different items between index and redundant papers. Regardless of the literatures regarding portal vein thrombosis or Budd-Chiari syndrome and auto-searched or hand-searched duplicates, “journal’s name”, “publication date”, and “title” were the three most commonly different items. Most of these differences were acceptable; for example, journal’s name was expressed in full or abbreviated style, publication date was expressed in “year” or “year month” style, and titles used different punctuation and/or cases in different databases. This finding could be potentially explained by the fact that each database had its own special reference type. Differences in other items were uncommon, but were mostly unacceptable. For example, author, volume, issue, or page was wrong or missing. These mistakes should be corrected, thereby decreasing the prevalence of type I duplicates.

In addition, our study explored the origin of wrong information in type I duplicates. Regardless of the literatures regarding portal vein thrombosis or Budd-Chiari syndrome, EMBASE database had the highest proportion of wrong information regarding author, title, journal, and publication date items. These mistakes in EMBASE database were severe (see examples in Table 1), because they not only misled the readers but also disrespected the researchers. Cochrane library database had the highest proportion of wrong information regarding volume and page items in type I duplicates. This was primarily due to the reference type of Cochrane library database (volume and page were not provided). By comparison, only a minority of wrong information in type I duplicates originated from PubMed database. These findings suggested the following: 1) the accuracy of reference information recorded by EMBASE database should be substantially improved; and 2) the same reference type among these databases may be beneficial for literature screening.

Auto-searching methods could identify a larger number of duplicates, especially type I duplicates. However, only a very small proportion of type II duplicates could be identified by auto-searching methods (5.1% in portal vein thrombosis literatures; and 4.8% in Budd-Chiari syndrome literatures). This phenomenon could be readily explained by the fact that the authors, titles, and publication years were often different between index and redundant papers among type II duplicates. Additionally, the wrong reference items were rarely observed among type I duplicates identified by auto-searching methods, but very frequently among those identified by hand-searching methods. This finding also suggested the limitation of auto-searching duplicates, in which “author”, “title”, and “publication date” items should be exactly matched between two literatures. Accordingly, the necessity of combining auto- and hand-searching methods should be fully recognized in finding duplicates in systematic reviews.

Limitations

Several limitations of this study should be clearly recognized. First, the selection of portal vein thrombosis and Budd-Chiari syndrome literatures was subjective. Accordingly, the conclusions achieved by analyzing these literatures might be unsuitable for the literatures from other fields. But it should be noted that we employed a comprehensive search strategy and literatures of two fields to strengthen our conclusions. And given that the results were similar between portal vein thrombosis and Budd-Chiari syndrome literatures, it was possible that the findings of our study might be generalizable. Certainly, further studies are warranted to compare the frequency of wrong information from a random sample of literatures among the three databases. Second, only three databases were searched in our study. This might underestimate the prevalence of duplicates among databases. However, given that PubMed, EMBASE, and Cochrane library were the three most common databases used for systematic review, our results should be a representative sample. Third, only two approaches were employed in this study to identify hand-searched duplicates. It was not easy to find duplicates when both the first author’s name and title were different between index and redundant papers. Thus, the prevalence of duplicates might be underestimated. Fourth, a minority of full texts could not be obtained to identify the origin of wrong information. However, it should be noted that we tried our best to contact the authors and seek help from our and other University libraries. And these unavailable full texts did not substantially influence our judgment on the proportion of wrong information in different databases.

Conclusions

In conclusion, a high prevalence of duplicates could be identified among the PubMed, EMBASE, and Cochrane Library databases in systematic review. These findings were primarily attributed to the effect of a pragmatic strategy of combining auto- and hand-searching methods to find duplicates. Indeed, a single strategy of auto-searching was inadequate to find duplicates, especially type II duplicates. In general, to enhance the transparency of systematic review, PRISMA might require the reporting of detailed information regarding the methods to find duplicates and the quantity of duplicates identified by different methods. In addition, considering that wrong reference items were frequently observed in type I duplicates identified by hand-searching methods, we strongly recommend that the information of every reference be strictly examined and carefully entered by database administrators.

Supporting Information

File S1.

Duplicates identified by auto-searching method in portal vein thrombosis literatures.

https://doi.org/10.1371/journal.pone.0071838.s001

(XLS)

File S2.

Type-I duplicates identified by hand-searching method in portal vein thrombosis literatures.

https://doi.org/10.1371/journal.pone.0071838.s002

(XLS)

File S3.

Type-II duplicates identified by hand-searching method in portal vein thrombosis literatures.

https://doi.org/10.1371/journal.pone.0071838.s003

(XLS)

File S4.

Duplicates identified by auto-searching method in Budd-Chiari syndrome literatures.

https://doi.org/10.1371/journal.pone.0071838.s004

(XLS)

File S5.

Type-I duplicates identified by hand-searching method in Budd-Chiari syndrome literatures.

https://doi.org/10.1371/journal.pone.0071838.s005

(XLS)

File S6.

Type-II duplicates identified by hand-searching method in Budd-Chiari syndrome literatures.

https://doi.org/10.1371/journal.pone.0071838.s006

(XLS)

Acknowledgments

We were greatly indebted to the Associate Editor and anonymous reviewers for constructive comments on our manuscript.

Author Contributions

Conceived and designed the experiments: XQ WR MY JJ JW GH DF. Performed the experiments: XQ WR MY JJ JW. Analyzed the data: XQ WR MY JJ JW GH DF. Contributed reagents/materials/analysis tools: XQ WR MY JJ JW. Wrote the paper: XQ. Critical revision of the manuscript for important intellectual content: XQ WR MY JJ JW GH DF. Responsibility for the integrity of the data and the accuracy of the data analysis: XQ.

References

  1. Moher D, Liberati A, Tetzlaff J, Altman DG (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ 339: b2535.
  2. Cochrane Handbook for Systematic Reviews of Interventions. In: Higgins JPT, Green S, editors. Version 5.1.0 ed. Available: http://handbook.cochrane.org/. Accessed 2013 Jul 11.
  3. Cook DJ, Greengold NL, Ellrodt AG, Weingarten SR (1997) The relation between systematic reviews and practice guidelines. Ann Intern Med 127: 210–216.
  4. Cook DJ, Mulrow CD, Haynes RB (1997) Systematic reviews: synthesis of best evidence for clinical decisions. Ann Intern Med 126: 376–380.
  5. Moher D, Tetzlaff J, Tricco AC, Sampson M, Altman DG (2007) Epidemiology and reporting characteristics of systematic reviews. PLoS Med 4: e78.
  6. Qi X, De Stefano V, Wang J, Bai M, Yang Z, et al. (2013) Prevalence of inherited antithrombin, protein C, and protein S deficiencies in portal vein system thrombosis and Budd-Chiari syndrome: a systematic review and meta-analysis of observational studies. J Gastroenterol Hepatol 28: 432–442.
  7. Qi X, Wu F, Ren W, He C, Yin Z, et al. (2013) Thrombotic risk factors in Chinese Budd-Chiari syndrome patients. An observational study with a systematic review of the literature. Thromb Haemost 109: 878–884.
  8. Qi X, Chen H, Han G (2013) Effect of antithrombin, protein C and protein S on portal vein thrombosis in liver cirrhosis: a meta-analysis. Am J Med Sci 346: 38–44.
  9. Qi X, Yang Z, Bai M, Shi X, Han G, et al. (2011) Meta-analysis: the significance of screening for JAK2V617F mutation in Budd-Chiari syndrome and portal venous system thrombosis. Aliment Pharmacol Ther 33: 1087–1103.
  10. Qi X, He C, Han G, Yin Z, Wu F, et al. (2013) Prevalence of paroxysmal nocturnal hemoglobinuria in Chinese patients with Budd-Chiari syndrome or portal vein thrombosis. J Gastroenterol Hepatol 28: 148–152.
  11. Qi X, Zhang C, Han G, Zhang W, He C, et al. (2012) Prevalence of the JAK2V617F mutation in Chinese patients with Budd-Chiari syndrome and portal vein thrombosis: a prospective study. J Gastroenterol Hepatol 27: 1036–1043.
  12. Qi X, Ren W, Liu L, Yang Z, Yang M, et al. (2013) Prevalence of covert duplicate publications in Budd-Chiari syndrome papers in China: a systematic analysis. Am J Med 126: 633–639.
  13. von Elm E, Poglia G, Walder B, Tramer MR (2004) Different patterns of duplicate publication: an analysis of articles used in systematic reviews. JAMA 291: 974–980.