The Efficacy and Safety of Different Kinds of Laparoscopic Cholecystectomy: A Network Meta Analysis of 43 Randomized Controlled Trials

Background and Objective We conducted a network meta analysis (NMA) to compare different kinds of laparoscopic cholecystectomy [LC]: single port [SPLC], two ports [2PLC], three ports [3PLC], four ports [4PLC], and four ports mini-laparoscopic cholecystectomy [mini-4PLC]. Methods PubMed, the Cochrane Library, EMBASE, and ISI Web of Knowledge were searched for randomized controlled trials [RCTs]. Direct pair-wise meta analysis (DMA), indirect treatment comparison meta analysis (ITC) and NMA were conducted to compare the different kinds of LC. Results We included 43 RCTs. The risk of bias of the included studies was high. DMA showed that SPLC was associated with more postoperative complications, longer operative time, and higher cosmetic score than 4PLC; with longer operative time and higher cosmetic score than 3PLC; and with more postoperative complications than mini-4PLC. Mini-4PLC was associated with longer operative time than 4PLC. ITC showed that 3PLC was associated with shorter operative time than mini-4PLC and lower postoperative pain level than 2PLC. 2PLC was associated with fewer postoperative complications and longer hospital stay than SPLC. NMA showed that SPLC was associated with more postoperative complications than mini-4PLC and longer operative time than 4PLC. Conclusion The rank probability plot suggested that 4PLC might be the worst because of the highest level of postoperative pain, longest hospital stay, and lowest cosmetic score. The best might be mini-4PLC, because of the highest cosmetic score and fewest postoperative complications, or SPLC, because of the lowest level of postoperative pain and shortest hospital stay. More studies are needed to determine which of mini-4PLC and SPLC is better.


Background
Laparoscopic cholecystectomy (LC) has been considered the gold standard for cholecystectomy to manage benign gallbladder disease since 1986 [1][2][3]. The standard LC is usually done using four trocars [3]: one port for the camera; one port for the instruments used for dissection, diathermy, and clip application; and two ports for manipulating the gallbladder to expose the surgical field adequately [4]. The fourth (lateral) trocar is used to grasp the fundus of the gallbladder so as to expose Calot's triangle [3,5]. With increasing surgeon experience, it was argued that the fourth trocar may not be necessary and that LC can be performed safely without it [3,5]. As a result, three ports laparoscopic cholecystectomy (3PLC) was developed [6,7]. It was thought that reduced port size, smaller incisions, and fewer ports would improve cosmetic results, decrease pain, and minimize postoperative complications [8,9]. A trend toward even more minimally invasive approaches, such as smaller ports, mini-ports, and reduced ports, has therefore driven the continuous development of laparoscopic surgery [10]. By 1997, when Navarra et al. [11] described the first single port laparoscopic cholecystectomy (SPLC), LC had passed through four stages of port reduction: four ports (4PLC), three ports (3PLC), two ports (2PLC) and single port (SPLC). A mini-laparoscopic cholecystectomy (mini-PLC) with smaller ports and incisions was also developed. It has been said that SPLC represents the next step in laparoscopic surgery, further reducing the invasiveness of surgical procedures with cosmetic advantages [12]. Although current guidelines recommend performing cholecystectomy via laparoscopy [13], it is not clear which kind of LC should be the gold standard for minimizing morbidity, decreasing pain and improving cosmetic results.
We therefore conducted a network meta analysis [NMA] to compare different kinds of LC (SPLC, 2PLC, 3PLC, 4PLC, and four ports mini-laparoscopic cholecystectomy (mini-4PLC)).

Methods
We did this systematic review of the available literature in accordance with the PRISMA guidelines [14] for the conduct of meta-analyses of intervention trials.

Data sources
PubMed, the Cochrane Library, EMBASE, and ISI Web of Knowledge were searched for randomized controlled trials (RCTs) and meta analyses of laparoscopic cholecystectomy. Medical Subject Headings terms were also added in all searches of PubMed, EMBASE, and the Cochrane Library. Reference lists of meta-analyses, review articles on this topic, and identified trials were hand-searched to identify further relevant citations. The search strategy was developed by two reviewers (Lun Li and Jinhui Tian, a professional searcher with over ten years' experience) and peer-reviewed by a third reviewer (Kehu Yang). The searches were conducted independently by two reviewers (Lun Li and Jinhui Tian) using the same search strategy to avoid potential mistakes by either of them; differences were cross-checked and resolved by discussion. The search was conducted in August 2013 without language, date, or publication status restrictions, and was updated on 1 December 2013.

Inclusion criteria and study selection
Eligible studies were RCTs that reported using randomized methods. Studies that reported using quasi-randomized methods were excluded. Studies had to compare two or three of the surgical approaches (SPLC, 2PLC, 3PLC, 4PLC, and mini-4PLC). SPLC was defined as laparoscopic excision of the gallbladder performed through a single abdominal incision using either a multiport device or different individual ports through the same single skin incision [15]. For 2PLC, 3PLC, and 4PLC, the instruments should be at least 5 mm. For mini-4PLC, two to three of the four instruments should be less than 5 mm. Only articles published in English were included; meeting abstracts and unpublished data were not included in this NMA.
Two independent reviewers (Lun Li and Hongliang Tian) screened the retrieved citations based on titles and abstracts, and the full texts of potentially eligible studies were read to decide on inclusion against the inclusion criteria. Disagreements were resolved by discussion and, if unresolved, a third reviewer (Kehu Yang) was involved.

Data abstraction and quality assessment
Data were entered into an Excel database by two authors (Lun Li and Jinhui Tian). The following fields were abstracted: country, patient characteristics (age, sex and other baseline characteristics), disease, follow-up duration, and outcomes. Outcomes were extracted preferentially on an intention-to-treat basis. Any disagreements were resolved by a third reviewer (Kehu Yang).
The methodological quality was evaluated by two independent reviewers (Lun Li and Rao Sun), and differences were resolved by consultation with a third reviewer (Kehu Yang). The following items were assessed according to the Cochrane Handbook 5.0 [16]: randomization, blinding, concealed allocation, selective reporting, incomplete outcome data, and other biases.

Data analysis
The outcomes we evaluated were postoperative pain measured with a visual analogue scale (VAS) on the first postoperative day, the number of patients who needed additional analgesics, postoperative complications, intra-operative blood loss, cosmetic score, hospital stay and operative time.
Direct pair-wise meta analysis (DMA) was conducted with Review Manager Version 5.0. For dichotomous outcomes, results were expressed as odds ratios (OR) with 95% confidence intervals (CI). For continuous outcomes, the mean difference (MD) was used to assess the effects of treatment. The percentage of variability across trials attributable to heterogeneity beyond chance was estimated with the I² statistic; heterogeneity was deemed significant when p was less than 0.05 or I² was more than 50%. Data were pooled using the fixed-effect model, but the random-effects model was also considered to ensure robustness in case of significant heterogeneity.
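As an illustration of the quantities named above, the following is a minimal sketch of generic inverse-variance fixed-effect pooling with Cochran's Q and I². It is not Review Manager's own routine (which may use other weighting schemes for dichotomous data); the function names are hypothetical, and the log odds ratio helper assumes a 2x2 table without zero cells.

```python
import math

def log_odds_ratio(events_a, total_a, events_b, total_b):
    """Log odds ratio and its standard error from one trial's 2x2 table.

    Assumption: no cell is zero (otherwise a continuity correction is needed).
    """
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2 (%).

    `estimates` are per-trial effects on an additive scale
    (log OR for dichotomous outcomes, MD for continuous outcomes).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2 = (Q - df) / Q, truncated at zero and expressed as a percentage
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return {"pooled": pooled, "se": pooled_se, "ci95": ci95, "Q": q, "I2": i2}
```

For odds ratios, the pooled value and its interval are on the log scale and would be exponentiated before reporting. An I² above 50% from this calculation corresponds to the heterogeneity threshold used in the text.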
When direct evidence was lacking, an indirect treatment comparison (ITC) was derived from the available evidence. Indirect estimates were obtained using ITC software (http://www.cadth.ca/en/resources/about-this-guide/download-software). We only calculated an indirect result from two comparisons that share a common comparator. For example, if there were two comparisons (A vs. B, B vs. C), an indirect result (A vs. C) was calculated. If a chain of three or more comparisons was required (A vs. B, B vs. D, D vs. C), we did not carry out an indirect calculation, although it is feasible. Where different pathways produced the indirect evidence (that is, different comparators, such as A vs. B, B vs. C and A vs. D, D vs. C), we calculated each indirect result separately and then pooled them using the inverse variance method, in which each estimate is weighted by the inverse of its variance.
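The two steps described above can be sketched as follows. The sketch assumes the standard adjusted indirect comparison through one common comparator (often attributed to Bucher), since the text does not state which formula the ITC software applies; effects are on an additive scale (log OR or MD), and the function names are hypothetical.

```python
import math

def indirect_estimate(d_ab, se_ab, d_bc, se_bc):
    """Indirect A vs. C effect through the common comparator B.

    d_ab is the effect of A relative to B and d_bc the effect of B
    relative to C, so the effects add; the variances add as well
    because the two sets of trials are assumed independent.
    """
    d_ac = d_ab + d_bc
    se_ac = math.sqrt(se_ab ** 2 + se_bc ** 2)
    return d_ac, se_ac

def pool_inverse_variance(estimates):
    """Pool several (effect, standard error) pairs by inverse variance."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    pooled = sum(w * d for w, (d, _) in zip(weights, estimates)) / total_w
    return pooled, math.sqrt(1.0 / total_w)

def indirect_via_pathways(pathways):
    """Combine indirect A vs. C results obtained through different comparators.

    `pathways` is a list of (d_a_x, se_a_x, d_x_c, se_x_c) tuples,
    one per comparator X (for example B and D).
    """
    per_path = [indirect_estimate(*p) for p in pathways]
    return pool_inverse_variance(per_path)
```

With two pathways (A vs. B, B vs. C and A vs. D, D vs. C), `indirect_via_pathways` first forms one A vs. C estimate per pathway and then weights each by the inverse of its variance, so the more precise pathway dominates the pooled indirect result.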
Network meta-analysis (NMA) is a technique to meta-analyze more than two treatments at the same time. Using a full Bayesian evidence network, all indirect comparisons are taken into account to arrive at a single, integrated estimate of the effect of all included treatments based on all included studies. NMA was conducted using ADDIS software. We also produced the rank probability plot with ADDIS to show which LC was the best. The data were expressed as odds ratio (OR) or MD with 95% credibility interval (CrI).
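A rank probability can be understood as the share of posterior draws in which a treatment occupies a given rank. The sketch below shows only that counting step, assuming posterior draws of each treatment's outcome are already available; it does not reproduce how ADDIS fits the model, and the function name and toy draws are hypothetical.

```python
def rank_probabilities(draws, lower_is_better=True):
    """Estimate rank probabilities from posterior draws.

    `draws` is a list of dicts, one per posterior draw, mapping a
    treatment name to its sampled outcome value. The result maps each
    treatment to a list whose k-th entry is the estimated probability
    that the treatment holds rank k + 1 (rank 1 = best).
    """
    treatments = list(draws[0])
    counts = {t: [0] * len(treatments) for t in treatments}
    for draw in draws:
        # Order treatments from best to worst within this draw
        ordered = sorted(treatments, key=lambda t: draw[t],
                         reverse=not lower_is_better)
        for rank, treatment in enumerate(ordered):
            counts[treatment][rank] += 1
    n = len(draws)
    return {t: [c / n for c in counts[t]] for t in treatments}

# Toy draws with made-up values, for illustration only
toy_draws = [
    {"SPLC": 1.0, "4PLC": 2.0},
    {"SPLC": 3.0, "4PLC": 2.0},
]
print(rank_probabilities(toy_draws))
```

For outcomes where a higher value is preferable, such as cosmetic score, `lower_is_better=False` reverses the ordering.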
To assess inconsistency, we undertook a node-splitting analysis with ADDIS software to assess whether direct and indirect evidence on the split node were in agreement [17]. In addition, the methods described by Song [18] were used to test the difference between DMA or ITC and NMA evidence. A Z value was calculated to quantify the difference. If the absolute value of Z was more than 1.645, the p value for the Z test was taken to be less than 0.05, which was deemed significant.
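A minimal sketch of such a Z test is given below, assuming the common form in which the difference between two estimates is divided by the square root of the sum of their variances (which treats the two estimates as independent). The exact formula used in [18] is not reproduced in the text, so this form and the function name are assumptions. The 1.645 cut-off is taken from the text; for reference, it matches the one-sided 5% point of the standard normal distribution, so the two-sided p value is also returned.

```python
import math

def z_test_difference(est_1, se_1, est_2, se_2, threshold=1.645):
    """Compare two estimates of the same contrast (e.g. DMA vs. NMA).

    Estimates are on an additive scale (log OR or MD).
    Returns (z, two-sided p value, flagged), where `flagged` is True
    when |z| exceeds `threshold`.
    """
    z = (est_1 - est_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided p value from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided, abs(z) > threshold
```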

Search results
We retrieved 7644 citations from databases and 89 citations from reference checking. Finally, we included 43 RCTs [5][6][7]. The search results and selection process are presented in Figure 1.

Quality assessment results
All studies mentioned randomization, but only 13 studies reported the details of the randomization methods and 17 studies reported the details of allocation concealment. Eleven studies mentioned the methods of blinding: patients and assessors were blinded in five studies, patients in three studies, assessors in two studies and surgeons in one study (Table 2).

Direct pair-wise meta analysis (DMA)
According to the results of DMA, SPLC was associated with more postoperative complications and higher cosmetic score than 4PLC, with longer operative time and higher cosmetic score than 3PLC, and with more postoperative complications than mini-4PLC. Mini-4PLC was associated with longer operative time than 4PLC. No statistically significant differences were found for other outcomes or comparisons (Table 3, Table 4, Table 5).

Indirect comparison (ITC) and network meta analysis (NMA)
According to the results of ITC, 3PLC was associated with shorter operative time than mini-4PLC and lower postoperative pain level than 2PLC. 2PLC was associated with fewer postoperative complications and longer hospital stay than SPLC. The NMA showed that SPLC was associated with more postoperative complications than mini-4PLC and longer operative time than 4PLC (Table 3, Table 4, Table 5).

Inconsistency between DMA/ITC and NMA, heterogeneity for DMA
Node-splitting analysis (Table S1) did not detect any inconsistency among DMA, ITC and NMA except for postoperative complications between mini-4PLC and SPLC, where it indicated possible inconsistency (p = 0.01). The Z test did not find any inconsistency between DMA/ITC and NMA (Table S2). Even so, high heterogeneity existed for most outcomes in DMA (Table S3).

Rank probability
According to the rank probability plot (Table 6), mini-4PLC had the highest cosmetic score, the fewest postoperative complications, and the lowest intra-operative blood loss. 4PLC had the highest level of postoperative pain, the most patients who needed additional analgesics, the longest hospital stay, and the lowest cosmetic score. SPLC had the most postoperative complications, the highest intra-operative blood loss, the longest operative time, the lowest level of postoperative pain, the fewest patients who needed additional analgesics and the shortest hospital stay. 2PLC had the shortest operative time.

Summary of findings
Although DMA showed some statistically significant differences between groups for the outcomes we focused on, the NMA did not find any significant differences except for postoperative complications. However, node-splitting analysis indicated that the evidence for this outcome was inconsistent among DMA, ITC and NMA. The rank probability plot suggested that 4PLC might be the worst because of the highest level of postoperative pain, the most patients who needed additional analgesics, the longest hospital stay, and the lowest cosmetic score. The best might be mini-4PLC, because of the highest cosmetic score, fewest postoperative complications, and lowest intra-operative blood loss, or SPLC, because of the lowest level of postoperative pain, fewest patients who needed additional analgesics and shortest hospital stay. However, SPLC had the most postoperative complications and the highest intra-operative blood loss.
For postoperative pain on the first day, significant differences existed between 3PLC and 4PLC (DMA) and between 3PLC and 2PLC (ITC). The rank probability showed that SPLC might be the best at reducing first-day postoperative pain, and 4PLC might be the worst. Although node-splitting analysis and the Z test did not detect inconsistency between DMA or ITC and NMA, heterogeneity existed among the included studies for the direct evidence. This might be due to the different anesthetics used before surgery and anesthetic prophylaxis after surgery. For this reason, we did not calculate the amount of anesthetic consumption; instead, we calculated the number of patients who required additional analgesics, and we used the pain level measured with VAS on the first postoperative day. The pain results are consistent with the results for the number of patients who required additional analgesics: the rank probability showed that patients in the 4PLC group used the most additional analgesics and patients in the SPLC group used the fewest, although no significant differences were found in DMA, ITC and NMA.
For postoperative complications, significant differences existed between SPLC and mini-4PLC (DMA), SPLC and 4PLC (DMA and NMA), and SPLC and 2PLC (ITC). The rank probability showed that mini-4PLC was associated with the fewest postoperative complications and SPLC with the most. Among the included studies, 18 studies reported postoperative complications for SPLC with a median rate of   heterogeneity for the direct evidence. So the inconsistency model in ADDIS software was used, but similar results were found.
For cosmetic scores, statistically significant differences existed between SPLC and 3PLC and between SPLC and 4PLC. The rank probability showed that mini-4PLC had the best cosmetic scores and 4PLC the worst. Although no inconsistency existed, high heterogeneity was common among direct comparisons, possibly because cosmetic score was measured differently across studies. Some studies used a five-point scale [20,33,47], some used a ten-point scale [31,44,49,56,58], and some used other scales, such as 24 points [40] or 40 points [42]. A sensitivity analysis was conducted on the cosmetic scores of studies that used a ten-point scale. The results for DMA, ITC and NMA did not show any statistical differences, and the rank probability for the sensitivity analysis was consistent with the previous one.
For hospital stay, DMA and NMA did not show any significant differences; only ITC showed that 2PLC was associated with longer hospital stay than SPLC. The rank probability showed that SPLC was associated with the shortest hospital stay and 4PLC with the longest. Because some studies measured the length of hospital stay in hours, we conducted a sensitivity analysis. The sensitivity analysis of DMA, ITC and NMA showed no differences for any pair of comparisons, and its rank probability was consistent with the previous one. As recovery after LC is fast, many hospitals performed day surgery rather than surgery with an overnight stay. Culture and hospital type could also affect the length of hospital stay. These factors might explain the heterogeneity of the direct evidence.
Two operative outcomes, operative time and intra-operative blood loss, were evaluated. For operative time, significant differences existed between SPLC and 4PLC, SPLC and 3PLC, and mini-4PLC and 4PLC (DMA), between mini-4PLC and 3PLC (ITC), and between SPLC and 4PLC (NMA). The rank probability showed that SPLC was associated with the longest operative time and 2PLC with the shortest. For intra-operative blood loss, no significant differences were found in DMA, ITC and NMA.
Several systematic reviews [1,11,13,15,59-62] were published in 2013, and the results of our DMA were consistent with theirs. As in these meta analyses, high heterogeneity was common, although we strictly restricted studies to those that used the same measurement at the same time, for example, postoperative pain measured with VAS on the first day. We also conducted a sensitivity analysis by excluding studies that used different measurement units, but the DMA results did not change. ITC was conducted when there was no DMA evidence. No inconsistencies were found between DMA/ITC and NMA using the Z test, and node-splitting analysis found none among DMA, ITC and NMA except for postoperative complications. When we used the inconsistency model to analyze the data, the results and conclusions did not change.
Table 3. Meta analysis for postoperative pain, additional analgesics and intra-operative blood loss.

Strengths and limitations
This is the first ITC and NMA to compare different kinds of LC. We also assessed inconsistency using node-splitting analysis and the Z test. The inconsistency model and sensitivity analysis were used to test the stability of the results, and the results did not change for DMA and NMA. However, our NMA has limitations. First, our NMA only included studies that specified how many ports were used during surgery. We excluded studies in which it was hard to judge whether the procedure was 4PLC or 3PLC. For example, the study conducted by Vilallonga [63] did not specify what its standard LC was, so we excluded it. Second, we did not include quasi-randomized studies. For example, we excluded two studies [64,65] because they used a quasi-randomized study design. We included many studies (30/43) that only mentioned randomization without reporting its details. Because of the high risk of bias in most of the studies, the results of our DMA, ITC and NMA might be biased. Third, the heterogeneity for DMA was high. It has been said that heterogeneity between the sets of studies that contribute direct comparisons to an indirect comparison or a network meta-analysis would indicate a lack of similarity [66]. We checked the clinical and methodological similarity of all included studies and found that there were indeed some differences, such as different analgesics used before and after surgery, different instruments during surgery, studies from different countries, and other variations in LC. Even so, inconsistency was not found for most outcomes, except postoperative complications, and the inconsistency model did not change the results. Fourth, many factors might affect length of hospital stay, such as cultural differences and hospital types; however, we did not conduct subgroup analysis because of limited data.

Implications for future research and practice
Most included studies did not mention the details of randomization and concealed allocation, and nearly all of them had small sample sizes. In the future, randomized controlled studies with large sample sizes should be well conducted and adequately reported. Outcomes such as postoperative pain and hospital stay should be measured using international standards, such as VAS for pain and days for hospital stay. For cosmetic scores, too many scales were used in the primary studies, and it is unclear which scale better measures cosmetic satisfaction; a comparative study is needed to test the validity of the different scales. Based on our NMA, 4PLC might be the worst, but it is hard to decide which approach is the best, as few studies compared SPLC with mini-4PLC. The rank probability showed that either SPLC or mini-4PLC might be the best, although SPLC had the most postoperative complications, the highest intra-operative blood loss, and the longest operative time. More studies are therefore needed to compare SPLC with mini-4PLC.
Based on the rank probability, patients should be informed that SPLC was associated with the lowest postoperative pain, the most postoperative complications, and the shortest hospital stay, and that mini-4PLC was associated with a high cosmetic score and the fewest complications. Surgeons performing SPLC should pay attention to intra-operative blood loss and postoperative complications.