The development of laparoscopic skills using virtual reality simulations: A systematic review

Background: Teaching based on virtual reality simulators in medicine has expanded in recent years due to the limitations of more traditional methods, especially for surgical procedures such as laparoscopy.
Purpose of review: To analyze the effects of using virtual reality simulations on the development of laparoscopic skills in medical students and physicians.
Data sources: The literature screening was carried out in April 2020 through Medline (PubMed), EMBASE and the Database of the National Institutes of Health.
Eligibility criteria: Randomized clinical trials that subjected medical students and physicians to training in laparoscopic skills on virtual reality simulators.
Study appraisal: Paired reviewers independently identified 1529 articles and included 7 trials that met the eligibility criteria.
Findings: In all studies, participants who trained on virtual simulators showed improvements in laparoscopic skills, although the articles that also had a physical-model training group did not show better performance of one model compared to the other.
Limitations: No article published after 2015 met the eligibility criteria, and the analyzed simulators have different versions and models, which might impact the results.
Conclusion: Virtual reality simulators are useful educational tools, but have not shown proven significant advantages over traditional models. The lack of standardization and the scarcity of articles make comparative analysis between simulators difficult, requiring more research in the area, according to the model suggested in this review.
Systematic review registration: Registered in the Prospective Register of Systematic Reviews (PROSPERO), identification code CRD42020176479.



Introduction
In 1950, the time it took for medical knowledge to double was estimated at 50 years, whilst in 2020 that time was projected to be 73 days [1]. To keep up with this growth and adapt to the challenges that healthcare presents, new technologies involving both the social role of the profession and changes in the healthcare environment are considered promising complementary tools [2]. They can be used both to treat diseases and promote health (e.g. in smokers [3] and the chronically ill [4]) and to help professionals with their practice and training (e.g. recognition of sepsis [5] and trauma screening [6]). Among these new technologies, virtual reality (VR), a computer simulation in which the physical presence of the user is projected into a virtual environment [7], is gaining in popularity and becoming more accessible [8][9][10]. Its development has been rapid, showing great applicability to education and training [11,12]. Within the medical field, where its use is already widespread in procedures, diagnoses and professional training, virtual reality has great potential for expansion [13].
In the surgical context, the use of VR stands out in training for minimally invasive procedures, such as laparoscopy, through different simulators on the market, such as MIST-VR, LapSim, Laparoscopy VR and SINERGIA [14]. Specifically for this type of procedure, learning occurs empirically through trial and error until the technique is perfected, with a learning curve of around 65 procedures for laparoscopists [15]. Virtual reality simulators (VRS) thus provide safe, controlled environments, with reusable resources and techniques that can be measured more easily than practice on real models, reducing the learning curve [16].
Over the past decade, although some literature reviews have sought to analyze the use of VRS for the education and training of health professionals in a general surgical context, most do not address its use for laparoscopy specifically. In 2004, Aggarwal R. et al. [17] suggested the need to validate curricula to improve medical teaching in surgery. Willis R.E. et al. [18] noted that this technology would be more similar to a video game than to a training method.
On the other hand, most reviews that sought to analyze VRS within the laparoscopy training field faced significant limitations [19][20][21][22]. According to Nagendran M. et al. [19], the use of VRS could decrease the operative time of surgical trainees with laparoscopic experience; the articles analyzed showed approximate reductions of 30-58%. Despite the promising results, this review faced many difficulties: all trials analysed were classified as having a high risk of bias, only two compared VRS with a different training method, and one of them did not fully disclose the magnitude of the results. Another review, by Alaker M et al. [20], also studied the effects of VRS and showed encouraging results, though the bulk of its articles focused on comparisons between VRS and no-training groups, and it did not consider the costs of modern VRS and how they compare to other training methods.
For these reasons, a new systematic literature review is needed, analysing the effects of the use of virtual reality simulations on the development of laparoscopic skills in medical students and physicians.

Objectives
Develop a systematic review of the literature to analyse the effects of using virtual reality simulations on the development of laparoscopic skills in medical students and physicians.

Methodology
This systematic review was carried out in accordance with the items of the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) [23]. This study was registered in the Prospective Register of Systematic Reviews (PROSPERO, identification code CRD42020176479) before the research was carried out.
The elaboration of the scientific question was based on the PICO strategy [24], considering: medical professionals or students (Population or Problem); use of virtual reality and physical model simulations (Intervention); no standard comparator was considered in this study (Comparison); all outcomes available in the literature were considered in the analysis (Outcome).

Types of studies
The articles were selected based on their titles and abstracts according to the relevance of their data regardless of their publication status. Only clinical trials were considered.

Exclusion criteria
Studies will be excluded if they: (1) have heterogeneous populations in terms of academic degree; (2) do not use a standard assessment method for the entire duration of the study, or do not have a pre-assessment; (3) use VRS or augmented reality simulators as the single evaluation method or in a control group; (4) are not related to the question of the review; (5) are in a language other than English, Portuguese or Spanish; (6) are incomplete, unpublished or inaccessible to the authors.

Literature review
The search was conducted on April 20, 2020, without language restrictions, in the Medline database (via PubMed, www.pubmed.com), EMBASE (www.embase.com) and the Database of the National Institutes of Health. Using the search tool, we selected MeSH terms from the most relevant publications to conduct a new search, in order to obtain articles that could be included in this systematic review.
In addition, a manual search of theses, meetings, references, study records and contact with experts in the field was carried out.

Search strategy
The keywords were used equally in all databases, respecting their heterogeneities (for example, Emtree terms and MeSH terms were mapped in Embase and Medline, respectively).

Data extraction
The data for each study were extracted independently by three authors (JVT, VC and WAM). Disagreements were resolved by consensus; if no consensus was reached, a fourth author (AM) was consulted. Data extraction was carried out using the Rayyan tool (https://rayyan.qcri.org/) [25].
All studies were analyzed based on their titles and abstracts, according to the inclusion and exclusion criteria. If the eligibility criteria were met, the full text was retrieved. All studies eligible for qualitative analysis are described in the "Results" section.
Missing data were clarified by contacting the authors directly.

Data validation
Three authors (JVT, VC and WAM) carried out data validation through discussion of the selected works. If no consensus was reached, a fourth author (AM) was consulted. The risk of bias for intervention-type studies was analyzed using the guidelines of the Cochrane Back Review Group (CBRG) [26].
All selected studies were retained for analysis regardless of their risk-of-bias classification.

Statistical analysis
A descriptive synthesis will be produced with tables and figures and, if a sufficient number of studies of adequate quality are available, a meta-analysis will be carried out with measures of heterogeneity and publication bias. The data will also be presented through forest plots, according to their statistical relevance.
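The planned quantitative synthesis can be sketched in a few lines. The snippet below illustrates standard inverse-variance pooling with Cochran's Q and the I² heterogeneity statistic; the effect sizes and variances are hypothetical placeholders for illustration only, not data from the included trials.

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect inverse-variance pooling with heterogeneity measures."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: proportion of variability due to heterogeneity rather than chance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, se, q, i2

# Hypothetical standardized mean differences (VRS vs control) and variances
effects = [-0.8, -0.5, -1.1]
variances = [0.10, 0.15, 0.20]
est, se, q, i2 = pooled_effect(effects, variances)
print(f"pooled SMD = {est:.2f} (SE {se:.2f}), Q = {q:.2f}, I^2 = {i2:.1f}%")
```

A forest plot then simply displays each study's effect and confidence interval alongside this pooled estimate.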

Responsibility/Author contributions
All the authors involved participated in the drafting of the systematic review project, by identifying key articles, selecting keywords and writing the review project. The first author (JVT) was responsible for coordinating the group, guiding the organization of the review project, searching for articles to be reviewed and writing the text. The second (VSC) and the third author (WAM) were also responsible for the review and writing of the project, as well as for the search for articles to be reviewed. The remaining authors, in turn, were responsible for guiding and evaluating the final text.

Research flow
The electronic search returned 1904 results for the keywords used. After removing 375 duplicates, we considered 1529 potentially eligible studies. Of these, 1476 did not meet the inclusion criteria at title and abstract screening. After accessing the full text, three studies were excluded for having a heterogeneous population, 24 for not using a standard evaluation method or not having a pre-evaluation, four for using VRS or augmented reality simulators as the single evaluation method or in a control group, six for having an inaccessible full text, and nine for not complying with the inclusion criteria. Only seven studies were considered eligible for qualitative analysis and only one article was eligible for meta-analysis [Fig 1].
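The flow counts above are internally consistent, which can be verified arithmetically; this is only a sanity check of the reported numbers, not part of the review methodology.

```python
# Sanity check of the PRISMA flow counts reported above
identified = 1904
duplicates = 375
screened = identified - duplicates            # potentially eligible studies
excluded_screening = 1476                     # excluded on title/abstract
full_text = screened - excluded_screening     # assessed in full text
full_text_exclusions = 3 + 24 + 4 + 6 + 9     # the five exclusion reasons
included = full_text - full_text_exclusions   # eligible for qualitative analysis
print(screened, full_text, included)  # → 1529 53 7
```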

Quality of evidence
After reading the articles included in the systematic review, the following factors were analysed to determine the level of evidence: study design and selection, detection, loss, reporting and information bias. The summary of the risk of bias analysis for each of the included articles is shown in Figs 2 and 3. Only 2 of the 7 articles had a low risk of bias for the randomization process: Ahlberg G et al. and Palter VN et al. clearly reported that the allocation process was carried out randomly and blindly [27,28]. The other 5 were classified as having some concerns for not reporting this information.
Regarding bias due to deviations from the intended intervention (effect of adhering to intervention), 6 of the 7 articles presented a low risk due to the non-applicability of this criterion to most signaling questions in this domain. Van Bruwaene S et al. was the only study that presented a high risk of bias. The authors reported preliminary considerations on the procedures to follow in case of technical failures in the implementation of the experiment; however, it cannot be said that the researchers took appropriate compensatory measures, since the establishment of a minimum training time alone does not guarantee the adequate implementation of such training for the groups [29].
In the context of bias due to the intended intervention (effect of assignment), no article reached a low level of bias, because the nature of a training study requires that participants and supervisors be aware of their interventions. Four articles presented a high risk of bias: three due to the non-completion of all assignments by participants, and one for having an ineligible participant. The other articles were classified as some concerns because they did not show any flaws in the analysis of the participants.
All the articles analysed showed a low risk of bias due to missing outcome data. It is worth mentioning that Diesen DL et al., despite being classified as low risk, had a significant loss of data, which is more likely related to the study methodology than to an exclusion of unfavorable results [30]. The other authors presented complete data for almost all study participants, without compromising the quality of the information.
For bias in measurement of the outcome, four of the seven articles obtained a low risk of bias because they had observers blinded to the intervention and used validated measurement methods that were identical between groups and had high interrater reliability. Diesen DL et al. and Munz Y et al. were classified as some concerns because they did not measure the reliability of the observers [30,31]. Van Bruwaene S et al. had a high risk of bias due to its low interrater reliability, between 47 and 65% [29].
As for bias in selection of the reported result, all studies had a low risk with the exception of Ahlberg G et al., which had a high risk. This study differed from the others in that it presented different parameters in the post-intervention assessment compared to the pre-intervention one: the data reported in the first evaluation included time, economy of movement and precision, whereas those in the second included only precision variables [27].
None of the seven articles analyzed presented a high risk of overall bias; four were classified as some concerns, mainly due to bias from the intended intervention (effect of assignment) and the lack of information about the randomization process.

Study characteristics
All included studies are complete, published, and declare no conflicts of interest. Any doubts about the available data were resolved by contacting the respective authors. The demographic profiles are shown in Table 1; the characteristics of the methodology of the experiments are shown in Table 2; the main changes, conclusions and results are available in Tables 3-6.
Collectively, the studies enrolled a total of 156 participants: 40 residents and 116 medical students. It is worth noting that Diesen DL et al. partially analysed their sample, reporting the distribution of only 18 of its 23 total participants [30]. The remaining studies reported little or no experience in laparoscopic cholecystectomy, assessed through self-reported questionnaires or practical tests [28][31][32].
The training sessions in the studies lasted about 1 hour, and the time between assessments ranged from 1 week to 6 months, with Munz Y et al. and Torkington J et al. not reporting the period between assessments [31,33].
Of the 7 articles, 5 performed their assessments on physical model simulators (PMS) using laparoscopic cholecystectomy with in vivo models (3 swine and 2 human), and Ganai S et al. only performed a telescope navigation assessment during the porcine procedure [32]. Only 2 studies carried out their assessments on PMS with non-living models: Munz Y et al. used exercises in a water-filled glove that mimicked a gallbladder, and Torkington J et al. used specific exercises in a box-trainer (BT) [33]. Regarding the different parameters analyzed, 4 of the 7 articles measured the total time of the procedures during the assessments, with an improvement in the average time of the VRS groups in relation to the control groups. The study with the most expressive results was that of Ahlberg G et al., who reported that the intervention group performed the surgery 58% faster than the control group [27]. On the other hand, Van Bruwaene S et al. showed a difference of just over 5% between these groups, with a reduction of 21.3% within the VRS group and 17% for the control group [29].
The economy of movement was explored by 2 articles, which analyzed different variables: Ahlberg G et al. considered the distance covered in meters and the angular movement of the instruments, whereas Torkington J et al. assessed the number of movements per hand, hand speed and the variation in hand distance during the experiment. In neither study did the improvements in these criteria appear clear or significant [27,33].
Among the precision parameters, the total number of errors was analyzed by 2 articles, both of which showed a significant improvement in the average number of errors, especially in Ganai S et al., in which the number of errors in the VRS group was less than half that of the control group [32]. Although Ahlberg G et al. measured total errors only in the post-assessment, the reduction observed for this parameter was similar [27].

Palter VN et al. and Diesen DL et al. also reported significant improvements in the instrument handling exercises, especially in needle transfer (p < 0.0002) and in camera navigation (p < 0.006) [28,30].
No study reported significant differences between groups in the pre-intervention assessment. In the post-intervention assessment, all seven studies showed that VRS produced an improvement in the laparoscopic skills of the participants, although Munz Y et al. and Torkington J et al. did not find differences in performance between VRS and PMS training that would justify the superiority of one method over the other [31,33]. Van Bruwaene S et al. was the only study in which the group trained on PMS obtained better results in all parameters relative to the VRS group, also expressing the need for further studies to define the quality of teaching by VRS [29].

Discussion
VRS-based teaching is expanding in medicine due to the limitations of more traditional methods, especially for surgical procedures [34,35]. Several areas, such as neurosurgery [36], ophthalmic surgery [37] and digestive endoscopy [38], are already considering the implementation of this technology in their training curricula. In this scenario, systematic reviews have been produced to analyse the relation between the use of VRS and the learning of surgical techniques, considering the expansion of this technology in medical teaching curricula [39][40][41][42][43]. The field of laparoscopic surgery also follows this trend, with conclusions that vary depending on the studies. In general terms, reviews conclude that VRS is a method with the potential to develop varied surgical skills. However, most articles only compare VRS with untrained groups, whilst articles that use comparative groups with other methods show mixed results [19,20]. Despite recommending the incorporation of VRS in laparoscopic surgical training curricula, Alaker M et al. did not observe any statistically significant difference between VRS and BT groups regarding time and score. In addition, their comparison of 'virtual reality vs box and video trainers combined' also failed to find statistically significant differences between VRS and these more traditional methods [20]. Another review, by Nagendran et al., found that operative performance was significantly better in the VRS group than in the BT group, but this was limited to only two articles, one of which did not fully disclose the magnitude of the difference or other quantitative results [19]. Thus, there is not enough evidence to justify the use of this new technology in place of more traditional training on PMS.
In this systematic review, 6 of the 7 articles analysed compared the performance between a VRS group and a control group, 3 of which also used a third group trained on PMS. All observed an improvement with the new technology in relation to the control, evidencing VRS as a viable alternative for teaching [27][28][29][31][32][33]. However, part of our results were similar to those of Alaker M et al. [20]: in all the articles we analysed that compared VRS with PMS, the use of the technology did not show significant gains [29][30][31][33].
One explanation for this outcome lies in the sample limitations of these studies, whose populations include medical students and residents. There is a significant difference in the level of surgical and clinical experience between these two population types and, in this sense, medical students may not fully benefit from the training they have undergone [29][30][31][33]. In addition, the sample sizes of these studies, between 18 and 30 participants [29,30,33], may not have been enough to achieve adequate population representation. Thus, the results obtained with VRS may have been underestimated. Another factor that may contribute to better performance results on PMS compared to VRS is the tool used to carry out the assessments: of the 4 articles, 2 (Munz Y et al. and Torkington J et al.) performed their pre- and post-training measurements on box-trainers [31,33]. This can give an advantage to groups that trained on PMS over VRS groups due to the similar nature of training and assessment.
Due to the very nature of digital simulations, with software and engines that are constantly updating, it is difficult for the literature to keep pace and produce studies that reflect the current state of VRS. The studies analysed in this systematic review are no exception.

[Table: Comparison between assessments]

Ahlberg G et al. (2007) and Munz Y et al. (2004) also used this same simulator, but in even older and different versions [27,31]. Thus, there is a great temporal gap between the development of a VRS, its validation and the review of its results in the literature. Due to this constant updating of technology, the need to repeatedly carry out new studies to keep up with newer versions of VRS can bring high costs to a research team, since the equipment is usually expensive and the production of RCTs is time consuming. Speich B et al. compared the average production cost of RCTs between 2012 and 2016, concluding that, although the value may fluctuate depending on the scope of the project and the study design, there was an average cost of USD 72,000.00 to prepare a trial in both periods [44].
In this review, 4 of the 7 articles analysed used the LapSim or LapMentor VRS in their methodologies. Currently, the purchase price of this equipment is approximately USD 70,000.00 and USD 84,000.00, respectively, excluding USD 15,000.00 for additional modules (2018 versions) [45]. Therefore, the acquisition of a single piece of equipment is enough to more than double the average cost of a trial [44], contributing to the scarcity of literature comparing this technology with PMS methods, since it is not attractive for a research group to constantly produce such expensive update articles.
In contrast to the high prices of VRS, traditional PMS, especially box-trainers, are much more accessible for acquisition and use in training healthcare professionals. A complete box-trainer, including camera and equipment, costs between USD 1,000.00 and USD 6,000.00 [46], up to 84 times cheaper than current VRS. It should be noted that VRS could save significant costs for an institution by reducing the need for proctors and the replenishment of animal and synthetic material, though they have other maintenance needs typical of any software/hardware, and some demand subscriptions or the continued purchase of new modules, effectively nullifying this effect. Nevertheless, there is still a huge disparity in costs that may justify the reluctance to invest in VRS for medical curricula, especially considering the absence of solid evidence about the advantages of the new technology in relation to regular methods, which was verified in 4 of our 7 articles [29][30][31][33].
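The "up to 84 times cheaper" figure follows directly from the prices quoted above; the snippet below simply reproduces that back-of-the-envelope calculation (prices in USD, from references [45] and [46]).

```python
# Back-of-the-envelope check of the cost figures quoted above (USD)
vrs_prices = {"LapSim": 70_000, "LapMentor": 84_000}  # 2018 list prices [45]
box_trainer_range = (1_000, 6_000)                    # complete box-trainer [46]

# Most expensive VRS vs cheapest box-trainer
max_ratio = max(vrs_prices.values()) / box_trainer_range[0]
print(f"a box-trainer can be up to {max_ratio:.0f}x cheaper than a VRS")
```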

Study limitations and methodologies
The most common limitation reported in the studies analyzed was the small sample size [Table 7], with an average of approximately 22.3 participants per study and 9.2 per group. This may have decreased the statistical power of the studies, and the results obtained may not be sufficiently representative of the population. In addition, of the 7 studies, 4 used medical students and 2 used residents; Diesen DL et al. was the only one that used a mixed training sample [30]. Although not reported by the authors, this population heterogeneity is a potential confounding factor in the study.
According to Van Bruwaene et al., the use of medical students in this type of training may not yield reliable results, given their lack of experience relative to residents [29]. In this sense, residents would be a more suitable population for studies on surgical training. On the other hand, one could argue that students represent a more adequate sample due to their lesser degree of contact with laparoscopy and would therefore be less biased than residents, who are more exposed to practice outside the study. However, the current literature does not present significant information about the advantages of one group over the other.
We grouped the performance parameters of the participants into three main categories: time, economy of movement and precision. Time was reported quantitatively by 4 of the 7 articles [27,29,32,33], while the other categories were addressed by only 2 articles each [27,32,33]. Thus, despite comparing the performance of VRS groups with PMS groups, the analysed articles show great methodological differences between them. This made it difficult to compare the results for this review, making it impossible to develop a meta-analysis based on the articles.
Some studies presented distinct methodological elements. Torkington et al. presented their data as the variation between initial and final results, which is more appropriate for statistical analysis [33]. In addition, the use of validated and standardized scores could be a way to circumvent the heterogeneity of parameters, as was the case with Palter VN et al. when using OSATS [28]. Although Diesen DL et al. adopted a score in their study, it was custom-made and has no widespread use in the literature [30].
Among the limitations of our systematic review, our results might have been influenced by the use of different models and versions of VRS between studies, since the analysed articles date from 2001 to 2015; no more recent eligible articles were found during the literature screening.
To better assess the use of VRS in the development of laparoscopic skills, we propose RCTs that use homogeneous samples submitted to identical assessments before and after training, which do not favour one group over the other, whether by the nature of the simulator or the type of task. We suggest that they report the training process and time in more detail, as well as the participants' history of laparoscopic experience. The parameters evaluated in the assessments must be standardized so that a comparative analysis can be built between different studies. This can be done with scores validated in the literature, containing variables such as total time, number of errors and distances covered by the instruments. There is still a scarcity of articles in the literature, hindering the comparative analysis of performance on VRS relative to PMS; therefore, more research in this area is necessary.