Innovative Approaches to Clinical Data Management in Resource-Limited Settings Using Open-Source Technologies


Introduction
The primary objective of clinical data management (CDM) is to provide high-quality, reliable data for reporting randomized controlled trials (RCTs) in line with good clinical practice (GCP) requirements. For treatment trials of neglected tropical diseases (NTDs) in endemic countries, CDM systems need to be efficient and affordable. Challenges include poor infrastructure, license costs associated with GCP-compliant software, and limited human resources to provide the required expertise. We argue that high-quality CDM for NTDs can be achieved and that these challenges can be overcome through the use of readily available open-access tools.
The Drugs for Neglected Diseases initiative (DNDi) [1,2] sponsors several RCTs of treatments for visceral leishmaniasis (VL) in eastern Africa, working with regional partners through the Leishmaniasis East Africa Platform (LEAP). These trials are conducted according to GCP standards to ensure internal and external validity, with CDM conducted by the authors at the Data Centre (DC) based at DNDi Africa in Nairobi.
The choice between proprietary and open-source tools depends largely on budget and user experience, and tools may be used independently or in combination. We have been working to overcome two main challenges: (1) Telecommunications, in particular unstable internet connectivity. An ideal CDM tool would work with or without an internet connection (online and offline modes, respectively) and could be configured for use across multiple remote sites. (2) Automated query generation and resolution, with an audit trail that ensures consistency and rigor and limits the risk of human error.
We describe our development of two new tools for CDM: an offline module for OpenClinica users and a query management system (QMS) that allows automation and standardization of query management with multiple-user access.

Use of Open-Source Tools for CDM in Remote Settings
A popular tool for RCTs is OpenClinica [3], a GCP-compliant database package that our DC has used since 2009. OpenClinica Community Edition has many advantages for CDM but was available only in online mode. Without a validated offline mode, the system's ability to support multicenter trials in remote settings is limited to paper data collection followed by centralized data entry. Offline electronic data capture (EDC) in line with GCP would substantially reduce the time between data entry and database lock [4].
We developed a validated offline module for OpenClinica that meets GCP requirements and is to be made freely available to the user community (Figure 1, Table 1). Data synchronization was implemented with Python scripts (Table S1), backed by dedicated technical support and user training. As a result, data entry at remote sites is faster because delays caused by poor connectivity are eliminated, and data reach the DC sooner because paper case report forms (CRFs) no longer need to be physically transported. In addition, site capacity has been strengthened, and site staff feel engaged with the project beyond patient management through their participation in CDM.
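An offline mode of this kind typically follows a store-and-forward pattern: each record is saved locally first and pushed to the central database only when a connection is available. The minimal sketch below illustrates that pattern in Python; it is not the validated module or the scripts listed in Table S1. The class name `OfflineOutbox`, the local SQLite outbox, and the `submit` callback (a stand-in for whatever upload mechanism the real module uses) are all hypothetical.

```python
import json
import sqlite3
from typing import Callable


class OfflineOutbox:
    """Hypothetical local store for CRF data entered at a remote site."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " payload TEXT NOT NULL,"
            " synced INTEGER NOT NULL DEFAULT 0)"
        )

    def save(self, record: dict) -> None:
        # Data entry never waits on the network: the record is written locally.
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(record),))
        self.db.commit()

    def pending(self) -> int:
        (count,) = self.db.execute("SELECT COUNT(*) FROM outbox WHERE synced = 0").fetchone()
        return count

    def sync(self, submit: Callable[[dict], bool]) -> int:
        """Push unsynced records in entry order. `submit` returns True when the
        server accepted a record; on the first failure we stop, leaving the
        rest queued for the next connection window."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM outbox WHERE synced = 0 ORDER BY id"
        ).fetchall()
        for row_id, payload in rows:
            if not submit(json.loads(payload)):
                break
            self.db.execute("UPDATE outbox SET synced = 1 WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent


if __name__ == "__main__":
    outbox = OfflineOutbox()
    outbox.save({"subject": "001", "form": "baseline", "weight_kg": 52})
    outbox.save({"subject": "002", "form": "baseline", "weight_kg": 47})
    print(outbox.sync(lambda rec: False))  # no connection: 0 records sent
    print(outbox.sync(lambda rec: True))   # connection restored: 2 records sent
    print(outbox.pending())                # 0
```

Stopping at the first failed upload keeps records arriving in entry order. A GCP-compliant module would additionally need authentication, conflict handling, and an audit trail, none of which this sketch attempts.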
Our tool builds offline functionality into OpenClinica itself and is intended to be freely available to community users. This differs from the Mi-Forms integration tool created by Mi-Corporation [5], which requires installation of additional external software.

A New Approach to Query Management
Because VL trials generate a large volume of data (>1000 fields per patient), running comprehensive cross-checks and query identification within OpenClinica would cause severe operational delays, even with strong, stable internet connectivity. We therefore used OpenClinica for data entry and basic consistency checks and developed the Query Management System Plus (QMSPlus) (Figure 2 and Table 1). Following data entry validation, the system exports data into a statistical package with a program editor function that can store large amounts of code for running edit checks (for this component we use STATA [6], based on historical preference). QMSPlus has three main modules: (1) Query Assignment, which produces standardized query reports that are sent to different destinations depending on the nature of the query (site, trial, or data manager); (2) Query Resolution, for data correction; and (3) Data Resolver, which produces programs to update the STATA database. These processes are repeated until a satisfactory response has been received for every query raised. Modules 1 and 3 are fully automated; module 2 is semi-automated, in that resolution values must be entered manually into the system on an ongoing basis as responses arrive from the sites.

Competing Interests: The authors have declared that no competing interests exist.
* Email: romollo@dndi.org

Table 1.

Offline OpenClinica module
Advantages:
- Internet connectivity with high bandwidth is not needed during data entry, and only periodic connectivity is needed for synchronization, so data entry is much faster.
- Development of on-site data management capacity and improved motivation.
- Potential to be simplified and adapted for local and national data capture.
Disadvantages:
- Need for remote data support to sites.

QMSPlus
Advantages:
- More efficient because of the reduced data load on the main study database and multiple-user capacity.
- Automated, standardized query management, improving efficiency and reducing the risk of human error.
- Potential to be adapted into an open-access tool for wider use, e.g., local and national data capture with simple, automated, standardized reporting output.
Disadvantages:
- Setting up trial-specific edit check programming is time-consuming, as it is with any system.

QMSPlus is a web-based Java application using PostgreSQL, which allows multiple users to access the database simultaneously. Because the output of module 3 can also be exported to any other statistical analysis package with a program editor function, QMSPlus can be used for query management regardless of the preferred analysis software. With further development, QMSPlus could become an open-source tool if open-source statistical software such as R (Table 2) were used in place of STATA.
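To make the three-module query cycle concrete, here is a minimal Python sketch. It is not the QMSPlus code (a Java/PostgreSQL application driving STATA edit-check programs), and everything in it is hypothetical: the `EditCheck` and `Query` classes, the field names, the plausibility range, and the subject IDs. Module 1 is modeled as rule-based edit checks that raise queries tagged with a destination, module 2 as manually filling in a `resolution` value, and module 3 as generating Stata `replace` statements from resolved queries.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EditCheck:
    field: str
    is_valid: Callable[[object], bool]
    message: str
    destination: str  # "site", "trial" or "data manager"


@dataclass
class Query:
    subject: str
    field: str
    message: str
    destination: str
    resolution: Optional[str] = None  # filled in manually (module 2)


def assign_queries(records: list[dict], checks: list[EditCheck]) -> list[Query]:
    """Module 1 (sketch): run every edit check on every record and raise a
    standardized query for each failure, tagged with where it should go."""
    queries = []
    for rec in records:
        for chk in checks:
            if not chk.is_valid(rec.get(chk.field)):
                queries.append(Query(rec["subject"], chk.field, chk.message, chk.destination))
    return queries


def stata_value(value: str) -> str:
    """Finite numbers go in unquoted; anything else becomes a Stata string.
    Simplification: embedded double quotes are swapped for single quotes
    instead of using Stata compound quotes."""
    try:
        number = float(value)
    except ValueError:
        number = None
    if number is not None and math.isfinite(number):
        return value
    return '"' + value.replace('"', "'") + '"'


def resolver_program(queries: list[Query], id_var: str = "subject") -> str:
    """Module 3 (sketch): turn resolved queries into Stata `replace`
    statements, assuming the ID variable is stored as a string.
    Unresolved queries stay open for the next cycle."""
    lines = []
    for q in queries:
        if q.resolution is not None:
            lines.append(
                f'replace {q.field} = {stata_value(q.resolution)} if {id_var} == "{q.subject}"'
            )
    return "\n".join(lines)


if __name__ == "__main__":
    records = [{"subject": "001", "age": 34, "hb": 25.0},
               {"subject": "002", "age": None, "hb": 9.8}]
    checks = [
        EditCheck("age", lambda v: v is not None, "Age missing", "site"),
        EditCheck("hb", lambda v: v is None or 3 <= v <= 20, "Hb outside 3-20 g/dL", "site"),
    ]
    queries = assign_queries(records, checks)  # 001/hb and 002/age
    queries[0].resolution = "12.5"             # site response entered manually
    print(resolver_program(queries))           # replace hb = 12.5 if subject == "001"
```

Generating a program rather than editing the data directly means each correction exists as a readable, re-runnable statement, which is one way to keep a record of what was changed and why.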

Discussion
We have developed two innovative tools to enhance the productivity and rigor of our CDM systems: the offline mode for OpenClinica and QMSPlus. We have intentionally used open-source tools to demonstrate that high-quality, GCP-compliant CDM is possible in endemic countries. We show that it is possible to overcome connectivity challenges and to move beyond a spreadsheet-based query system and its drawbacks, such as lack of authentication and audit trail, single-user access, formatting constraints, and maintenance challenges [7]. We estimate that QMSPlus has reduced query management turnaround time by approximately 60%. Previously, 100 queries took at least a day to transcribe onto data clarification forms (DCFs) before being sent to the sites; the same volume now takes just 2-3 hours to review and send electronically to the sites for resolution. In the future, it should be possible to replace STATA, which requires a purchased license, with an open-source alternative so that QMSPlus can be made available as an open-access tool.
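The transcription figures above can be checked with a simple ratio. Assuming that "a day" means an 8-hour working day (an assumption; the text does not define it), the time spent preparing 100 queries falls by:

```latex
\text{reduction} = 1 - \frac{t_{\text{new}}}{t_{\text{old}}}, \qquad
1 - \frac{3\ \text{h}}{8\ \text{h}} = 0.625, \qquad
1 - \frac{2\ \text{h}}{8\ \text{h}} = 0.75
```

Since the old figure was "at least" a day, this step alone shrinks by at least 62.5%. It covers only DCF preparation; the approximately 60% figure is the authors' estimate for overall query turnaround, which also includes site response time.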
Because resources are needed for training, open-source software is not necessarily cost-free, but this expense is justifiable compared with the cost of proprietary tools for noncommercial research [8]. Within the DC, the skill sets available for implementing these approaches are primarily in data management, programming, system validation, and statistics, backed by experience in CDM for clinical trials. Knowledge sharing could lead to jointly developed validation instruments and shared best practices.
Further adaptations could enable local capture of data on patient caseload, response to treatment, and safety, for reporting to Ministries of Health and for pharmacovigilance studies. Simplified systems, ideally running on mobile devices, could facilitate standardized data collection and the reporting of prevalence surveys and mass treatment coverage. As mass treatment distribution for NTDs is scaled up and progress towards elimination goals must be monitored, national control programs will need to capture, collate, and report vast amounts of data on disease prevalence and treatment coverage [9].

Conclusion
Through the innovative use of open-source tools, knowledge sharing, and the strengthening of research networks, high-quality data management is possible for all those working towards improved control of NTDs [10] while adhering to the principles of GCP.