A code for clinical trials centralized monitoring, sharing open-science solutions to high-quality data

Monitoring of clinical trials is critical to the protection of human subjects and the conduct of high-quality research. Even though the adoption of risk-based monitoring (RBM) has been suggested for many years, the RBM approach has been less widespread than expected. Centralized monitoring is one of the RMB pillars, together with remote-site monitoring visits, reduced Source Data Verification (SDV) and Source Document Reviews (SDR). The COVID-19 pandemic promoted disruptions in the conduction of clinical trials, as on-site monitoring visits were adjourned. In this context, the transition to RBM by all actors involved in clinical trials has been encouraged. In order to ensure the highest quality of data within a COVID-19 clinical trial, a centralized monitoring tool alongside Case Report Forms (CRFs) and synchronous automated routines were developed at the clinical research platform, Fiocruz, Brazilian Ministry of Health. This paper describes how these tools were developed, their features, advantages, and limitations. The software codes, and the CRFs are available at the Fiocruz Data Repository for Research—Arca Dados, reaffirming Fiocruz’s commitment to Open Science practices.

This statement is required for submission and will appear in the published article if the submission is accepted.Please make sure it is accurate.

Unfunded studies
Enter: The author(s) received no specific funding for this work.No -some restrictions will apply The COVID-19 pandemic has highlighted that clinical trials conducted in accordance with good clinical practices are needed to support evidence-based policies with high-quality data while protecting the safety, well-being, and rights of the participants.Monitoring of clinical trials is critical to the protection of human subjects and the conduct of highquality research.Thus, the COVID-19 pandemic produced a paradox: whilst the highquality data was most needed, the conduction of clinical trials on-site monitoring visits were adjourned [1] as self-isolation, travel bans, social distance and lockdowns were mandatory to prevent the spread of the virus.This was the essential drive to implement risk-based monitoring (RBM) in many clinical trials.
Even though the adoption of risk-based monitoring has been recommended by regulatory agencies since 2013 [2,3] and it is also suggested by the ICH guidelines in their last 2016 revision [4] the RBM approach has been less widespread than expected.The slow RBM adoption was due to the former complexity of implementing these practices, the need for new technologies, an incorrect assumption that RBM methodology data are less likely to satisfy regulators [5] and the lack of familiarity with RBM.
In short, risk-based monitoring comprises of remote-site monitoring visits and centralized monitoring; besides reduced Source Data Verification and Source Document Reviews.
Source Data Verification of all case report forms (CRF) (100% SDV) may avert systematic errors at the initial monitoring visits to clinical sites.All along the study, a complete verification of a fixed set of essential documents and data, such as informed consent, serious adverse events (SAE), drug accountability and primary outcomes measurements, may be needed to avert fraud, and ensure data integrity.Though, reducing the amount of source data verification (SDV) in favour of a more targeted approach, i.e., reduced SDV and SDR alongside the modern data-cleaning statistical approaches might be adequate to ensure the overall data quality [6], improving the efficiency, timelines, and costs of the clinical trials oversight process [7], including COVID trials [8].Remote-site monitoring with reduced SDV was a sine qua non condition during the pre-vaccine period of the pandemic [9].
Centralized monitoring consists of remotely assessing the quality of electronic data to identify data discrepancies through automated assessments.It encompasses data review and analysis performed remotely from the trial site to scrutinize the collected subjects' data to identify unusual patterns, protocol deviations or missing or invalid data [10].Electronic data capture systems that are secure and provide audit trails for tracking data manipulation, such as the electronic case report form (eCRF) REDCap [11], have made one of the pillars of the RBM, centralized monitoring, a feasible task.Still, the lack of availability of open-source software, tools, detailed guidance, and challenges in operationalization [12] have hindered centralized monitoring adoption.
Herein we share the code of a centralized monitoring tool alongside CRFs and synchronous automated routines that were developed at the clinical research platform,  Efficacy and safety evaluation of Favipiravir for treatment of COVID-19: an adaptive, multicentre, double-blind, randomised, placebo-controlled clinical trial.In brief, it was a clinical trial to repurpose favipiravir, an antiviral drug registered in Japan to treat influenza [13], as a treatment to reduce the rate of SARS-CoV-2 infection progression to severe disease.High-risk patients [14] presenting mild to moderate clinical profile of COVID-19 [15] meeting all the inclusion criteria were enrolled and treated for 10 days in this trial.Further study details are available at the Brazilian Clinical Trials Registry (RBR-85vh8fx), a primary WHO repository.

The databases
The REDCap [16] (version 12.0.28),an electronic data capture system available free of charge for non-profit research organizations, was used to this trial data management.Within the REDCap system, the study was divided into four projects, each one with its respective databases.Initial data were entered in a Recruitment/Inclusion project.It included the data from all recruited patients, their eligibility or recruitment failure and when applicable the randomization and the numbers of labels of the selected bottles (either placebo or test drug).The main project recorded the follow-up data of participants enrolled in the trial.These were the two main project databases.The third project managed the drug bottles' accountability.The last project was the participant´s diary with self-answered questionnaires linked to a reminder message (SMS) operator.

The tools
To implement a centralized monitoring routine, an application was developed using the R (version 4.0.This application accessed and analysed the project databases designed in REDCap [16]. The application is connected to REDCap using an application programming interface (API) token, which is only known to the REDCap administrator and the application developer.
The In addition to the R script, two scripts in PHP programming language were developed to carry out the automatic export and import of data between the specified projects.The aim of these PHP scripts was to avoid errors in the distribution of medication/placebo bottles and mitigate the risk of including patients in the non-assigned arm causing unbalanced randomization and biased results.
These two scripts in PHP used a combination of two REDCap features to perform a synchronous automated replication; an API that can be used to programmatically retrieve or modify data within REDCap and Data Entry Trigger.This is an advanced feature, its purpose is to execute actions by a remote website, such as making a call to the REDCap API, via HTTP Post request.
These scripts linked the Recruitment/Inclusion and the bottles accountability project selecting the numbers of the labels to be used according to the patient ID at the randomization list and blocking the future use at the drug accountability project.This project had a pre-filled database with all numbers assigned to each bottle available.
These values were used to feed the SQL fields on drugs bottle selections at Recruitment/Inclusion project.Additionally, these scripts automatically save the patient ID across the two other projects ensuring a perfect linkage.These two PHP scripts that perform synchronous automated replication had been used for other clinical trials management purposes such as scheduling patients visits.
These codes and the trial CRFs are available at the data-sharing platform ARCA Dados/Fiocruz (doi:10.35078/QCXI6N)upon request.

Results display
An actual dashboard linked to test databases can be accessed at https://www.shiny.fiocruz.br/teste/pce0121/.Herein a brief description of the parameters measured, their features and their applicability in clinical trial assessments are presented, alongside illustrative print screens.
The initial page of the application presents a brief tutorial on how to use the centralized monitoring and a description of each of the ten tabs: patient inclusion rates, recruitment failure, missing data, inclusion and exclusion criteria, adverse events, laboratorial abnormalities, data validation, follow-up visits according to the study schedule and queries.

Subjects' inclusion rates
Subjects' inclusion rates are the main indicator of trial performance.They allow for to optimisation of resource allocation and implement recruitment strategies if the rates are lower than expected.The patients' inclusion section (figure 1) informs the absolute numbers of subjects recruited and enrolled, the study duration (days), and a set of four performance indicators of the overall trial and trial sites: recruitment and inclusion enrolment rates (number of subjects recruited or included per month), the proportion of enrolled subjects (number of enrolled subjects vs. recruited subjects) and current inclusion enrolment rate (number of an actual subjects enrolled vs. the expected ones).
Additionally, the overall current inclusion enrolment rate and the trial performance indicators per trial site are presented in graphs.

Screening failure and queries evaluations
The screening failure is a critical trial performance issue that might imply financial losses,  The site engagement might be also appraised through queries evaluations including the trial sites' query response performance.A dashboard tab informs the mean time to respond or close a query as well as the frequency of queries per study site.Discrepancies in the number of queries between trial sites may indicate the need for training in protocol procedures.The frequency of queries per variable is also important evidence that the trial site staff should be clarified on how to fill out the CRFs to avoid systematic errors.

Follow-up visits according to the trial schedule
One of the most common protocol deviations is follow-up visits performed out of the time range defined in the protocols' study schedule.Adherence to the protocol can be measured by the percentage of follow-up visits around the due date range.The mean number of days outside the allowed time interval can also be described either per trial site or per subject's follow-up visit.As with every previous feature of this code, if a table line is selected, a new table is loaded underneath presenting the secondary ID of the subject (Figure 3).This allows the clinical research assistant (CRA) to open a query in the electronic data capture system promptly.Figure 3 shows the follow-up visit deviations either per trial site (left) or per subject's visit.

Inclusion and exclusion criteria, adverse events and laboratory results.
Including subjects outside the inclusion criteria age range is a protocol violation that must be avoided in order to ensure the safety of the trial subjects.Age and pregnancy are the sort of data that needs 100% SDV, however, this code adds an additional safety layer, allowing to check remotely the difference between the date of birth and the date of inclusion enrolment, as well as pregnancy.
Similarly, pharmacovigilance safety flags cannot be missed.If a result is selected, a table listing the subjects' secondary ID and individual results is shown.

Data Quality Checks: validation and completeness
The dates validation tabs cross-check the inform consent (IC) signing date and the AE and concomitant medicine report date, as these reports cannot be produced before the IC signing date.The REDCap CRF's fields to report laboratory and physical exam results were designed to show warnings when an unexpected result is inputted (internal check).
However, data entry is not blocked as these may be the real results.Discrepant values must be double-checked, though.The data validation feature allows a quick check of unlikely or out-of-range values for laboratory and physical exam results, such as blood pressure and weight.
Unchecked outcomes missing values may undermine sample size calculation reducing the power of results and utterly damaging the entire clinical trial result.Both, overall and per trial site, safety and efficacy outcomes completeness can be checked at a glance using the outcomes missing values tab.Alike, the absence of missing values in the subject´s inclusion and exclusion criteria fields might preserve the integrity of the trial's internal validity as it ensures that the study population is in accordance with the study protocol.

Discussion
While the GCP core principles remain the same since it was first issued in 1996, when clinical research was largely paper-based, the means to ensure that the safety and wellbeing of human subjects participating in clinical trials are protected and that the results of the clinical trials are credible have evolved.
To avoid interruptions of the most needed therapeutic trials for COVID-19 and assure the quality of data according to GCP and regulatory requirements, the industries contract research organizations (CRO) and regulatory agencies have been encouraged to transition to risk-based monitoring (RBM).Although RBM is not a response designed to mitigate the impact of COVID-19 on clinical trials, its adoption, which has been recommended for almost 10 years, was strongly needed in the context of the pandemic.
The centralized monitoring tool for the remote appraisal of electronic data was designed to address the most common protocol deviations, pharmacovigilance alert signs, study performance and data completeness through automated evaluations.A script (R software) accesses the clinical trial dataset to perform these evaluations.None of these breaks the blind or information about the outcomes.The data are presented in dashboards, but, they cannot be modified using this tool.To preserve the trial dataset integrity and audit logs for tracking data handling, all discrepancies identified by the script must be manually corrected through the electronic data capture system (REDCap) query module.
There are many approaches to evaluating the quality of a clinical trial dataset, and although they are complementary, using R script-based central monitoring has advantages.An R script assures the reproducibility of these assessments across actors and trials [17].
Cleaning the data before freezing the dataset for statistical analysis is a hard and timeconsuming process.Central monitoring does not replace this analysis step, but it accelerates it as the data quality evaluation is performed throughout the trial.Similarly, the number of monitoring on-site visits can also be decreased, which has a positive impact on trial costs.
These real-time data evaluations also enhance the subject's safety, as the AEs control is improved.The tool also facilitates the DSMB procedures, meetings presentations and reports.Additionally, online access can be granted to DSMB members allowing them to perform safety overviews anytime.The real-time trial site performance metrics also drive study coordination informed decisions, such as the need for new recruitment strategies, pieces of training or even interruption of trial sites.
This version of the code has limitations.The results are descriptive and, in large trials, further statistical tests may be needed.There is no restricted access per role within the trial, as for most clinical data software.The query assessments use data from a REDCap audit log, meaning that a routine to download this dataset is needed to update the site performance metrics dashboard.Still, the inclusion of trial sites alongside the trial requires adjustments to the code to reproduce the actual expected recruitment rate.
Finally, as the tool was developed using the REDCap data capture system and longformat datasets, additional work is needed to run the script in other formats or systems.
However, as it is an open source software, future improvements of this open-science solution through collaborations and further data sharing are expected [18].

Conclusion
Sharing data between researchers worldwide is an increasingly important aspect of Centralized monitoring is an additional data quality assurance layer that might be regarded as a mandatory quality standard in clinical trials in a near future.

Consent for publication
This manuscript does not contain any individualized trial subject's data.The confidentiality of the subjects' clinical records has been observed according to the Brazilian ethical regulations.
requirements.View published research articles from PLOS ONE for specific examples.
Funded studies Enter a statement with the following details: Initials of the authors who received each award • Grant numbers awarded to each author • The full name of each funder • URL of each funder website • Did the sponsors or funders play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript?• NO -Include this sentence at the end of your statement: The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.• YES -Specify the role(s) played.• * typeset Competing Interests Use the instructions below to enter a competing interest statement for this submission.On behalf of all authors, disclose any competing interests that could be perceived to bias this work-acknowledging all financial support and any other relevant financial or nonfinancial competing interests.This statement is required for submission and will appear in the published article if the submission is accepted.Please make sure it is accurate and that any funding sources listed in your Funding Information later in the submission form are also declared in your Financial Disclosure The authors declare no competing interests.Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation statement.View published research articles from PLOS ONE for specific examples.NO authors have competing interests Enter: The authors have declared that no competing interests exist.Authors with competing interests Enter competing interest details beginning with this statement: I have read the journal's policy and the authors of this manuscript have the following competing interests: [insert competing interests here] /A" if the submission does not require an ethics statement.General guidance is provided below.Consult the submission guidelines for detailed instructions.Make sure that all information entered here is included in the Methods section of the manuscript.NA Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Format for specific study types Human Subject Research (involving human participants and/or tissue) Give the name of the institutional review board or ethics committee that approved the study • Include the approval number and/or a statement indicating approval of this research • Indicate the form of consent obtained (written/oral) or the reason that consent was not obtained (e.g. the data were analyzed anonymously) • Animal Research (involving vertebrate animals, embryos or tissues) Provide the name of the Institutional Animal Care and Use Committee (IACUC) or other relevant ethics board that reviewed the study protocol, and indicate whether they approved this research or granted a formal waiver of ethical approval • Include an approval number if one was obtained • If the study involved non-human primates, add additional details about animal welfare and steps taken to ameliorate suffering • If anesthesia, euthanasia, or any kind of animal sacrifice is part of the study, include briefly which substances and/or methods were applied • Field Research Include the following details if this study involves the collection of plant, animal, or other materials from a natural setting: Field permit number • Name of the institution or relevant body that granted permission • Data Availability Authors are required to make all data underlying the findings described fully available, without restriction, and from the time of publication.PLOS allows rare exceptions to address legal and ethical concerns.See the PLOS Data Policy and FAQ for detailed information.

Fiocruz,
Brazilian Ministry of Health for a COVID-19 clinical trial.This centralized monitoring tool remotely reviews the quality of electronic clinical data to identify protocol deviations, pharmacovigilance alert signs, study performance metrics and data completeness through automated assessments.Using an R script to assure the reproducibility of this data quality assessments, the tool might enhance subjects' safety, as the control of AE is improved; accelerate data cleaning before statistical analysis; evaluate performance metrics of trial sites; and decrease the number of on-site monitoring visits, which has an impact on trial costs, timelines, and subjects' safety.Additionally, to contribute to data quality assurance, synchronous automated routines were also built in PHP programming language to automatically transfer data between the different databases.Thereby, core data entry is improved, ensuring 100 % of precision and consistency in critical data, such as randomization.These open-science solutions are powerful tools for data management, pharmacovigilance, and clinical teams involved in clinical trials.Together with other open science initiatives, these tools may help push toward the adoption of centralized monitoring and RBM.Centralized monitoring is a layer within data quality assurance that might be regarded as a mandatory quality standard in clinical trials in a near future.The objective of this paper is to describe and share these open science solutions.Fiocruz is reaffirming its commitment to Open-Science practices by encouraging the availability of data and information at each stage of the research process.The codes for these tools are available at ARCA Dados (doi:10.35078/QCXI6N),an important element of the Fiocruz-Brazilian Ministry of Health policy for the management and sharing of research data.
These open-science solutions for a COVID-19 clinical trial were developed at the clinical research platform, Fiocruz, Brazilian Ministry of Health.Fiocruz clinical research platform funds and provides support to clinical research that evaluates new technologies to the Brazilian National Health System (SUS).Support includes clinical research crosscutting activities, such as monitoring, data management, ethical and regulatory affairs, and pharmacovigilance.The platform was the coordinator of the trial:

2 )
statistical software to write the code and the Shiny R package, used to build webpages based on R codes.This provided web-based data interactive monitoring dashboards through a web server with the Shiny server (version 1.5.13.995).Both the Shiny package for R and the Shiny server are open-source software and are developed and supported by RStudio.(Rstudio Team (2020).Rstudio: Integrated Development for R. Rstudio, PBC, Boston, MA URL http://www.rstudio.com/).

Figure 1 .
Figure 1.Subjects' recruitment vs. inclusion enrolment rates overall (left) and per trial site.
the trial sites spend staff work hours and trial supplies running this activity.Moreover, recording the reasons for screening failure allows reporting of the study population according to the CONSORT diagram.It depicts the passage of subjects through the trial avoiding selection bias underreporting.The comparison of screening failure reasons between trial sites offers an opportunity to identify the need for additional training as these frequencies should be equally distributed among sites in a randomized clinical trial.Figure 2 presents the absolute frequency of failures per trial site, and a pie chart displays the relative frequency (left).Similarly, on the right, the reasons for screening failure are displayed.

Figure 2 .
Figure 2. Dashboard of screening failures per trial site (left) and their reasons.

Figure 3 .
Figure 3. Screenshot of follow-up visits evaluation per trial site (left) and per subject' visit.

Figure 4 .
Figure 4. AE dashboard: AE frequency and ID by causality.
addressing diseases and developing new therapeutic, diagnostic, and epidemiological approaches.Faced with the perspective of ensuring sustainability in Public Institutions, it is important to consider free methodologies and tools that support greater access and execution of quality research.These open science solutions are a complementary tool to clinical trials' data management and monitoring teams rather than an on-site monitoring substitute.Together with other open science initiatives, these powerful tools may help push towards the adoption of centralized monitoring and RBM.

Corporation 1. Introduction Statement of significance
Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Powered by Editorial Manager® and ProduXion Manager® from Aries Systems The CRFs were designed to allow automatic statistical analysis.Whenever possible, open text fields were replaced by closed questions reducing misunderstandings and time to answer.These closed questions were presented in three different multiple-choice field types: checkboxes, to select multiple answers at the same time; dropdown lists; and radio buttons, to select a single answer within mutually exclusive options.Further on, these questions were analysed as categorical variables.Moreover, calculated fields were used to perform the automatic computation of two or more fields, such as age calculation or Body Mass Index.Text Boxes were designed with field validation to ensure quality.Branching logic guaranteed that conditional fields just pop up depending on the previous answer, for instance only females could have a positive pregnancy response.When relevant, some of these CRFs were developed according to CDISC (Clinical Data Interchange Standards Consortium) standards, including Concomitant Medications and Adverse Events forms.
The dashboard allows monitoring all adverse events (AE) aggregated as follows: frequency of AE per subject and per study trial site; frequency of AE per causality in each trial site; frequency of AE per intensity in each trial site; frequency of unexpected or expected AE according to the drug factsheet; frequency of serious adverse event (SAE) per site; ongoing SAE per trial site and actions taken after the SAE.As per the previous code's features, a table of the secondary ID, trial site, and start and end dates are provided for each of these AE's