REPLICCAR II Study: Data quality audit in the Paulista Cardiovascular Surgery Registry

The quality of data in electronic healthcare databases is critical when the data are used for research and health practice. The aim of the present study was to assess data quality in the Paulista Cardiovascular Surgery Registry II (REPLICCAR II) using two audit methods, direct and indirect. The REPLICCAR II database contains data from 9 hospitals in São Paulo State, with over 700 variables for 2229 surgical patients. Data collection was performed in the REDCap platform by trained data managers who abstracted information from medical records. We directly audited a random sample (n = 107) of the data collected after 6 months and indirectly audited the entire sample after 1 year of data collection. The indirect audit was performed using the data management tools of the REDCap platform. We computed a modified Aggregate Data Quality (ADQ) score previously reported by Salati et al. (2015). Agreement between data elements was good for categorical data (Cohen κ = 0.7, 95%CI = 0.59–0.83). For continuous data, only 2 out of 15 variables had an intraclass correlation coefficient (ICC) < 0.9. In the indirect audit, 77% of the selected variables (n = 23) had a good ADQ score for completeness and accuracy. Data entry in the REPLICCAR II database proved satisfactory and yielded reliable data for research in cardiovascular surgery in Brazil.


Introduction
The quality of the collected data is the very foundation of healthcare and of clinical studies, trials, and follow-up studies. Despite the lack of consensus on a standardized method to measure healthcare data quality, establishing the confidence and validity of the outcome is of utmost importance. Hence, the research design, the variable selection and the data collection are pivotal in asserting the accuracy of the conclusions reached [1].
Observational studies are subject to bias, confounding and a lack of information in retrospective medical records. Publications such as Zhang et al., 2014, Salati et al., 2015, and Dreyer et al., 2016, were important initiatives on how to devise data validation tools aimed at enhancing the quality of the results needed for decision-making [2][3][4][5]. It is essential that healthcare databases be reliable, as they will serve not only as the basis for future academic research, but also to evaluate and derive guidelines leading to improved healthcare decision-making [6,[7][8][9][10].
For decades, cardiac surgeons have systematically collected and analyzed data to continually improve outcomes in healthcare [10]. The initiatives taken by the Society of Thoracic Surgeons (STS) and European Society of Thoracic Surgeons (ESTS) are aimed at collection of reliable data on a large scale in order to improve outcomes, especially regarding mortality. Adhering to a quality improvement initiative with data registries can already reduce mortality rates [11][12][13][14][15][16][17][18][19][20].
In the 2010 audit report for the STS Adult Cardiac Surgery Database (ACD), Brown and colleagues [21] emphasized the importance of data quality, which had previously not been taken into consideration. That same year, Grunkemeier and Furnary [22] also addressed this curiously neglected problem. One such initiative is the Paulista Registry for Cardiovascular Surgery (REPLICCAR II).
The REPLICCAR II registry is a voluntary initiative with 9 participating centers located in São Paulo (Brazil), coordinated by the Instituto do Coração do Hospital das Clínicas da Faculdade de Medicina da USP (InCor-HCFMUSP). Its main objective is to evaluate morbidity and mortality predictors in patients undergoing Coronary Artery Bypass Graft Surgery (CABG). The adoption of quality-oriented data analysis was the next step taken in REPLICCAR II to assure the validity of outcomes and enhance its clinical impact.
The aim of the present study is to present direct and indirect audit results of data quality in the REPLICCAR II database, after 6 months and 1 year.

Data source and collection
The project included 9 institutions in the State of São Paulo, thus combining the analysis of public and private reference hospitals linked to institutions such as philanthropic organizations and universities. Funding was provided by Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP).
Data collection and management used REDCap (Research Electronic Data Capture) hosted at University of São Paulo Medical School General Hospital (HCFMUSP) accessible from any computer with an Internet connection, with access restricted to selected researchers [23,24].
The criteria and definitions in REPLICCAR II were exactly the same as in the STS ACD (version 2.9, 2017), which includes more than 700 variables covering general characteristics, risk factors, pre-, intra- and postoperative assessments, and 30-day follow-up.
Data collection began in August 2017 and each participating center mobilized a task force headed by a supervisor, usually a medical resident. The participating centers received training and a codebook with the description and criteria of all study variables to collect from medical records after isolated CABG surgery.
Thirty variables (27 mandatory fields and 3 additional variables) were selected for the direct and indirect data quality analyses. These variables were selected from among the principal variables reported in CABG surgery studies, covering sample characteristics and the pre-, intra- and postoperative periods. Complications comprise a combination of clinical outcomes after surgery, such as renal failure, reoperation, atrial fibrillation, deep sternal wound infection, stroke, myocardial infarction, respiratory complications, and prolonged mechanical ventilation.

Direct audit
A direct audit was conducted 6 months after the initiation of data collection. Seven percent of the medical records of surgical patients at each center through February 2018 (n = 107 records) were randomly selected with STATA 13.1 software (StataCorp, Texas, USA). Data were re-collected by two experienced auditors and 2 data managers (who also collected data routinely). With full access to each center's data, the audit was performed under two conditions: (i) data managers were blinded to the original record and (ii) each data manager did not re-collect the cases they had originally entered. The original and re-collected data then underwent statistical analysis to check the accuracy of data collection.
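The sampling and re-assignment scheme described above can be sketched as follows. This is a minimal illustration, not the study's actual STATA procedure: it assumes each record carries the identifier of the data manager who originally entered it, and all field and function names are hypothetical.

```python
import random

def sample_for_audit(records, managers, fraction=0.07, seed=42):
    """Randomly draw `fraction` of the records and assign each one
    to a data manager other than the one who originally entered it
    (condition (ii) of the direct audit)."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    n = max(1, round(len(records) * fraction))
    sample = rng.sample(records, n)
    plan = []
    for rec in sample:
        # Re-collection must not go back to the original data manager
        eligible = [m for m in managers if m != rec["entered_by"]]
        plan.append((rec["id"], rng.choice(eligible)))
    return plan
```

Blinding to the original record (condition (i)) is an organizational measure and is not represented in the sketch.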

Indirect audit
In the second phase of the study, a direct audit was impracticable due to the amount of data and the lack of financial and human resources. The same 30 variables used in the direct audit were used to compare data collected between 6 and 12 months.
We evaluated all records entered in REDCap (indirect audit) using the data management tools available in the platform [23,24]. To ensure the completeness of data elements in REPLICCAR, all electronic data collection forms were programmed to alert the data manager upon failure to fill a field (variable). Numeric fields were validated to prevent the entry of non-numeric characters or numbers outside an acceptable range. Furthermore, REDCap has a module for assessing the quality of recorded data, with standard tools that quickly describe all missing values, fields with invalid characters, outlier values, and incorrect values in calculated fields. In addition to the standard REDCap data quality tools, specific queries were defined to identify inconsistent values. The customization of specific queries for REPLICCAR enabled the construction of measures to screen for data inconsistencies. Table 1 shows some examples of REPLICCAR queries.
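As an illustration of how such consistency queries operate, the sketch below flags values that are out of range or mutually inconsistent. The rules, thresholds, and field names are hypothetical examples, not REPLICCAR's actual query set (see Table 1 for the real queries).

```python
def run_queries(record):
    """Return a list of query messages for a single record.
    Rules, ranges, and field names are illustrative only."""
    flags = []
    height = record.get("height_cm")
    weight = record.get("weight_kg")
    # Range checks, as enforced by the data entry forms
    if height is not None and not 120 <= height <= 220:
        flags.append("height_cm outside expected range")
    if weight is not None and not 30 <= weight <= 250:
        flags.append("weight_kg outside expected range")
    # Cross-field consistency: a reoperation implies a prior operation date
    if record.get("reoperation") and not record.get("first_operation_date"):
        flags.append("reoperation marked but first_operation_date missing")
    return flags
```

Each flagged record would generate a query notifying the responsible data manager to revise the data.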

Statistical analysis
For the direct audit, the data was analyzed in STATA v.13.1 (StataCorp, Texas, USA).
The ICC varies between 0 and 1, with the former suggesting no agreement and the latter perfect agreement. Values lower than 0.5 indicate poor reliability, values between 0.5 and 0.75 moderate, values between 0.75 and 0.9 good, and values higher than 0.9 excellent reliability [9].
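The reliability bands above can be expressed as a small helper; a minimal sketch (the function name is illustrative, not part of the study's analysis code):

```python
def icc_reliability(icc):
    """Classify an intraclass correlation coefficient (ICC)
    into the reliability bands used in the study [9]."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"
```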
For the indirect audit, we adapted the methodology suggested by Salati et al., as implemented by the REPLICCAR II team, by assessing the completeness (COM) of the data and the accuracy (ACC), i.e., inconsistent or out-of-range answers, for all data included in the study (n = 2229 medical records).
The ADQ score was computed as described by Salati et al. [4]: a negative ADQ score indicates that the observed ACC/COM is inferior to the sample average, whereas a positive score indicates that the observed ACC/COM is superior to the sample average.
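The sign convention above can be read as a standardized deviation from the sample average. The sketch below illustrates that reading only; it is an assumption for illustration, not Salati et al.'s exact formula, which should be consulted in [4].

```python
from statistics import mean, stdev

def adq_like_score(observed, sample_values):
    """Standardized deviation of an observed completeness or accuracy
    rate from the sample average (illustrative, not the exact ADQ
    definition of Salati et al. [4]): negative values fall below the
    sample average, positive values above it."""
    mu = mean(sample_values)
    sd = stdev(sample_values)
    return (observed - mu) / sd
```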

Direct audit
A total of 107 random records for the direct data audit were collected in the initial 6 months of the REPLICCAR II study (7% of the total sample) and are summarized in Table 2. The sample average kappa (κ) was 0.70, with a standard error of 0.06 (95%CI 0.59–0.83). In this analysis most κ coefficients were substantial (n = 4) to almost perfect (n = 3), with 2 variables presenting a moderate κ coefficient.

Table 3 presents the ICC of the numerical variables collected in the direct audit. Preoperative hemoglobin had an average ICC = 0.7, but it later became clear that laboratory examinations spanned several different admissions, leading to inter-rater disagreement between data managers on this variable. Data for glycated hemoglobin, total bilirubin, and albumin levels were insufficient for analysis, but these variables were not mandatory in the registry. The mandatory variables in the direct audit had overall satisfactory results, with agreement of up to 90%, demonstrating good adherence of the data managers to the study definitions and criteria.

Table 4 describes the completeness and accuracy of all variables included in the current evaluation, together with the ADQ score. The variables with less than 90% completeness (COM) and a low ADQ score in the preoperative period were: (i) total bilirubin (14%), (ii) total albumin (21.8%), (iii) HbA1c (41.7%), (iv) glucose (60.5%), and (v) ejection fraction (75.13%). In the postoperative period, only two variables were in this condition: (i) ejection fraction (22.3%) and (ii) glucose (73.9%). The ADQ score for body mass index (BMI) was -4.7 because of weight and height inconsistencies. Values outside the expected range were notified to the responsible data manager through a query to revise the data (Table 4). The remaining variables presented more than 90% completeness and accuracy.
The ejection fraction variable was absent from the hospital databases and was therefore not entered by the data managers; these exams may have been given directly to the surgeon and thus lost to the REPLICCAR II database. Fig 1 shows that 77% of the variables (n = 23) had an acceptable ADQ score; positive values, which exceed the sample average, can be considered indicative of good data quality. Values under the first quartile were considered relevant for review. Based on this interpretation, we propose better criteria and definitions (Table 5) for some variables in the REDCap tool, including BMI, with acceptable ranges for weight and height. During data input, investigators received an alert (query) for each value determined to be out of range, thus improving accuracy and data consistency and keeping variability acceptable.

Indirect audit
The other variables with low completeness are not mandatory, but they reflect current local practice and highlight an opportunity to improve clinical evidence and quality protocols.

Discussion
In summary, the REPLICCAR II study had satisfactory concordance in the first stage, and the results of the indirect analysis were essential to develop methods of data confidence and quality improvement.
Lauricella et al., 2018 [7], published a data quality analysis of a similar initiative, the São Paulo Lung Cancer Registry (PLCR), also developed by InCor. It cannot be directly compared with REPLICCAR II due to its different parameters; however, some of these parameters, such as COM, can be analyzed and compared. Among 511 analyzed records, 21 out of 105 variables (20%) had COM < 90%. In our study, 7 out of 30 variables (23.3%) showed the same result. The work by Salati et al., 2011 [1], showed that 5 out of 15 variables (33.3%) selected for the study had < 90% completeness.
In analyzing the direct audit results, it is important to remember that the ICC may consider close numerical values concordant even when they differ. This has important implications in a clinical study, because different values within a close range will still show good ICC values. Considering that different researchers (or even the auditor) may collect information, such as exam values, from different dates for the same subject, these values may still show good ICC in the statistical analysis [23]. Only 2 out of 15 variables (13.3%) had ICCs below 0.9; our lowest value was an ICC of 0.7, for preoperative hemoglobin. Lauricella et al. (2018) reported equivalent results in 5 out of 12 numerical variables (41.7%), their lowest ICC being 0.51 for the "time from first symptom" variable. The comparison, however, cannot be applied directly to our groups, due to the completely different parameters of each work.
Grunkemeier and Furnary (2010) commented on the methodology for the direct audit of the STS ACD published by Brown et al. (2010), whose main objective was to determine the variability of disease etiology and operative data elements in the STS ACD when abstracted by untrained physician abstractors. Their discussion focused on which method was the most trustworthy in the direct audit, and they argued that a more interesting analysis could be conducted by comparing the data managers with themselves [22]. In our study, the data managers evaluated other data managers to ensure a comparable analysis. According to Shahian et al. (2010): "Determining optimum clinical care and developing evidence-based guidelines require the highest quality study data. As we enter an era of greater transparency and accountability, data accuracy has even more widespread implications". Direct audit increases the costs and human resources required, and in large databases such a strategy is not feasible [25].
Regarding outcome variables in our study, mortality had 85% completeness and 92% accuracy in the first audit phase. Among the inconsistencies related to mortality, we verified that cases of intraoperative death were overlooked for the variable "death in the operating room". To rectify this inconsistency, we inserted in the REDCap platform queries that consider cases of surgery without admission to the intensive care unit in the immediate postoperative period as death on the day of surgery. The variables "Mortality in Hospital" and "30 Days Vital Status" had 96% completeness and 99.7% accuracy.
The STS (2008) reported similar results, where only 83% of cases had agreement related to patient status (alive or dead). Brown et al. (2010) reported a total agreement median of 93% (range 35-100%) in CABG procedures in the STS ACD. Patients with unknown or incomplete 30-day mortality status could potentially introduce bias into any analyses not adopting a statistically valid strategy to handle missing information. This is a concerning situation, considering that mortality is the most common outcome used for both quality indicators and research [21].
However, "in-hospital mortality" completeness was almost entirely recorded, representing the vast majority of 30-day deaths. The use of simulations suggests that any errors, committed by the unrealistic assumption that a patient with missing or unknown mortality status is alive, have negligible influence on hospital mortality results when compared with random sampling error. [15].
Newly proposed parameters, such as the ADQ, may provide faster, more practical and lower-cost analysis of overall data quality. A further evaluation using the ADQ score to compare quality between institutions, as Salati et al. (2015) did, could then be used to inform the centers about the strengths and weaknesses of their respective variables, thereby helping them to improve data quality.
The STS ACD adapted a conceptual framework of quality measurement with a comprehensive assessment methodology, which its maintainers strive to continually improve, treating it as an evolving process rather than a static product [25].

Limitations
1. Our work was limited by financial and human resources; thus, a further direct audit or closer follow-up of the centers was impracticable. Nonetheless, faced with this difficulty, our team is looking forward to new perspectives in data quality analysis, such as the ADQ, thus contributing to the development of the field.
2. As expected in such a pioneering project, there were many challenges in training the professionals engaged in data collection and in ensuring data quality, as shown by the unexpected discrepancies in our results. From the analyses performed (κ coefficient and ICC) alone, we cannot determine the causes of these errors. In addition, since it was not possible to confirm that all cases meeting the criteria (primary procedure and isolated CABG) were included, we were unable to properly evaluate the centers' adherence.
3. We found satisfactory concordance, but these results only show the capacity and understanding of the investigators collecting data in the first phase of the study (6 months after its beginning). Considering that most centers rely on medical residents for data collection, we cannot ensure long-term adherence at each center, because a short- to medium-term rotation of these professionals is expected, requiring training to be restarted. Consequently, continuous quality analysis is imperative to keep COM/ACC high (preferably above 98%).
4. This work shows that there is still a long way to go before a Brazilian national database comparable to the STS or ESTS databases can be developed. It is not yet possible to ensure that professionals, researchers, healthcare centers, and the government will adhere to the promotion and adoption of electronic registries in the long term. Nonetheless, consensus toward a broad database is growing.
5. Another limitation relates to the 30-day follow-up: we did not audit the outcomes of this period or re-contact patients to confirm data.
In summary, this work shows seriousness and commitment to the REPLICCAR project, with concern not only for the study's development and implementation, but also for the quality of its data.

Conclusion
Completeness and accuracy of the information abstracted from medical records are essential to the validity and accuracy of the results obtained. Indirect auditing gave clear directions for data improvement without the need to re-collect a sample to evaluate concordance. Above all, it is important to maintain a scientific partnership between institutions, with regular meetings and close integration with the working groups at each institution. Findings of discrepancy within the data only reinforce the need for quality-oriented statistical studies, because data quality directly influences the validity of the analyses and conclusions drawn in research. In places where such studies and their application are still underdeveloped, like Brazil, work in this field becomes even more indispensable. Focusing on data quality ultimately leads to a more efficient and safer healthcare system and will play an increasingly major role in its development. The main objective of the present study was to implement improvement actions in a way that guarantees the safety and validity of the results, as well as to provide feedback on REPLICCAR II itself. As an STS-based database, this project can provide the basis for a wider and more reliable quality-oriented program, with the prospect of a positive impact on clinical outcomes.
Our experience reinforces the importance of training, encouraging and standardizing the staff responsible for collecting and filling out the forms (data managers). In addition, correct data entry substantially lowers the costs of a direct audit with the traditional rater agreement analysis. Indirect auditing was more practical for determining strategies for data quality improvement, but the direct audit was essential for evaluating outcome definitions and improving education and training. The ADQ score considers the completeness and accuracy of each variable in the study and offers a useful data quality parameter for prospective observational studies; it is therefore expected to attract more attention in studies yet to come.