Data linkage and computerised algorithmic coding to enhance individual clinical care for Aboriginal people living with chronic hepatitis B in the Northern Territory of Australia – Is it feasible?

Background
Chronic hepatitis B (CHB) is endemic in the Aboriginal population of Australia’s Northern Territory (NT). However, many people’s hepatitis B virus (HBV) status remains unknown.

Objective
1. To maximise the utility of existing HBV test and vaccination data in the NT by creating a linked dataset and computerised algorithmic coding.
2. To undertake rigorous quality assurance processes to establish the feasibility of using the linked dataset and computerised algorithmic coding for individual care for people living with CHB.

Methods
Step 1: We used deterministic data linkage to merge information from three separate patient databases. HBV testing and vaccination data from 2008–2016 were linked and extracted for 19,314 people from 21 remote Aboriginal communities in the Top End of the NT.
Step 2: A computerised algorithm was developed to allocate one of ten HBV codes to each individual.
Step 3: A quality assurance process was undertaken by a clinician who, using standardised processes, manually reviewed all three databases for a subset of 5,293 Aboriginal people from five communities to check the accuracy of each allocated code.

Results
Data linkage of individuals was highly accurate (99.9%). The quality assurance process detected an overall error rate of 17.7% in the HBV codes generated by the computerised algorithm. Errors occurred in source documentation, primarily from the historical upload of paper-based records to electronic health records. An overall HBV prevalence of 2.6% was found in the five communities, which included ten cases of CHB in people who were previously unaware of their infection and not engaged in care.

Conclusions
Data linkage of individuals was highly accurate. The quality assurance process uncovered data quality issues and poor sensitivity in the codes produced by the computerised algorithm. By systematically and manually reviewing all available data we were able to allocate a HBV status to 91% of the study population.


1. A computerised algorithm was used to allocate a HBV code to each individual
2. Quality assurance was then conducted using manual interrogation of individual EHRs by a trained RAN with oversight of the project medical officer (GS)
3. Sources of error were identified and used to refine the algorithm further
4. Steps 1-3 were then repeated

Notes:
If HBsAg pos apply code 6 regardless of other data
If Anti-HBs pos, Anti-HBc neg and NO vaccinations apply code 2
This process identified errors and omissions in the source dataset, some of which could be corrected for subsequent iterations. Several coding discrepancies were identified:
• Anti-HBc+ in the presence of vaccinations
• Anti-HBs+ in the absence of vaccination
• Anti-HBc results varying over time, and the question of which result is used to determine the serocode
As a result, the number of computer-generated codes expanded to address coding discrepancies identified during the QA process.

Notes:
If HBsAg pos apply code 6 regardless of other data.

HBcAb, Code 11
There are individuals with discordant Anti-HBc results (earlier positive test results turning negative in later tests).
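A minimal sketch of how a positive-then-negative Anti-HBc history might be flagged is given below. The function name and the `(test_date, is_positive)` tuple layout are hypothetical and are not taken from the study's algorithm.

```python
from datetime import date


def has_discordant_anti_hbc(results):
    """Return True if any positive Anti-HBc result is followed by a later negative one.

    `results` is a list of (test_date, is_positive) tuples in any order.
    This layout is an assumption made for illustration only.
    """
    seen_positive = False
    # Walk through the results in chronological order.
    for _, is_positive in sorted(results, key=lambda r: r[0]):
        if is_positive:
            seen_positive = True
        elif seen_positive:
            # A negative result after an earlier positive one is discordant.
            return True
    return False


history = [(date(2014, 6, 15), False), (date(2010, 3, 1), True)]
print(has_discordant_anti_hbc(history))
```

A negative result followed by a later positive one is not flagged here, since the text above describes only positives turning negative.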

Code 10
There are individuals for whom there is a positive surface antibody result and fewer than 3 immunisations recorded, or for whom the only data available is Anti-HBs pos.

Code 7
These are people with discrepant results: i.e. ones that do not make sense. A clinician will need to review and order new tests to sort them out on a case-by-case basis.
• Pos for all 3 of Anti-HBc, HBsAg and Anti-HBs
• HB e antigen pos but negative for HBsAg
• HB e antibody pos but negative for all 3 of Anti-HBc, HBsAg and Anti-HBs
• HBsAg pos and Anti-HBc neg
• HBsAg pos and Anti-HBs pos
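The five discrepant combinations listed above can be expressed as a single check, sketched here under stated assumptions. The function and parameter names are hypothetical. Each marker is `True` (pos), `False` (neg) or `None` (not tested), and "negative" is assumed to mean an explicit negative result rather than a missing one.

```python
def is_discrepant(hbsag=None, anti_hbc=None, anti_hbs=None, hbeag=None, anti_hbe=None):
    """Return True if the results match one of the listed code 7 combinations.

    Each argument is True (pos), False (neg) or None (not tested).
    Treating a missing result as "not negative" is an assumption.
    """
    def pos(marker):
        return marker is True

    def neg(marker):
        return marker is False

    return any([
        # Pos for all 3 of Anti-HBc, HBsAg and Anti-HBs
        pos(anti_hbc) and pos(hbsag) and pos(anti_hbs),
        # HB e antigen pos but negative for HBsAg
        pos(hbeag) and neg(hbsag),
        # HB e antibody pos but negative for all 3 of Anti-HBc, HBsAg and Anti-HBs
        pos(anti_hbe) and neg(anti_hbc) and neg(hbsag) and neg(anti_hbs),
        # HBsAg pos and Anti-HBc neg
        pos(hbsag) and neg(anti_hbc),
        # HBsAg pos and Anti-HBs pos
        pos(hbsag) and pos(anti_hbs),
    ])


print(is_discrepant(hbsag=True, anti_hbc=False))
```

If missing results should instead count as negative, the `neg` helper would need to accept `None` as well; the text does not say which reading applies.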

Code 8
This is a left-over code from all the others. These people have results but not enough to be able to classify them into one of the other categories. On a case-by-case basis, extra tests will need to be ordered for them to get classified.
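Because code 8 is whatever remains once every other rule has been tried, the coding can be read as an ordered list of rules in which the first match wins. The sketch below illustrates that idea with only two of the rules stated in these notes (code 6 and code 10). The record layout, the rule order beyond "code 6 first", and the omission of all other codes, including the special handling of code 7, are simplifying assumptions rather than the study's algorithm.

```python
# Ordered (code, predicate) pairs; earlier rules take precedence.
# Only two rules from the notes are included, for illustration.
RULES = [
    # HBsAg pos: code 6 regardless of other data.
    (6, lambda r: r.get("hbsag") is True),
    # Anti-HBs pos with fewer than 3 immunisations recorded: code 10.
    (10, lambda r: r.get("anti_hbs") is True and r.get("vaccinations", 0) < 3),
]

FALLBACK_CODE = 8  # left-over code for records no rule has claimed


def assign_code(record):
    """Return the code of the first matching rule, or the left-over code."""
    for code, matches in RULES:
        if matches(record):
            return code
    return FALLBACK_CODE


for rec in (
    {"hbsag": True, "anti_hbs": True, "vaccinations": 1},
    {"anti_hbs": True, "vaccinations": 2},
    {"anti_hbc": False},
):
    print(rec, assign_code(rec))
```

Returning at the first match is what guarantees that a record coded by an earlier rule never appears under a later code.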

Coding Algorithm order
Each new line excludes records that have already been assigned a code. Therefore, for example, as all HBsAg pos people are coded as "6" in the first instance, they cannot be recoded as anything else, so all other codes will not contain anyone who is HBsAg pos. The exception to this is code 7, which contains discrepant results.

Specific discrepant combinations (see above), not excluding records already coded: Code 7
Code remaining missing: Code 8
Discrepant core antibody: Needs interpretation

a HBsAg: hepatitis B surface antigen, if positive indicates active infection
b Anti-HBc: hepatitis B core antibody, if positive is a marker of immunity from infection (past or current)
c Anti-HBs: hepatitis B surface antibody, if positive (>10 IU/ml) indicates immunity (from vaccination or infection)
d Individuals with discrepant results that need a clinician to review in order to interpret, including the scenario where hepatitis B "e" antigen/antibody is positive (markers of infection) and HBsAg is negative

Further refinements to the algorithm were made after the study was complete, when analysing all sources of potential error. Several limitations in the computerised algorithm were identified throughout the various iterations of both the algorithm and the manual clinical review audit process. In line with contemporary NT hepatitis B public health and vaccination guidelines, the algorithm allocated individuals with 3 HBV-containing vaccinations to the "fully vaccinated" code. We identified that such an approach failed to consider appropriate dose intervals for vaccination, and that 62 individuals received vaccinations at intervals that were less than the minimum dosing intervals and were thus invalid (this has been incorporated into the version in Table 5). Additionally, the algorithm failed to distinguish between individuals vaccinated according to the birth and infancy vaccination schedule and children and adults vaccinated later in life.
Individuals vaccinated later in life require serology testing to ensure they did not become HBV infected during the period in which they were susceptible to HBV prior to vaccination. Given the natural history of CHB, in which acquisition in infancy and childhood is more likely to lead to chronic infection, there could be CHB infections in this cohort. In the manual review process this potential was considered (and adjusted for when calculating the error rate per code).
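The dose-interval limitation noted above (doses given closer together than the minimum interval being counted towards "fully vaccinated") could be checked along the following lines. This is a sketch only: the text does not state the minimum intervals, so the single `min_gap_days` parameter and the 28-day value used in the example are placeholders, and real schedules may specify different minima between different dose pairs.

```python
from datetime import date


def count_valid_doses(dose_dates, min_gap_days):
    """Count doses given at least `min_gap_days` after the previous valid dose.

    A single uniform minimum gap is an assumption made for illustration.
    """
    valid = []
    for dose in sorted(dose_dates):
        # The first dose is always valid; later doses must respect the gap.
        if not valid or (dose - valid[-1]).days >= min_gap_days:
            valid.append(dose)
    return len(valid)


def is_fully_vaccinated(dose_dates, min_gap_days, required_doses=3):
    """Apply the 3-dose rule to valid doses only."""
    return count_valid_doses(dose_dates, min_gap_days) >= required_doses


doses = [date(2010, 1, 1), date(2010, 1, 10), date(2010, 3, 1)]
# 28 days is a placeholder value, not a figure taken from the study.
print(count_valid_doses(doses, min_gap_days=28))
print(is_fully_vaccinated(doses, min_gap_days=28))
```

Measuring each gap from the last valid dose, rather than from the last recorded dose, means one invalid dose does not also invalidate the dose that follows it.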
On further review and discussion with the designer of the algorithm, code 7 and code 11 could in practice be amalgamated, as reflected in the refined algorithm in Table 5.

b Anti-HBc: hepatitis B core antibody, if positive is a marker of immunity from infection (past or current)
c Anti-HBs: hepatitis B surface antibody, if positive (>10 IU/ml) indicates immunity (from vaccination or infection)
d Individuals with discrepant results that need a clinician to review in order to interpret, including a previous Anti-HBc positive result where the most recent test result is Anti-HBc neg

Emphasis was placed on the requirement for clinical utility, and the accuracy of the coding was validated using the QA methodology because of the implications for patient care (see Table 6).