Creating a Powerful Platform to Explore Health in a Correctional Population: A Record Linkage Study

We used record linkage to create a data repository of health information of persons who were federally incarcerated in Ontario and Canada. We obtained records from 56,867 adults who were federally incarcerated between January 1, 1998 and December 31, 2011 from the Correctional Service of Canada; 15,248 records belonged to individuals residing in Ontario, Canada. We linked these records to the Registered Persons Database (RPDB), which contained records from 18,116,996 individuals eligible for health care in Ontario. Out of 56,867 Offender Management System (OMS) records, 22,844 (40.2%) were linked to the RPDB. Looking only at those incarcerated in Ontario, 98% (14,953 of 15,248) of records were linked to the RPDB. Most records of persons in Ontario-based facilities were linked deterministically. Linkage rates were lower for women, minority groups, and substance users. In conclusion, record linkage enabled the creation of a valuable data repository: there are no electronic medical records for correctional populations in Canada, which otherwise makes it difficult to profile their health.


Introduction
An estimated 37,000 individuals are incarcerated in Canada on any given day and these individuals tend to be disproportionately burdened by poor health, including high rates of infectious disease, substance use, and mental illness [1][2][3][4][5][6]. Most data on the health of persons experiencing incarceration are based on cross-sectional surveys collected while in custody; there is little longitudinal research on the health of incarcerated persons outside of correctional facilities, either prior to or post incarceration [7][8][9]. The one exception is mortality: several studies report a high risk of death immediately following custodial release [10,11].
There are increasing calls to enrich our understanding of the health of persons who experience incarceration [12][13][14][15]. Such individuals are typically drawn from the most vulnerable and marginalized groups, groups that are less likely to access health care. Incarceration may improve health, providing an opportunity to diagnose and manage illness in traditionally under-serviced populations. On the other hand, it has been argued that incarceration may engender illness through environments (e.g. close person-to-person contact) and behaviours (e.g. risky sexual activities) that may harm health. In order to truly understand if and how incarceration impacts health, longitudinal data are needed. Currently, there are no Canadian datasets that capture the health of federally incarcerated persons pre and post custody.
Record linkage could be a cost-effective strategy to generate data on the health of persons experiencing incarceration. Linkage studies capitalize on data collected for other purposes and the indicators in the resultant repositories enable research that would not have been possible from any single data source [16]. Linkage studies can increase capacity to conduct research in vulnerable populations and such an approach has been used to examine the post-release mortality of persons who were incarcerated [17].
We performed a record linkage study, joining together Canadian federal correctional data and administrative health data. The resultant database is powerful and unique: it is one of only a few population-based, longitudinal datasets in the world to capture a wide range of health and correctional indicators from incarcerated individuals [17]. The present study describes the record linkage and examines the characteristics of individuals whose correctional records were more likely to be matched to administrative health data as well as the characteristics of those linked deterministically (i.e. exact matches).

Database Descriptions
Correctional data were obtained from the Correctional Service of Canada (CSC), the federal government agency responsible for administering sentences of two years or more for adults aged 18 years or older. We extracted correctional data from the Offender Management System (OMS), a computerized record system that tracks information from admission until warrant expiry. The OMS contains: demographic characteristics; alternative names (i.e. aliases); criminal history; sentencing information (e.g. length); and behaviour while incarcerated (e.g. violent acts). These data elements are completed by institutional personnel. The OMS also contains information on risk factors that could have contributed to incarceration (e.g. substance use history, criminal history) used to inform programming needs as prescribed by the risk-need-responsivity model [18]. This model posits that rehabilitation will be more successful if treatment matches criminal needs [19]. Risk factor data are collected by self-report using validated instruments (e.g. Drug Abuse Screening Test) [20] and an overall criminal risk rating is calculated by CSC staff based on response patterns. The OMS contains data on all persons under federal custody.
Health data were obtained from the Institute for Clinical Evaluative Sciences (ICES), an autonomous not-for-profit agency that holds administrative health data (e.g. physician billings, emergency room visits) for services provided in Ontario, Canada. In the present study, we linked the OMS to the Registered Persons Database (RPDB), the master list of anyone who is, or who has ever been, eligible for health care in Ontario. The RPDB contains basic demographics (surname and first name, date of birth, sex, postal code) as well as a unique health card identifier, enabling linkage with other health utilization data. It is maintained by Ontario's Ministry of Health and Long-Term Care. Because Canada has universal health care coverage, all Canadian citizens and permanent residents living in Ontario should be included in the RPDB [21].

Inclusion Criteria
Our OMS database contained records from 56,867 adults admitted to a federal correctional facility between January 1, 1998 and December 31, 2011. The database excluded those incarcerated before January 1, 1998; who served only a provincial or territorial sentence; or who had their record suspended (i.e. pardoned). The RPDB contained records from 18,116,996 individuals eligible for health care in Ontario between April 1, 1990 and July 2014, the latter date being when the linkage was performed.

Data Sharing Agreement
Prior to data linkage, two privacy impact assessments were conducted and a data sharing agreement was struck between ICES and CSC in July 2013. This legal agreement outlined the terms and conditions of data sharing, including security measures, approved uses of data, and destruction of information. Personal identifiers were available to three data covenantors for the purposes of record linkage; these individuals were approved by the Ontario Information and Privacy Commissioner to access such information. All data covenantors underwent a security background check by CSC. Once completed, covenantors were granted "reliability status", or ability to access the data. This study was approved by the institutional review board at Sunnybrook Health Sciences Centre, Toronto, Canada. We also received approval from the Research Ethics Boards of St. Michael's Hospital and the University of Toronto.

Database preparation
Aliases are common in correctional populations and failing to account for these may decrease the sensitivity of linkage [22]. To prepare the data for linkage, one record was added to the OMS database for each unique surname/alias. These records also included the first name, gender, date of birth, and postal code.
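The alias expansion described above can be illustrated with a short Python sketch. This is not the study's preparation code; the field names (person_id, surname, aliases, and so on), the name normalisation, and the sample record are all hypothetical and serve only to show the idea of emitting one linkage record per unique surname or alias.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkageRecord:
    # Hypothetical layout: one row per (person, surname-or-alias) pair.
    person_id: str
    surname: str
    first_name: str
    sex: str
    date_of_birth: str
    postal_code: str


def expand_aliases(person: dict) -> list:
    """Return one linkage record per unique surname or alias for a person."""
    seen = set()
    records = []
    for name in [person["surname"], *person.get("aliases", [])]:
        key = name.strip().upper()  # normalise case/whitespace before de-duplicating
        if not key or key in seen:
            continue
        seen.add(key)
        records.append(
            LinkageRecord(
                person_id=person["person_id"],
                surname=key,
                first_name=person["first_name"].strip().upper(),
                sex=person["sex"],
                date_of_birth=person["date_of_birth"],
                postal_code=person["postal_code"],
            )
        )
    return records


# Illustrative (made-up) person with a duplicate alias.
example = {
    "person_id": "P001",
    "surname": "Smith",
    "aliases": ["Smyth", "smith ", "Jones"],
    "first_name": "Alex",
    "sex": "M",
    "date_of_birth": "1970-01-01",
    "postal_code": "K7L 0A1",
}
records = expand_aliases(example)
```

Each expanded record carries the same person identifier, so links found under any alias can be traced back to a single individual.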

Data linkage
All linkage was performed using the Automatch software [23]. We used surname, given name, sex, date of birth, and residential postal code to link OMS records to the RPDB. These data elements are assumed to be accurate in both databases and contain no missing fields. We linked records using both deterministic and probabilistic approaches. Deterministic linkage requires perfect agreement on specified data fields; these data fields should be sufficiently unique so that only one person can be ascribed to the record. Probabilistic linkage is used when linking data fields may not be unique. In the face of such complexity, statistical probabilities, called linkage weights, are generated reflecting the likelihood two records are a true match [23].
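As an illustration of deterministic linkage, the following Python sketch links two small sets of records on exact agreement across a set of key fields. It is a simplified stand-in and not the Automatch procedure: the field names, the normalisation step, the sample records, and the rule that a link is accepted only when exactly one registry record shares the key are assumptions made for this example.

```python
from collections import defaultdict

# Assumed key fields for an exact-match pass (hypothetical names).
KEY_FIELDS = ("surname", "given_name", "date_of_birth", "sex")


def match_key(record):
    """Build a normalised tuple of the key fields for exact comparison."""
    return tuple(str(record[field]).strip().upper() for field in KEY_FIELDS)


def deterministic_link(source_records, registry_records):
    """Link each source record to the single registry record with an identical key."""
    index = defaultdict(list)
    for reg in registry_records:
        index[match_key(reg)].append(reg["registry_id"])

    links, unlinked = {}, []
    for rec in source_records:
        candidates = index.get(match_key(rec), [])
        if len(candidates) == 1:  # assumption: accept only unambiguous exact matches
            links[rec["source_id"]] = candidates[0]
        else:
            unlinked.append(rec["source_id"])
    return links, unlinked


# Made-up records for illustration only.
registry = [
    {"registry_id": "R1", "surname": "Smith", "given_name": "Alex", "date_of_birth": "1970-01-01", "sex": "M"},
    {"registry_id": "R2", "surname": "Lee", "given_name": "Sam", "date_of_birth": "1980-05-05", "sex": "F"},
    {"registry_id": "R3", "surname": "Lee", "given_name": "Sam", "date_of_birth": "1980-05-05", "sex": "F"},
]
source = [
    {"source_id": "S1", "surname": "SMITH ", "given_name": "alex", "date_of_birth": "1970-01-01", "sex": "M"},
    {"source_id": "S2", "surname": "Lee", "given_name": "Sam", "date_of_birth": "1980-05-05", "sex": "F"},
    {"source_id": "S3", "surname": "Brown", "given_name": "Pat", "date_of_birth": "1990-09-09", "sex": "M"},
]
links, unlinked = deterministic_link(source, registry)
```

In this sketch, records left in the unlinked list (no exact match, or more than one) are the ones that would be carried forward to a probabilistic approach.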
In the present study, linkage occurred in five different passes: records linked on the most recent pass were removed from the linkage pool. The first pass used deterministic linkage, requiring an exact match on surname, given name, date of birth and sex. This combination of data elements produced a correct linkage rate of over 98% in another Canadian study [24]. The second through fifth passes used probabilistic linkage, requiring agreement on fewer data fields or strings of data (e.g. first three letters of a surname). Match requirements for each pass are found in Table 1. At each probabilistic linkage pass, linkage weights were generated by the software. Weights were based on the m and u probabilities which are the conditional probability that a field agrees, given the pair is a true match and the conditional probability a field agrees, given the pair is a true mismatch, respectively. A total linkage weight is calculated from these m and u probabilities [23]. Linked pairs with weights falling above pre-determined thresholds were considered automatic matches whereas pairs with weights falling below thresholds were considered non-matches. Linked pairs with weights falling within the threshold range were considered possible matches and were manually reviewed by two data covenantors. Postal codes were considered during the manual review. Rules were created a priori to determine if manually reviewed pairs would be accepted or rejected as matches and are listed in S1 File.
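The weighting and classification logic described above can be sketched in Python as follows. This uses the common log-ratio form of field weights, log2(m/u) for agreement and log2((1 - m)/(1 - u)) for disagreement; whether Automatch uses exactly this form is an assumption here, and the m and u values, the thresholds, and the function names are hypothetical rather than those used in the study.

```python
import math


def field_weight(agrees, m, u):
    """Weight for one field: log2(m/u) if it agrees, log2((1-m)/(1-u)) otherwise."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def total_weight(agreement, probabilities):
    """Sum the field weights for one candidate pair of records."""
    return sum(
        field_weight(agreement[field], m, u)
        for field, (m, u) in probabilities.items()
    )


def classify(weight, lower, upper):
    """Apply two thresholds: automatic match, non-match, or manual review."""
    if weight > upper:
        return "match"
    if weight < lower:
        return "non-match"
    return "manual review"


# Hypothetical (m, u) probabilities per field and hypothetical thresholds.
PROBS = {
    "surname": (0.95, 0.01),
    "given_name": (0.90, 0.05),
    "date_of_birth": (0.98, 0.001),
    "sex": (0.99, 0.50),
}
LOWER, UPPER = 5.0, 15.0

all_agree = {field: True for field in PROBS}
all_disagree = {field: False for field in PROBS}
name_differs = {**all_agree, "given_name": False}

decisions = {
    "all_agree": classify(total_weight(all_agree, PROBS), LOWER, UPPER),
    "all_disagree": classify(total_weight(all_disagree, PROBS), LOWER, UPPER),
    "name_differs": classify(total_weight(name_differs, PROBS), LOWER, UPPER),
}
```

With these illustrative values, a pair agreeing on every field clears the upper threshold, a pair disagreeing on every field falls below the lower threshold, and a pair that differs only on given name lands in the range that would be sent for manual review.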
Because of the matching algorithm and decision rules, it was possible that multiple unique correctional records could be matched to the same health record on the same pass. When this occurred, we considered the true match to be the record with the highest linkage weight, i.e. the record that was statistically more likely to be a true match.
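This tie-breaking rule can be sketched in Python as follows. The record identifiers, weights, and dictionary layout are hypothetical, and keeping the first candidate encountered when two weights are exactly equal is an assumption of this sketch rather than a rule stated for the study.

```python
def resolve_duplicates(candidate_links):
    """For each health record, keep only the correctional record with the highest weight."""
    best = {}
    for link in candidate_links:
        current = best.get(link["health_id"])
        # Strict '>' means the first candidate seen is kept on an exact tie (assumption).
        if current is None or link["weight"] > current["weight"]:
            best[link["health_id"]] = link
    return list(best.values())


# Made-up candidate links from a single pass: two correctional records compete for H1.
candidates = [
    {"correctional_id": "C1", "health_id": "H1", "weight": 12.4},
    {"correctional_id": "C2", "health_id": "H1", "weight": 17.9},
    {"correctional_id": "C3", "health_id": "H2", "weight": 9.1},
]
resolved = resolve_duplicates(candidates)
```

The correctional records that lose out in this step remain unlinked on that pass.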

Study outcome
We examined the proportion of OMS records that were linked to the RPDB (the linkage rate). We acknowledge the percent of records linked is a proportion and not a true rate and use this terminology to be consistent with terminology used elsewhere [16,24]. We then subdivided linked individuals by probabilistic or deterministic linkage.
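As a worked example of this outcome, the short Python sketch below computes the linkage rate from the record counts reported in this paper (22,844 of 56,867 records overall; 14,953 of 15,248 for Ontario). The function name is illustrative only.

```python
def linkage_rate(linked, total):
    """Proportion of source records that were linked to the registry."""
    if total <= 0:
        raise ValueError("total must be a positive number of records")
    return linked / total


overall = linkage_rate(22_844, 56_867)   # all OMS records
ontario = linkage_rate(14_953, 15_248)   # records from Ontario

print(f"Overall: {overall:.1%}")  # Overall: 40.2%
print(f"Ontario: {ontario:.1%}")  # Ontario: 98.1%
```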

Indicators of interest
We examined linkage rates by demographic characteristics and criminal risk and needs as recorded in the OMS. Demographic characteristics were age, sex, race, marital status and number of aliases. Criminal needs were substance use and dependence history, measured using the Alcohol Dependence Scale (ADS) [25], the Problems Related to Drinking Scale (PRD), based on the Michigan Alcohol Screening Test (MAST) [26], and the Drug Abuse Screening Test (DAST) [20]. These scales have been validated in correctional populations [27,28]. Over the study period, scales were administered to individuals with suspected substance dependence (prior to 2009) or to everyone admitted to a federal correctional facility (2009 onward). We also examined linkage rates by Static Factor Rating, henceforth referred to as criminal history risk score. Risk scores are designed to predict risk of re-offending post-release, based on unmodifiable factors, e.g. criminal history [29]. These measures were developed by CSC as part of the Offender Intake Assessment process and have been in use since 1994 [28,30].

Data analysis
We calculated the linkage rate, overall and by select characteristics of interest. The majority of analyses focused on a subsample of individuals admitted to a federal institution in Ontario (n = 15,248). Admission region is highly correlated with the province where an individual committed their crimes and was sentenced. Because we only have access to Ontario health data, this approach should increase the linkage accuracy.
We used logistic regression to calculate the odds of being a) linked compared with not linked and b) linked deterministically compared with probabilistically. These datasets were analyzed at the Institute for Clinical Evaluative Sciences (ICES).

Results
3.6% (n = 549), 0.6% (n = 100), and <0.1% (n = 12) of all records were linked on the second through fifth passes, respectively.
The mean number of OMS records per unique person (i.e. aliases) was 2.56 (SD = 2.00) for men and 3.23 (SD = 2.83) for women incarcerated in Ontario. Aliases were less common among individuals incarcerated outside of Ontario: the average number of names was 2.20 (SD = 2.26) for men and 1.98 (SD = 2.15) for women. The maximum number of surnames was 53. Table 2 shows the distribution of matched records, by linkage rate, in Ontario. We found that women, minority populations, and individuals with more severe alcohol use were less likely to be linked than their respective comparators (column 6). We found that individuals with a greater number of aliases or a higher risk of reoffending were more likely to be linked. Table 2 (column 7) also presents the odds ratios for being matched deterministically compared with probabilistically. These odds did not vary significantly by age, marital status, or substance use histories. Those identifying as either black or 'other' races were less likely to be linked deterministically compared with those identifying as white. Having more aliases and having a higher criminal history risk rating were also associated with lower odds of deterministic linkage.

Discussion
We linked records from almost 15,000 Ontario-based, federally incarcerated individuals to a province-wide health registry. This translated to a 98% linkage rate, and most records (86%) were linked deterministically. Our observed deterministic match rate is comparable to rates reported in other Canadian studies using the same data fields [24].
While the high linkage rate for Ontario-based persons is reassuring, we found that women, minority populations, and substance users were less likely to be linked to health records. Variations in linkage by participant characteristics are not surprising and have been reported elsewhere [31]. Reasons for lower linkage and the consequent underrepresentation of certain groups in the final database are not clear. The lower linkage of women may relate to surname changes arising from marriage or divorce. We included a surname record for all documented aliases, but if different databases did not have the same surname on file, records could not have been linked. The lower linkage rate of Indigenous Canadians may relate to jurisdictional differences in health care delivery. In Canada, the federal government is responsible for the delivery of some health care to Status Indians, including primary and emergent care on reserve. Although Status Indians are eligible for provincial health care, they may or may not be registered in the RPDB. We acknowledge that systematic exclusions of sub-populations may bias estimates generated from these data [32][33][34][35]. However, our relatively large sample size means that we are more likely to conclude that minor differences between groups are statistically significant, raising the question of how different the 2% of men and 3% of women who were not linked truly are. This will require attention as we move to using these data in future research. We also found that certain groups were more likely to be linked, including those with a higher static risk rating. Although initially surprising, indicators of past criminal history contribute to the overall static risk rating and being involved with the criminal justice system could itself facilitate contact with health care. Individuals are routinely seen by a physician within a period of days to weeks of admission to a provincial correctional facility. Institutional staff will request a health card number for anyone who does not already have one and who is in provincial custody long enough to be seen by a physician. Accordingly, individuals with a higher risk rating may be more likely to be in the RPDB because of their criminal history.
We also observed a positive relationship between the number of aliases and data linkage. Aliases are ubiquitous in correctional populations: women incarcerated in British Columbia reported an average of 2.7 surnames in a 2005 study [36]. Similarly high numbers of aliases have been reported in incarcerated men and are in line with what we report herein [37]. Intuitively, it makes sense that those with more names have a higher likelihood of being linked to health data. The quandary lies in how to manage these individuals in linkage studies; specifically, how does one maximize rejection of false matches (i.e. specificity) and acceptance of true matches (i.e. sensitivity)? Some researchers have excluded individuals with aliases in excess of a specified threshold [36], although such an approach may decrease the linkage sensitivity and, as a result, misestimate the burden of health outcomes in incarcerated persons [22]. Future linkage studies in correctional populations should continue to explore the optimal approach to including those with several names.
We also compared records linked deterministically to those linked probabilistically. On most indicators, there were no systematic variations by linkage method. Because there is less certainty if probabilistically linked data are true matches, a conservative approach may be to remove these individuals from the linked dataset. Future studies using these data should conduct sensitivity analyses to fully understand the implications of including or excluding probabilistically linked records.
Our correctional data repository will enable the generation of much-needed knowledge of health pre and post incarceration. Because the OMS records were linked to a registry of all individuals eligible for health care in Ontario, this repository can be used to compare the health of incarcerated persons to the general population. We can also use these data to examine variations in health within specific sub-populations of interest (e.g. Aboriginal persons). With regard to the latter, the OMS contains a variety of standardized instruments that capture information not ordinarily available in administrative data (e.g. confounders, mediators) [16]. These powerful data have the potential to contribute substantially to the field of health research in incarcerated persons.
The limitations of this study should be considered. The OMS contains records for all Canadians admitted to federal custody between 1998 and 2011, yet we only had access to Ontario health data. We have no way of knowing the true denominator for federal inmates eligible for health care in Ontario from these data. Because region of admission is often determined by the region where the criminal offence was committed and where sentencing occurred, we restricted the majority of analyses to those admitted to an Ontario facility. The assumption is that if region of crime and region of residence are interchangeable, those admitted to an Ontario facility should be in the RPDB. It is unclear if the 295 unlinked Ontario-based records were true missed linkages or if they belonged to people who resided outside of Ontario and could not have been linked. On a related note, 20% of inmates admitted to a federal facility outside of Ontario were linked to the RPDB, suggesting some had resided in Ontario over their life course, perhaps prior to incarceration.

Conclusion
Through data linkage, we created a powerful platform to enable future research on the health of Canadian individuals who were incarcerated, both pre- and post-incarceration. Our repository of almost 15,000 individuals federally incarcerated in Ontario includes demographic measures and validated instruments on substance use. Linkage to the RPDB, the registry of persons eligible for health care in Ontario, will enable the exploration of the health utilization and status of individuals using a variety of different indicators derived from physician billings, hospital separations, and emergent care databases. Although linkage rates in Ontario were high (98%), some populations, such as women and minorities, were proportionately under-represented.