The United Kingdom National Neonatal Research Database: A validation study.

BACKGROUND
The National Neonatal Research Database (NNRD) is a rich repository of pre-defined clinical data extracted at regular intervals from point-of-care, clinician-entered electronic patient records on all admissions to National Health Service neonatal units in England, Wales, and Scotland. We describe population coverage for England and assess data completeness and accuracy.


METHODS
We determined population coverage of the NNRD in 2008-2014 through comparison with data on live births in England from the Office for National Statistics. We determined the completeness of seven data items on the NNRD. We assessed the accuracy of 44 data items (16 patient characteristics, 17 processes, 11 clinical outcomes) for infants enrolled in the multi-centre randomised controlled trial, the Probiotics in Preterm babies Study (PiPS). We compared NNRD data with PiPS data, taken as the gold standard, and calculated discordancy rates using predefined criteria, and sensitivity, specificity and positive predictive values (PPV) of binary outcomes.


RESULTS
The NNRD holds complete population data for England for infants born alive from 25+0 to 31+6 (completed weeks) of gestation; and 70% and 90% for those born at 23 and 24 weeks respectively. Completeness of patient characteristics was over 90%. Data were linked for 2257 episodes of care received by 1258 of the 1310 babies recruited to PiPS. Discordancy rates were <5% for 13/16 patient characteristics (exceptions: mode of delivery 8.7%; maternal ethnicity 10.2%; Lower layer Super Output Area 16.5%); <5% for 9/17 processes (exceptions: medical treatment for patent ductus arteriosus 6.0%, high-dependency days 10.2%, central line days 11.2%, type of first milk 22.3%; and during first 14 days, summary of types of milk 13.8%; number of days of antibiotics 9.0%; whether antacid given 5.1%); and <5% for 10/11 clinical outcomes (exception: bronchopulmonary dysplasia, defined as oxygen dependency at 36 weeks postmenstrual age, 13.3%). The specificity of NNRD data was >85% for all outcomes; sensitivity ranged from 50-100%; PPV ranged from 58.8% (95% CI 40.8-75.4%) for porencephalic cyst to 99.7% (95% CI 99.2-99.9%) for survival to discharge.


CONCLUSIONS
The completeness and quality of data held in the NNRD are high, providing assurance in relation to use for multiple purposes, including national audit, health service evaluations, quality improvement, and research.

Introduction
Accurate data describing interventions and outcomes from well-defined populations are important for monitoring and planning healthcare, while also offering opportunities for national and international benchmarking and a platform for clinical research. Worldwide there is a paucity of population-based neonatal data [1]. In the United Kingdom, healthcare is provided for all through the National Health Service (NHS), funded directly from taxation, which offers the opportunity for complete capture of population data.
Neonatal care is a nationally commissioned specialised service delivered through networks of hospitals. The implementation of routine electronic data capture across all networks provides a unique opportunity to acquire population-based data without additional data collection systems. Data on newborn infants receiving hospital care, whether on the neonatal unit, postnatal ward or transitional care ward, are captured on electronic patient records (EPR) held on a web-based platform, BadgerNet, managed by an approved NHS supplier, Clevermed Ltd (Level 6, Edinburgh Quay, 133 Fountainbridge, Edinburgh, EH3 9QG, www.clevermed.com). An extract of over 400 items for each baby forms the Neonatal Dataset (NDS) [2], approved in 2013 as a national NHS Information Standard by the NHS Information Standards Board (now NHS Digital) (ISB1595 version 1.0; now Standardisation Committee for Care Information (SCCI) 1595) [3]. An increasing number of hospitals, currently including all of those in England, Wales and Scotland, are members of the UK Neonatal Collaborative (UKNC), and the necessary regulatory approvals are in place for the data from each of those hospitals to be transferred quarterly from the BadgerNet platform to the Neonatal Data Analysis Unit (NDAU), an independent research unit of Imperial College London set up in 2007. The NDAU has approvals to use these data to create the National Neonatal Research Database (NNRD). The UKNC, formed in 2012, consists of all NHS neonatal units contributing data to the NNRD. The database now includes details of 100,000 infants admitted to neonatal care each year; it has provided data for a wide range of NHS reports and research studies published in peer-reviewed journals [4][5][6][7][8].
Neonatal databases have been established in many countries; the NNRD differs from most by being compiled from EPR with no extra data collection. It is one of the largest clinical databases and holds the largest range of patient characteristics [1]. Data completeness and accuracy are also important considerations, yet formal quality assurance of databases is rarely reported and probably rarely undertaken [9]. In 2010, funding was secured by the NDAU from the National Institute for Health Research (NIHR) Medicines for Neonates programme to explore the potential of the NNRD to facilitate research. In this study, we aimed to evaluate the population coverage and data quality of the NNRD data for English hospitals.

Ethics approval
The National Neonatal Research Database has Research Ethics Approval (London Queen Square Research Ethics Committee Reference number 16/LO/1930).

Methods
We compared data held in the NNRD with independently collected data from the Office for National Statistics [10] and the Probiotics in Preterm babies Study (PiPS) [11]. The latter was a multi-centre, double-blind, placebo-controlled, randomised trial funded by the Health Technology Assessment programme of the UK National Institute for Health Research. Twenty-four centres in South East England recruited patients between July 2010 and July 2013. PiPS trial data were collected using conventional paper Clinical Record Forms (CRF), subjected to a standard series of range, logic and missing data checks, and double entered onto a dedicated trial database fulfilling the standards of ICH-GCP at the Clinical Trials Unit at the National Perinatal Epidemiology Unit, University of Oxford.

Data flows
EPR records are held by Clevermed Ltd and stored on a secure NHS server from which individual neonatal units access their data. The NDAU obtained approvals from the Caldicott Guardians of the NHS Trust of each contributing neonatal unit, to receive a predefined data extract (the Neonatal Data Set) from the EPR of each infant admission. Clevermed Ltd transmits these data to the NDAU, where the NNRD is formed. Data are 'cleaned' by applying completeness, logic and range checks.
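As an illustration of what completeness, range and logic checks of this kind can look like, a minimal Python sketch follows. The field names, plausibility limits and rules are hypothetical assumptions chosen for illustration; the text does not specify the checks actually applied by the NDAU.

```python
# Hypothetical plausibility limits; the actual NDAU rules are not given in the text.
GA_WEEKS_RANGE = (22, 44)
BIRTH_WEIGHT_G_RANGE = (200, 7000)


def check_record(record: dict) -> list:
    """Return a list of data-quality flags for one extracted record."""
    flags = []

    # Completeness check: required items must be present.
    for item in ("gestational_age_weeks", "birth_weight_g", "sex"):
        if record.get(item) is None:
            flags.append(f"missing:{item}")

    # Range checks: values must fall within plausible limits.
    ga = record.get("gestational_age_weeks")
    if ga is not None and not (GA_WEEKS_RANGE[0] <= ga <= GA_WEEKS_RANGE[1]):
        flags.append("range:gestational_age_weeks")
    bw = record.get("birth_weight_g")
    if bw is not None and not (BIRTH_WEIGHT_G_RANGE[0] <= bw <= BIRTH_WEIGHT_G_RANGE[1]):
        flags.append("range:birth_weight_g")

    # Logic check: discharge cannot precede admission (ages in minutes from birth).
    adm = record.get("admission_age_min")
    dis = record.get("discharge_age_min")
    if adm is not None and dis is not None and dis < adm:
        flags.append("logic:discharge_before_admission")

    return flags
```

A record that passes all checks returns an empty list; flagged records could then be queried with the contributing unit rather than silently corrected.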
Neonatal services are arranged so that babies move between hospitals according to their clinical need, thus the in-patient period between birth and the first discharge home (or death) may include several episodes in different hospitals. For each infant, to create the NNRD, a single record is compiled by linking the episodes of care across different neonatal units using a unique identifier created by Clevermed, the BadgerID.
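The linkage step described above can be sketched as follows. This is an illustrative Python example only, not the NDAU's implementation (which, per the text, uses MS SQL and SAS); the field names `badger_id`, `hospital`, `admission_age_min` and `discharge_age_min` and the example rows are hypothetical.

```python
from collections import defaultdict


def link_episodes(episodes: list) -> dict:
    """Group episode rows by infant identifier and order them in time."""
    by_infant = defaultdict(list)
    for ep in episodes:
        by_infant[ep["badger_id"]].append(ep)

    records = {}
    for badger_id, eps in by_infant.items():
        # Order episodes by age at admission (minutes from birth).
        eps = sorted(eps, key=lambda e: e["admission_age_min"])
        records[badger_id] = {
            "n_episodes": len(eps),
            "hospitals": [e["hospital"] for e in eps],
            "final_discharge_age_min": eps[-1]["discharge_age_min"],
        }
    return records


# Made-up example rows: infant X1 moves between hospitals, infant Y2 does not.
episodes = [
    {"badger_id": "X1", "hospital": "B", "admission_age_min": 4000, "discharge_age_min": 9000},
    {"badger_id": "X1", "hospital": "A", "admission_age_min": 30, "discharge_age_min": 4000},
    {"badger_id": "X1", "hospital": "A", "admission_age_min": 9000, "discharge_age_min": 20000},
    {"badger_id": "Y2", "hospital": "C", "admission_age_min": 15, "discharge_age_min": 3000},
]
linked = link_episodes(episodes)
```

Here `linked["X1"]` summarises three episodes in the order A, B, A, while `linked["Y2"]` has a single episode.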
The NNRD is held on the NHS servers of Chelsea & Westminster NHS Foundation Trust and updated quarterly using MS SQL and SAS programming to include updated patient records from the previous time period. To date, the NNRD contains data from 2006 onwards on over 800,000 infants admitted to NHS neonatal units, and over 10 million care-days.

The neonatal dataset
Infants are identified by their unique BadgerID; no patient identifiers (NHS numbers or names) are stored in the NNRD. Age in minutes from birth, and month and year of birth, are stored instead of exact dates. An episode of care is defined as a continuous admission in the same neonatal unit. An infant can have multiple episodes of care: for example, an infant transferred from hospital A to hospital B has two episodes, and a subsequent transfer back to hospital A makes three.
There are three different types of data: demographic details (e.g. date and place of birth, birth weight) entered only once for all infants; episodic items (e.g. blood culture, clinical outcomes and diagnoses), which may be entered during each episode of care; and 'daily' items that include level of care (special/high dependency/intensive), which is categorised from raw data by embedded programming following data entry, and clinical interventions (e.g. respiratory support, type of feeds, surgical procedures, high cost drugs). Daily location, and whether the infant's mother is resident and providing care, are items required to distinguish between infants cared for on a neonatal unit, postnatal ward or transitional care ward; these data are required for the categorisation of level of care. Diagnoses include fixed-choice and some free-text items. Each data item is clearly defined in an accompanying meta-data set, and mapped to existing national standards and ICD codes; conversion to the international medical nomenclature, SNOMED CT terminology, is underway [12]. The NDAU is the data guardian; the data controller is Chelsea & Westminster Hospital NHS Foundation Trust. The NDS was approved after data harmonisation was undertaken across NHS datasets, assisted by the NHS Data Dictionary team. This included a public consultation to obtain views on included data items, a process undertaken annually to revise the NDS to reflect current practice.

Population coverage and data quality
We determined the proportion of neonatal units in England contributing data to the NNRD, and the proportion of infants born in England with an NNRD record (by gestational week) in 2008-2014. We obtained denominator figures for the annual number of neonatal units in England and live births from the National Neonatal Audit Programme [13] and Office for National Statistics, respectively [10]. To examine NNRD data completeness, we calculated the percentage of missing data for seven items applicable to all infants (gestational age (GA), sex, birth-weight, antenatal steroids, mode of delivery, multiple birth and survival to discharge from neonatal care). In addition, for antenatal steroids and mode of delivery, we performed a subgroup analysis to determine whether completeness was higher among infants born <32 weeks GA, compared to all GA.
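The completeness calculation, including the <32 weeks subgroup comparison, can be sketched in Python as below. This is a hedged illustration only: the function name, the field names (`ga_weeks`, `antenatal_steroids`) and the example rows are hypothetical and are not taken from the NNRD.

```python
def percent_missing(records, item, max_ga_weeks=None):
    """Percentage of records with a missing value for `item`.

    If `max_ga_weeks` is given, restrict to infants born below that gestation.
    """
    if max_ga_weeks is not None:
        records = [r for r in records
                   if r.get("ga_weeks") is not None and r["ga_weeks"] < max_ga_weeks]
    if not records:
        return float("nan")
    n_missing = sum(1 for r in records if r.get(item) is None)
    return 100.0 * n_missing / len(records)


# Made-up example rows, for illustration only.
records = [
    {"ga_weeks": 27, "antenatal_steroids": True},
    {"ga_weeks": 30, "antenatal_steroids": None},
    {"ga_weeks": 29, "antenatal_steroids": False},
    {"ga_weeks": 36, "antenatal_steroids": None},
    {"ga_weeks": 39, "antenatal_steroids": None},
]
all_ga = percent_missing(records, "antenatal_steroids")
under_32 = percent_missing(records, "antenatal_steroids", max_ga_weeks=32)
```

In this toy data the item is missing for three of five infants overall but for only one of the three born before 32 weeks, mirroring the direction of the subgroup comparison described above.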
To assess data accuracy at patient level, we performed data linkage between the NNRD and the PiPS trial database and assessed agreement for 44 pre-specified items present on both databases (16 patient characteristics, 17 processes, 11 clinical outcomes).
Levels of agreement, with criteria for minor and major discordancy, were predefined for all 44 items by two of the authors (CB & KC) (S1-S4 Tables). For instance, for a binary item such as whether or not an infant had surgery for a PDA, any difference was considered a major discordancy, whereas for an item such as the number of days that a central venous line had been in place, a tolerance of +/-2 days was deemed acceptable, +/-3-4 days a minor discordancy and +/-5 or more days a major discordancy.
The 16 patient characteristics were expected date of delivery (EDD), GA, month of birth, year of birth, birth-weight, sex, five-minute Apgar score, born in this hospital, singleton or multiple birth, birth order, maternal year of birth, maternal ethnicity, any antenatal corticosteroids given, caesarean or vaginal delivery, instrumental delivery and maternal lower layer super output area (LSOA) (derived from maternal postcode) [14]. In England, the smallest geographical area of practical use, that is, the level at which most national datasets are collected, is the LSOA [15]. These areas are revised after each decennial census to ensure that they contain around 1500 inhabitants. LSOA can be linked to the Index of Multiple Deprivation (IMD) 2010, which reports continuous scores for seven domains of deprivation for each LSOA in England and Wales [16].
The 17 processes were intensive care days, high dependency care days, central venous line days, length of stay, transfer to another hospital, discharge month, discharge year, surgery for patent ductus arteriosus (PDA), medical treatment for PDA, retinopathy of prematurity (ROP) treatment by laser or cryotherapy, day of first milk, type of first milk, summary of types of milk in the first 14 days, any antibiotics given and number of days of antibiotics in the first 14 days, any antacid given and number of days of antacid given in the first 14 days.
The 11 clinical outcomes were worst stage of ROP in any eye, bronchopulmonary dysplasia (defined as supplementary oxygen at 36 weeks postmenstrual age (PMA)), mechanical respiratory support at 36 weeks PMA, any diagnosis of perforated necrotising enterocolitis, any gastrointestinal perforation, any abdominal surgery for NEC, haemorrhagic parenchymal infarct, hydrocephalus, periventricular leucomalacia, porencephalic cyst, and survival to discharge from neonatal care. Analyses were conducted at the level of the episode in each hospital and at infant-level for each infant's total hospitalisation. The PiPS trial captured EDD, from which GA was calculated, and feeding data for the first 14 postnatal days. Therefore GA discordancy was only calculated for infants with EDD in both databases; 'first feed' discordancy was only calculated for infants fed within the first 14 days with no missing data prior to the first reported feed in the NNRD.
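The predefined discordancy rules given as examples in this section (any difference in a binary item is major; for central venous line days, a difference of up to 2 days is acceptable, 3-4 days is minor and 5 or more days is major) can be expressed as a short Python sketch. The function names are hypothetical, and criteria for other items are defined in the S1-S4 Tables rather than here.

```python
def classify_day_count(nnrd_days: int, trial_days: int) -> str:
    """Classify agreement for a day-count item such as central venous line days."""
    diff = abs(nnrd_days - trial_days)
    if diff <= 2:
        return "agreement"  # within the +/-2 day tolerance
    if diff <= 4:
        return "minor"      # difference of 3-4 days
    return "major"          # difference of 5 or more days


def classify_binary(nnrd_value: bool, trial_value: bool) -> str:
    """Classify agreement for a binary item such as surgery for a PDA."""
    return "agreement" if nnrd_value == trial_value else "major"
```

For example, `classify_day_count(10, 13)` returns `"minor"` and `classify_binary(True, False)` returns `"major"`.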

Statistical methods
We calculated the percentage of missing data for each item in the PiPS database and the NNRD, and minor and major discordances using the predefined criteria. To explore variation in completeness of data by centre, we presented the proportion of missing data for incomplete variables using centre-specific box-plots. Infants with missing data were excluded from comparisons. For continuous items, we calculated mean and median differences and 95% limits of agreement for the differences. For binary items, we calculated the percentage of infants for whom the NNRD and PiPS trial data differed, with the 95% confidence interval. For items with discordancy rates of less than 5%, we used the Poisson approximation to the Binomial to calculate confidence intervals; otherwise we used the Agresti and Coull method for Binomial confidence intervals, as this method has better coverage properties [17]. In addition, for binary processes and outcomes, we calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV), treating PiPS data as the gold standard. We report the prevalence of outcomes in both databases. Sensitivity = number of infants with the disease that are correctly identified by the NNRD / number of infants with the disease identified in the PiPS data. Specificity = number of infants without the disease correctly identified by the NNRD / number of infants without the disease in the PiPS data. PPV = number of infants with the disease that are correctly identified by the NNRD / number of infants (correctly or incorrectly) identified by the NNRD as having the disease. NPV = number of infants without the disease correctly identified by the NNRD / number of infants (correctly or incorrectly) identified by the NNRD as not having the disease (Fig 2). Analyses were performed using computer codes in SAS version 9.3 and STATA version 11.
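The accuracy measures and the Agresti and Coull interval described above can be illustrated with a minimal Python sketch. The study's own analyses were run in SAS and STATA; the code below is not the authors' code, the function names are hypothetical, and the counts in the example are made up for illustration.

```python
import math


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    The trial data are the reference standard and the NNRD is the test:
    tp = positive in both; fp = NNRD positive only;
    fn = trial positive only; tn = negative in both.
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def agresti_coull_ci(successes: int, n: int, z: float = 1.96):
    """Agresti-Coull interval for a binomial proportion (95% when z = 1.96)."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Made-up counts, for illustration only (not study data).
metrics = diagnostic_accuracy(tp=40, fp=10, fn=10, tn=940)
lower, upper = agresti_coull_ci(successes=50, n=100)
```

With these illustrative counts, sensitivity and PPV are both 0.8, and the interval for 50 discordant records out of 100 is centred on 0.5.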
Regulatory approvals. National Research Ethics Service approval was granted in 2010 to establish the NNRD (10/H0803/151). Caldicott Guardian and Lead Neonatal Clinician approvals from every NHS Trust are also held. A Parent Information Leaflet offers parents the opportunity to opt out, although to date this has not occurred. The Research Ethics Committee advised that the NNRD-PiPS data comparison was a data quality assurance study and did not require research ethics approval; a data sharing agreement for this study was agreed between the National Perinatal Epidemiology Unit, where the PiPS database was held, and the NDAU [11].

NNRD population coverage and data completeness
The proportion of neonatal units in England contributing to the NNRD rose from 78% in 2008 to 100% (163) in 2012 (Fig 3). Closures and mergers resulted in fluctuations in the number of neonatal units over the years. The percentage of live births with an NNRD record increased over the years for all gestational ages, and has been fairly constant since 2012 (Fig 4). Between 2012 and 2014, almost 100% of infants born in England at a GA of 25-31 weeks had an NNRD record; the figures for live-born infants at 23w and 24w GA were lower, at 70% and over 90%, respectively. The percentage of infants with an NNRD record diminishes with increasing GA: 98% for infants born at 32-33w, 90% at 34w, 60% at 35w, 40% at 36w and 20% at 37w. However, over time there has been an increase in the proportion of more mature live births (born ≥32 weeks GA) with an NNRD record (Fig 4). Table 1 shows the completeness of seven data items for 568,143 infants admitted over 2008-2015. At national level, well-completed data items (<1% missing) in the NNRD include GA, sex, multiple birth and birth-weight. Compared to all GA, the subgroup of infants born <32w GA had less missing data for antenatal corticosteroids (4% <32w; 6.7% all GA) and mode of delivery (6.7% <32w GA; 20.4% all GA).

Assessment of data accuracy
Data for 1,310 infants recruited into the PiPS trial in the South East of England over 37 months from July 2010 were available for analysis. Clevermed was able to provide a BadgerID for 1,280 (98%) infants. We further excluded 22 infants who had episodes missing from the NNRD because they occurred in a paediatric ward that does not use BadgerNet, or because of inaccuracies in admission and discharge dates or inconsistencies in the names of hospitals and NHS Trusts (which were stored as free text on the PiPS database, compared with a drop-down menu on BadgerNet). In total, we excluded 103 episodes of care and 52 infants, leaving a final dataset of 2257 episodes of care from 1258 infants for comparison (Fig 5). Infants with missing data were excluded from calculations of discordancy. For baseline characteristics, the proportion of missing data was higher on the NNRD than on the PiPS database, and was >4% for EDD, Apgar score at 5 minutes, maternal ethnicity, maternal LSOA and mode of delivery (Table 2). Box-plots show variation in data completeness across the 24 PiPS recruiting units for these five variables (Fig 6).

There was no major discordancy for month and year of birth; <1% discordancy for birth weight, sex, Apgar score at 5 minutes and birth order; and 1-3% discordancy for whether born in this hospital, singleton/multiple birth, maternal year of birth, antenatal steroids and instrumental delivery.
For processes/interventions, compared at episodic or infant level, major discordances (difference of ≥5 days) were highest for days of high dependency care and central venous lines, at 10.2% (95% CI 9.0-11.5%) and 11.2% (95% CI 10.0-12.6%), respectively (Table 3). Discordancy for medical treatment of a PDA was 6.0%. For all other items (intensive care days, length of stay, transfer, discharge month and year, surgery for PDA, ROP treatment by laser or cryotherapy), discordancies were less than 5%.

Sensitivity and specificity
The prevalence of all outcomes using the NNRD was similar to that derived from the PiPS database (Table 6). The sensitivity of NNRD data for identifying survival was 100% and for adverse outcomes was 50-87%. Specificity was over 85% for all outcomes, with the majority above 90%. The prevalence of adverse outcomes among infants <32 weeks was low, at less than 6%, with the exception of BPD (defined as oxygen dependency at 36 weeks PMA) and medical treatment for PDA (49.0% and 20.3% respectively). The PPV of all outcomes, with the exception of perforated NEC (66.0%; 95% CI 51.2, 78.8) and details of cerebral ultrasound scans, was over 75% (Table 6).

Discussion
This is the first study to formally evaluate the population coverage and accuracy of data held on the NNRD. Completeness and accuracy are fundamental components of data quality (15), yet worldwide there are few published reports on the accuracy of population health data. We believe such assessments to be essential both to confirm the validity of data that potentially underpin a range of important research and service functions and to highlight areas where modification of the data collection tools will improve data quality. The number of neonatal units contributing to the NNRD has steadily increased over the years, including all 163 neonatal units in England since 2012, and units in Wales and Scotland since 2015. National ONS data cover all reported live births in England and Wales, including babies who die in the delivery room and healthy babies with no involvement with neonatal medical services; neither of these groups is entered on the NNRD. Our data show that the NNRD represents complete population-based data for live-born infants born at 25 to 31w GA. The discrepancies at 23 and 24w of gestation (70% and 90% representation on the NNRD) are presumably due to death on the labour ward, and suggest a continuing increase in admission rates at these gestations compared with those reported by the population-based EPICure studies, which in 2006 reported admissions of live births of 64% at 23w and 86% at 24w of gestation [18]. These changes are likely to be related to improved condition at birth and changing attitudes towards the management of extremely preterm infants. We speculate that the increase over time in the percentage of infants ≥32w with NNRD records may be due to changes in commissioning and a drive to capture, for payment purposes, medical care outside the neonatal unit, e.g. on postnatal or transitional care wards.
For baseline patient characteristics, the completeness of data on the NNRD was generally high with the exception of maternal ethnicity and LSOA (derived from maternal postcode), five minute Apgar score and vaginal/caesarean birth. Linkage of maternal and neonatal datasets to create a seamless perinatal dataset would address these problems and avoid the need for duplication of data entry and the risk of transcription errors.
Discordancy was low for most patient characteristics but for processes/interventions we found a high discordancy for the type of feed given on the first day of feeding and in general discordancy was higher for items involving counting days e.g. days of antibiotic treatment in the first 14 days and days with central venous lines. For infant-level outcome data, major discordancy was low except for whether infants were receiving supplementary oxygen on the day they reached 36 weeks PMA (13.3%).
There are a number of possible reasons why differences between the two data sources were found. The choice of data variables for comparison was constrained by what was available on the two databases. Most items describing baseline characteristics were entered onto both in response to the same direct question, e.g. 'What was the birthweight?'. In contrast, the majority of processes/interventions and outcomes were asked for directly at the end of each episode on the PiPS CRF, e.g. 'In this hospital did the infant have a PDA treated surgically?', whereas in the EPR the data could be entered into and extracted from any of three places (daily data, discharge diagnoses, or procedures during the stay), with no direct questions or check lists requiring negative entries. Absence of a positive entry on the NNRD was interpreted to mean that the intervention or outcome did not occur, whereas it might simply have been missing. This might lead to under-reporting within the NNRD and thereby increase discordancy. This problem could easily be overcome with redesign of some entry screens, or the introduction of check lists on the EPR to be completed at discharge.
One of the great strengths of the EPR system underpinning the NNRD, and contributing to its richness, is the acquisition of daily data with details of management including items such as the presence of central venous lines, oxygen use, mechanical respiratory support and details of medications. In practice these are used to compute the infant's level of care (normal, special care, high dependency, intensive) and form the basis of charging within the NHS, with mechanisms to avoid double counting when babies move between hospitals. It was agreed when this study was planned that the data on the PiPS trial database should be taken as the gold standard. Data describing length of stay in intensive/high dependency care etc. for the PiPS trial were collected in response to the appropriate question at the end of each episode, 'In this hospital for how many days...', and it is possible that for these items the NNRD data, derived as they are from the raw daily data, are the more accurate.
The levels of agreement and discordancy limits preset by the authors seemed reasonable at the time. As the study proceeded and the complexity of the data including the matching of episodes of care within the total stay emerged, and on subsequent consideration of the structure of the two databases, we have to conclude that it was unrealistic to hope that data describing varying practice such as what different milks a baby received in any one day would be recorded identically in both systems. The accurate recording of complex data such as these and of medications would be helped by standardisation of the structure of questionnaires across clinical and research databases.
We identified high specificity but low sensitivity for some important outcomes. The fact that the PPV was generally high despite low overall prevalence for key outcomes highlights the potential utility of the NNRD as a large and growing population database. Smaller local or regional databases would be unlikely to have adequate statistical power to detect clinically important signals. Overall, our findings were similar to those of an assessment of the accuracy of routinely collected hospital discharge data in New South Wales against data from a statewide audit of selected neonatal intensive care unit (NICU) admissions. That assessment also found that, though under-ascertained, routinely collected hospital discharge data had high PPVs for most validated items, but that procedures tended to be more accurately recorded than diagnoses [19].
A key strength of our study is the comparison with data from an independent clinical trial conducted to the standards of ICH-GCP. The lack of such a comparator is a common limitation of other database validation studies [9]. We were able to assess patient-level rather than aggregated data, and were able to calculate sensitivity, specificity and negative predictive value, rather than only the PPV as in previous validation studies of the General Practice Research Database (GPRD) [20]. Validation studies often only report the proportion of cases that were confirmed by medical record review or responses to questionnaires, thereby only providing an estimate of PPV. Further, whilst many validation studies have not been blinded or reported by blinded reviewers, our comparisons were automated using computer codes written without knowledge of the dataset identity. We also defined the minor and major discordancy a priori to mitigate bias.
Our study has a number of limitations, the principal being the constraints imposed on the scope of the comparison by the lack of standardisation of data items. Also, we were not able to validate all episodes held on the PiPS trial database against the NNRD. Data linkage was considered at two levels: first, whether an infant recruited into PiPS appeared on the NNRD; second, whether all of the episodes of care reported to PiPS were identified on the NNRD. For 2% of recruits into the PiPS trial no EPR data could be identified. Whether this was because of errors in the date of birth and NHS number on either the PiPS database or the NNRD or whether, which seems unlikely, the infants were never entered onto the EPR, is unclear. A further limitation is that the comparison of PiPS and NNRD data was confined to the hospitals participating in the PiPS trial in the South East of England (24 recruiting and 33 step-down sites) and may not be generalisable throughout the UK.
Despite these limitations we have shown that high quality, complete data can be extracted from the routinely collected electronic record and how with some minor changes to the EPR data collection the accuracy of recording of processes, intervention and outcomes within the NNRD could be improved. As electronic records become widely incorporated into daily care and replace paper records, it is expected that data quality will continue to improve. The creation of a static database such as the NNRD, from real-time electronic data is a cost-effective means to create a national resource, obviating the need for duplicate data capture by busy clinical teams, and supporting multiple outputs. The secondary utilities of EPR are increasingly recognised, with advantages that include minimising data entry errors, and better population coverage. The NNRD is now used for a growing number of purposes by a number of research groups, professional organisations and Government bodies [21]. The successful creation of the NNRD is a testament to the collaborative efforts of the UK neonatal community. The NNRD has the potential to revolutionise the approach to conducting clinical research, and offers a time and cost efficient method for conducting clinical trials and population epidemiological studies.
Supporting information
S1 Table. Items selected for comparison: Baseline characteristics, including details of the data held in each database, with pre-set definitions of limits of agreement, and minor and major discrepancies. (DOCX)
S2 Table. Items selected for comparison: Processes of care and interventions, including details of the data held in each database, with pre-set definitions of limits of agreement, and minor and major discrepancies.