Moving from Data on Deaths to Public Health Policy in Agincourt, South Africa: Approaches to Analysing and Understanding Verbal Autopsy Findings

Peter Byass and colleagues compare two methods of assessing data from verbal autopsies, review by physicians or probabilistic modeling, and show that probabilistic modeling is the most efficient means of analyzing these data.


Introduction
Throughout the history of public health, the concept of recording causes of individual deaths in a population and presenting them in aggregate form has been a central component of understanding health and disease at the community level. This continues to be the case, even though the extent and quality of cause of death data varies widely around the world [1].
For a large proportion of the world's communities in which individual deaths are not routinely recorded and classified by cause as part of routine civil and health service procedures, verbal autopsy (VA) has become an important technique [2]. VA involves interviewing family, friends, or carers after a death has occurred, to find out about the circumstances of death. These data are normally collected by lay interviewers, and their findings are later interpreted into possible cause(s) of death. Approaches to undertaking the interviews and interpreting the findings vary and are still developing, despite various efforts towards standardisation [3]. Much VA work has relied on physicians reviewing interview material and coming to a conclusion on cause of death, following a process closely analogous to clinical practice in which history, signs, and symptoms are used to construct a differential diagnosis. Recently, computer-based probabilistic models have become an important way of interpreting VA data, as an alternative to case-by-case physician interpretation [4]. These have the advantage of being faster, cheaper, and more internally consistent than physician review, but may lack some subtlety and nuance. Some comparisons between physician review and modelled findings have previously been made [4][5][6]. As well as characterising all-age, all-cause mortality, applications of verbal autopsy have included cause of death determination among particular groups such as women of reproductive age [7], and for assessing community interventions [8].
However, the outputs from probabilistic models have some technical differences from those typically generated by physicians, since the likelihood of a particular cause of death is also estimated quantitatively as part of the modelling. Several likely causes can be reported for a single case, and a case may remain partially or wholly indeterminate, particularly where the VA interview material is scanty. These characteristics might seem problematic from a clinical perspective that instinctively seeks a conclusive single main cause of death for each case (even though this is sometimes fudged by labelling two commonly coexisting causes as a single entity, for example the cause "HIV/AIDS and tuberculosis"). However, since VA is normally applied as a step towards community-based analyses of cause-specific mortality and public health implications, rather than as an endpoint whose primary concern is the individual case, the outcomes are essentially epidemiologically rather than clinically oriented. The proportions of deaths within a population attributable to a particular cause (cause-specific mortality fractions, CSMFs) are particularly important. Thus some uncertainty at the individual level, and possibly multiple causes per case, are not in themselves problematic, but need analytical approaches that make good sense of the data. The public health imperative to understand causes of death in terms of age and sex is also important, in order to understand burdens of premature mortality, to target potential interventions, and to inform health systems development.
In this paper we aimed to assess appropriate methods for analysing and interpreting VA interview data at the population level, using both probabilistically modelled and physician-interpreted results. An example dataset from the Agincourt health and sociodemographic surveillance site (HDSS) in South Africa, a member of the INDEPTH Network (http://www.indepth-network.org), is taken to illustrate the approaches used. Existing physician-interpreted findings from the same dataset are compared with modelled results in the sense of how they are derived and analysed, leading to some comparisons between the two approaches. However, the intention here is not to validate either approach; rather the emphasis is on interpretation and analysis processes which can lead effectively from data on deaths to public health imperatives.

Methods
The Agincourt HDSS covers rural communities located in northeast South Africa, near the Mozambican border, and has monitored a contiguous population of around 70,000 since 1992. The background to this work is described more fully elsewhere [9], in a paper which analyses cause-specific mortality from 6,153 deaths that occurred between 1992 and 2005, on the basis of cause of death as determined by physician review. These physician reviews of VA interview material were each initially undertaken by two physicians independently. If they did not agree as to cause, a third physician arbitrated in order to reach a consensual cause of death. If consensus could not be reached, then no cause of death was recorded.
The same VA interview data were compiled into an input file for the InterVA v.3 probabilistic VA interpretation model (http://www.interva.net) and processed into cause of death data. The InterVA model is based on Bayesian calculations of probabilities that a particular death was due to particular causes, given a set of symptoms and circumstances associated with the death. This is achieved using a probability matrix which generically estimates probabilities of particular symptoms and circumstances of death, given particular causes. The model was developed using an expert panel and was deliberately designed to be generic and not context-dependent, and to produce relatively broad cause-of-death categories [10]. As previously described [5], the model expects an input of "high" or "low" to reflect the local prevalence of two specific causes which often vary by more than an order of magnitude between settings: HIV and malaria; here these were set to "high" and "low" respectively. These settings do not override the handling of individual cases, but are conceptually similar to a physician knowing that a particular disease is common or rare in the local population, irrespective of a particular patient presenting in a consultation.
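The Bayesian principle described above can be illustrated with a minimal sketch. It is not the InterVA implementation: the cause names, indicator names, and all probabilities below are hypothetical, and the sketch assumes that indicators are conditionally independent given the cause, which is a simplification introduced here for illustration.

```python
def update_cause_probabilities(priors, indicator_given_cause, reported_indicators):
    """Apply Bayes' rule once per reported indicator, renormalising each time.

    priors: dict mapping cause -> prior probability (summing to 1).
    indicator_given_cause: dict mapping cause -> {indicator: P(indicator | cause)}.
    reported_indicators: indicators answered "yes" for this death.
    """
    posterior = dict(priors)
    for indicator in reported_indicators:
        # Weight each cause's current probability by P(indicator | cause).
        unnormalised = {
            cause: posterior[cause] * indicator_given_cause[cause][indicator]
            for cause in posterior
        }
        total = sum(unnormalised.values())
        posterior = {cause: p / total for cause, p in unnormalised.items()}
    return posterior


# Hypothetical values for illustration only; not taken from the InterVA matrix.
priors = {"tuberculosis": 0.2, "pneumonia": 0.3, "other": 0.5}
indicator_given_cause = {
    "tuberculosis": {"chronic_cough": 0.8, "weight_loss": 0.7},
    "pneumonia": {"chronic_cough": 0.2, "weight_loss": 0.1},
    "other": {"chronic_cough": 0.05, "weight_loss": 0.1},
}

posterior = update_cause_probabilities(
    priors, indicator_given_cause, ["chronic_cough", "weight_loss"]
)
for cause, probability in sorted(posterior.items(), key=lambda item: -item[1]):
    print(f"{cause}: {probability:.3f}")
```

In this toy example the priors play a role loosely comparable to the "high"/"low" prevalence settings: they shift the starting point without overriding the evidence from an individual case.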
Compiling the data input file for the InterVA model (which consists of yes/no answers for each case on around a hundred questions relating to the VA interview material) may take some days for a data manager, but processing the file into causes of death using the model then only takes a matter of minutes. This contrasts with thousands of hours of physician time, and a cost in the region of US$20,000, for reviewing a dataset of this size. The model is also totally internally consistent, meaning that rerunning data produces exactly the same output, and there is also therefore complete consistency at the individual case level over time, when considering a series of deaths that actually occurred over many years. With physician interpretation, it is unlikely that the same physicians can be available to undertake this work over an extended period, and in any case it is probable that their thinking and understanding would change over time. The model provides up to three likely causes of death for each case, or concludes that the cause is indeterminate. Each cause assigned is associated with a likelihood, and the sum of likelihoods of assigned causes has a maximum value of 1.00. If the sum of likelihoods of assigned causes is less than 1.00, then the difference reflects a lack of certainty about the overall case. It therefore seems logical to regard this uncertain proportion of each case as an indeterminate component.
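The treatment of the uncertain proportion of each case can be sketched as follows. This is an illustrative sketch only: the function name `to_weighted_records`, the record layout, and the example likelihoods are assumptions made here, not part of the InterVA software or of the analysis code used for this paper.

```python
def to_weighted_records(case_id, assigned_causes, tolerance=1e-9):
    """Turn up to three (cause, likelihood) pairs into weighted records.

    Any shortfall of the summed likelihoods below 1.00 is added as an
    "indeterminate" record, so the weights for each case sum to 1.
    """
    total = sum(likelihood for _, likelihood in assigned_causes)
    if total > 1.0 + tolerance:
        raise ValueError("likelihoods for a single case cannot exceed 1.00")

    records = [(case_id, cause, likelihood) for cause, likelihood in assigned_causes]
    residual = 1.0 - total
    if residual > tolerance:
        # The uncertain proportion of the case becomes its own record.
        records.append((case_id, "indeterminate", residual))
    return records


# Hypothetical case: two assigned causes leave 0.15 of the case indeterminate.
example = to_weighted_records("case-001", [("HIV", 0.6), ("tuberculosis", 0.25)])
for record in example:
    print(record)
```

A case with no assigned causes yields a single "indeterminate" record with weight 1, matching the idea of a completely indeterminate case.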
For analysis, a dataset was constructed from the model's output (using Microsoft FoxPro) in which each case had one or more records, each record having one cause (including the possible cause "indeterminate") and a weight corresponding to the likelihood of that cause for the particular case. Thus over the whole dataset, the sum of all the weights was equal to the number of cases, 6,153. This dataset included a total of 11,834 records, an average of 1.92 per case. This data structure also facilitates the import of other background factors of interest (since every record contains the individual identifier variable), which can then be analysed against particular causes of death in a weighted multivariate model. Physician-interpreted material where consensus on a single main cause is required can be analysed in very similar ways, with the conceptual weighting for each case being 1.
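A weighted dataset of this kind leads directly to cause-specific mortality fractions. The following is a minimal sketch under stated assumptions: the record layout (case identifier, cause, weight) and the example records are hypothetical, and the original analyses were carried out in Stata rather than Python.

```python
from collections import defaultdict


def cause_specific_mortality_fractions(records):
    """Sum the weights per cause and divide by the number of distinct cases.

    records: iterable of (case_id, cause, weight), where the weights for
    each case sum to 1 (physician-coded cases simply carry a weight of 1).
    """
    weight_by_cause = defaultdict(float)
    case_ids = set()
    for case_id, cause, weight in records:
        weight_by_cause[cause] += weight
        case_ids.add(case_id)
    number_of_cases = len(case_ids)
    return {cause: weight / number_of_cases for cause, weight in weight_by_cause.items()}


# Hypothetical records: case "a" is split across causes, case "b" has one cause.
records = [
    ("a", "HIV", 0.6),
    ("a", "tuberculosis", 0.25),
    ("a", "indeterminate", 0.15),
    ("b", "transport accident", 1.0),
]
csmf = cause_specific_mortality_fractions(records)
for cause, fraction in csmf.items():
    print(f"{cause}: {fraction:.3f}")
```

Because the weights for each case sum to 1, the resulting fractions sum to 1 over all causes, including the indeterminate component.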
The analyses presented here were carried out using Stata 10.
Surveillance-based studies in the Agincourt subdistrict were reviewed and approved by the Committee for Research on Human Subjects (Medical) of the University of the Witwatersrand, Johannesburg, South Africa (protocol M960720). Informed consent was obtained at the individual and household levels at every follow-up visit, whereas community consent from civic and traditional leadership was secured at the start of surveillance and reaffirmed from time to time.

Results
The same 6,153 deaths as presented previously using physician-interpreted causes [9] are shown in Table 1, with cause of death as determined from the same VA material by the InterVA model and shown by cause and age-sex group. The physician-determined CSMFs for the overall population are also shown for comparison. The ten highest ranking causes constituted 83.3% of the total according to physician interpretation and 88.2% according to probabilistic interpretation, and 8/10 of these causes were the same according to both approaches (HIV, tuberculosis, chronic cardiac, diarrhoea, pneumonia/sepsis, transport-related accidents, homicides, and indeterminate). The fractional causes of death from the model reflect the aggregation of likelihoods of particular causes over age-sex subgroups within the Agincourt population. These subgroups are the same as those used for the input file to the InterVA model.
The overall proportion of indeterminate cases was 31.0%, compared with 34.8% in the physician review process. This indeterminate category included 359 deaths for which verbal autopsies were not successfully completed. In the InterVA model, a further 375 cases were rated as completely indeterminate, and the summed weights of the uncertain proportion of the remaining cases totalled 1,170.9, an average uncertainty per case of 24.3%. In the physician coding, 1,609 individual cases were considered to be indeterminate, either because of insufficient information or failure to reach a consensus between assessing physicians. The physicians considered a further 173 cases as indeterminate for particular reasons, for example sudden deaths of unknown cause. It was also interesting to note that the physician coding process led to using a total of 250 different ICD-10 codes, but the ten most frequently used ICD-10 codes accounted for 70.7% of the deaths. Table 2 shows the five principal causes of death for each age group and period. It has been constructed to be as similar as possible to the corresponding table in the previous paper using physician interpretation of the same dataset (Table 2 in [9]), a process which involved regrouping InterVA causes of death accordingly. For each period and age group, the physician-interpreted ranks from the previous paper are also shown for comparison.
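The two overall indeterminate proportions can be reproduced from the component counts given above. The short check below assumes that the 359 deaths without a completed verbal autopsy count towards the indeterminate total under both approaches; that reading is an assumption made here, although the arithmetic then matches the reported percentages.

```python
TOTAL_DEATHS = 6153
VA_NOT_COMPLETED = 359  # assumed to count as indeterminate under both approaches

# InterVA model: wholly indeterminate cases plus summed uncertain weights.
model_indeterminate = VA_NOT_COMPLETED + 375 + 1170.9

# Physician review: no cause or no consensus, plus specifically indeterminate cases.
physician_indeterminate = VA_NOT_COMPLETED + 1609 + 173

model_pct = round(100 * model_indeterminate / TOTAL_DEATHS, 1)
physician_pct = round(100 * physician_indeterminate / TOTAL_DEATHS, 1)

print(model_pct)      # 31.0
print(physician_pct)  # 34.8
```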
One instance in which there was a clear difference in the estimates between the physician-coded and modelled findings was in tuberculosis as a cause of death among the elderly (over 65 years). According to the physicians, 96/1,492 (6.4%) of deaths in this age group were due to tuberculosis, compared with 318.4/1,492 (21.3%) according to the model. Of the 96 cases reported as tuberculosis by the physicians, 78 (81.3%) were also concluded to be tuberculosis by the model. However, among the 241 cases rated as tuberculosis by the model but not by the physicians, 103 (42.7%) were rated as indeterminate by the physicians. To elucidate this difference, Table 3 shows the breakdown of key VA interview parameters which might contribute to a conclusion of tuberculosis as cause of death, both for the InterVA model and for the physicians. It includes the positive predictive value (PPV) for tuberculosis for each parameter, both in the physician and model interpretation.
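One plausible way of computing a PPV for an interview parameter from weighted cause data is sketched below. The exact calculation behind Table 3 is not spelled out in the text, so this is an assumption: PPV is taken here as the share of tuberculosis weight among the cases in which the parameter was reported. The function name, data layout, and example cases are all hypothetical.

```python
def positive_predictive_value(cases, parameter, cause="tuberculosis"):
    """Share of cause weight among cases in which the parameter was reported.

    cases: list of dicts with "parameters" (set of reported indicators) and
    "cause_weights" (dict mapping cause -> weight; physician-coded cases
    would carry a single cause with weight 1).
    Returns None when no case reports the parameter.
    """
    with_parameter = [case for case in cases if parameter in case["parameters"]]
    if not with_parameter:
        return None
    cause_weight = sum(case["cause_weights"].get(cause, 0.0) for case in with_parameter)
    return cause_weight / len(with_parameter)


# Hypothetical cases for illustration only.
cases = [
    {"parameters": {"chronic_cough", "weight_loss"}, "cause_weights": {"tuberculosis": 0.8}},
    {"parameters": {"chronic_cough"}, "cause_weights": {"pneumonia": 1.0}},
    {"parameters": {"weight_loss"}, "cause_weights": {"tuberculosis": 0.4, "HIV": 0.5}},
]

for parameter in ("chronic_cough", "weight_loss", "chest_pain"):
    print(parameter, positive_predictive_value(cases, parameter))
```

The same function applies to physician-coded data if each case is given a single cause with weight 1, which keeps the two interpretations comparable.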

Discussion
Having considered the causes of more than 6,000 deaths over a 14-year period, the ten highest-ranking causes accounted for 83% and 88% of all deaths by physician interpretation and probabilistic modelling respectively, and eight of the highest ten causes were common to both approaches. Probabilistic modelling was cheaper and more internally consistent than physician interpretation. Uncertainty around the cause(s) of individual deaths was recognised as an important concept that should be reflected in any overall analysis of cause-specific mortality.
The advantages and disadvantages of physician-interpreted and probabilistically modelled cause of death data as evidenced by these analyses were largely as anticipated. Physician-interpreted findings included a number of quite specific, but rare, causes which were not designed to be addressed by the current model. While it is possible to build similar models with more detailed inputs and outputs, as has been done for deaths among women of reproductive age [7], this model was designed to capture major cause-of-death groupings. In principle a model designed to include greater differentiation (for example, between different cancers at particular sites) could be constructed; but the extent to which that would lead to greater understanding of population health is less clear. The very large number of specific causes used by the physicians, even though the occurrence of many was very low, could be regarded as an advantage in terms of subtlety or as a disadvantage in terms of clear overall understanding of mortality patterns (without applying further judgement calls on appropriate grouping). Probabilistically modelled interpretation has major advantages in terms of cost (not needing to pay physicians), time (less delay in getting results after interviews) and complete consistency. A recent review of the INDEPTH Network accordingly concluded that the InterVA model represented the most effective way forward for standardised interpretation of VA data across the network [11]. However, there is also the possibility of there being consistent errors encapsulated in the model.
The methods described here for analysing the probabilistically modelled cause of death data are relatively straightforward, taking into account that particular causes of death have been modelled with a specific likelihood and the quantifiable margin of uncertainty associated with many individual cases. These methods allow the margins of uncertainty associated with individual cause of death assignments to be carried through into the aggregated analysis process.
When physicians are used to assess VA material, and particularly if, as was the case for these data, physician consensus on individual cases is taken to be an important part of the process [12], then a simpler analytical approach can be used, as evidenced in the earlier paper using these data [9]. Once each case is assigned a cause of death or is considered to be indeterminate, categorising and tabulating cases as needed is straightforward, since each death counts as a single case. However, it has to be realised in this approach that any sense of the uncertainty that may have been evident in the original physicians' consideration of individual cases, or in consensus conferences, has already been eliminated before aggregated analysis begins. Since both approaches yielded only about two-thirds certainty, incorporating uncertainty in aggregated measures of cause-specific mortality seems important. Uncertainty might be better handled in physician-interpreted data if individual physicians' opinions were used, rather than insisting on consensus.
Given the very different approaches to cause-of-death interpretation and analysis as presented here and in the earlier paper (probabilistic modelling and analysis incorporating uncertainty, versus physician assessment and tabulation of definitive consensus findings), it is perhaps remarkable that many of the salient features of Table 2 here and Table 2 in the previous paper are closely similar [9]. Both give a picture of a population increasingly dominated by the burden of HIV-related mortality as time passes, together with appreciable numbers of deaths due to external causes, and relatively low infectious disease mortality (apart from the HIV/TB combination). It is also interesting to note that the overall proportion of cases to which specific causes could not be attributed is similar, despite being derived from completely different methods. There are also some potentially important differences emerging from the two approaches, even though they are not huge in the context of the entire dataset. In considering any such differences, it has to be recognised that there is no gold standard available here. Kahn et al. have previously undertaken a validation exercise between physician assessments and a limited number of well-justified hospital-based diagnoses [12], and we plan to extend this to a detailed three-way comparison including the InterVA findings for this limited subset of deaths. However, in a community such as this where many people die without contacting health services, and where hospital records are often of poor quality, the quest for a wide-ranging gold standard for VA findings which fairly represents all causes and circumstances of death has to be regarded as futile.
Notable differences that do emerge include lower estimates of malignant disease in the InterVA findings and lower estimates of tuberculosis among the elderly in the physician data. The InterVA model also gave higher estimates of HIV-related mortality in the first period (1992-94), which is particularly interesting to note. This early difference may reflect a degree of false-positive HIV-related findings by the model during a period of lower HIV prevalence, and this needs to be further investigated in terms of characterising the overall HIV prevalence for the model as "high". On the other hand, it might reflect a difficulty among the physicians in achieving consensus on HIV as a cause of death in those relatively early days of the epidemic, and this is also something to look into further. It seems likely that during the onset of the epidemic, individual physicians' perceptions of new disease patterns might have developed quite rapidly, but not necessarily in the same ways and at the same rates, depending on their personal experiences. This process, at least for a while, might have increased the difficulties in achieving consensus on HIV-related causes. The reported rate of ill-defined or unknown causes was highest in the physician-coded material for the period 1992-94 (approximately one-third), falling to approximately one-fifth by 2002-04 [9].
The examples of factors leading to tuberculosis as a cause of death among the elderly, as detailed in Table 3, provide interesting insights into differences between the two interpretations. It is clear that the physicians mainly determined tuberculosis as a cause when chest pain, chronic cough, productive cough, and weight loss were all reported for a particular case, whereas the model took a less specific approach. This is reflected in the generally higher positive predictive values for physician interpretation. On the other hand, the high proportion of indeterminate conclusions reached by the physicians among the model's probable tuberculosis cases suggests a degree of uncertainty in their deliberations, rather than clear alternative conclusions. There may also be a question of physicians' expectations of the likelihood of tuberculosis among the elderly, given that many elders in this community will now be living in households with younger adults coinfected with HIV and tuberculosis. Recent studies from Spain [13] and China [14] reported raised tuberculosis case-fatality rates among the elderly. In any case, although this example represented one of the larger discrepancies between the two approaches, it still accounted for only 218/6,153 (3.5%) of overall deaths.
The importance of conceptual categorisations of cause of death can also be seen in these comparisons. At first sight, it appears that the approaches (Table 2) gave different pictures regarding deaths due to malnutrition among the under-5s, with 0.4% from the model and 9.0% from the physicians [9]. However, if one considers that tuberculosis is probably a relatively rare cause of death in young children, even as an HIV coinfection (as evidenced in nearby Mozambique [15]), and that HIV-infected children are more likely to follow a pattern of chronic diarrhoea and malnutrition [16], then the picture changes somewhat. So, taking the physicians' "HIV/tuberculosis" grouping as mainly not being tuberculosis in this age group, and adding that to their "diarrhoea" and "malnutrition" codings, for the under-5s the proportions of deaths due to "HIV/diarrhoea/malnutrition" were 38%, 41%, 42%, and 52% for the four periods, respectively. This result is strikingly similar, in magnitude and progression, to the same grouping from the InterVA findings (34%, 41%, 42%, and 56% respectively), and would represent the largest single cause of under-5 mortality in both approaches. Thus conceptual groupings that reflect real public health issues, rather than (in this instance) rather sterile debates as to what HIV-infected children with chronic diarrhoea and wasting actually die from, are crucial. International Classification of Diseases (ICD-10) coding for causes of death may not therefore be as relevant at this conceptual level, even if it can be a useful framework at earlier stages, for example in assigning physician-coded causes.
Table 3. Pulmonary tuberculosis as a possible cause of death among 1,492 elders (65+ years) as interpreted by physician consensus (6.4%) and probabilistic modelling (21.3%), in relation to selected verbal autopsy parameters.
The main aim of this paper is not to provide a validation of any particular VA method, but to consider alternative approaches for handling interview data on individual deaths to give meaningful pictures of population health. These data are the basic resource for public health planning: the questions in our minds throughout these considerations have started from "If I were the local Director of Public Health…". From these data, and irrespective of the methods used for analysis and interpretation, it is clear that the Agincourt population has undergone rapid changes, which imply new intervention target groups, expanded demands on health professionals' skills, changing demands on health services and increasing resource requirements. The pictures of the major public health themes within the Agincourt population that emerge from both of the interpretative approaches considered are encouragingly similar, both in terms of overall cause-specific mortality patterns and in the ways that they have tracked changes over time, and the adoption of one or other method of interpretation would not lead to fundamentally different public health actions. The clear development of the HIV epidemic revealed in this example, and seeing which population subgroups are vulnerable to particular diseases, both highlight some of the advantages of using VA as a public health tool. At least where VA is used within routine health services, probabilistic modelling with its consistent approach over time and place, the elimination of inter- and intra-assessor variation, faster results, and much lower cost, should be the interpretative method of choice.

Editors' Summary
Background. Whenever someone dies in a developed country, the cause of death is determined by a doctor and entered into a "vital registration system," a record of all the births and deaths in that country. Public-health officials and medical professionals use this detailed and complete information about causes of death to develop public-health programs and to monitor how these programs affect the nation's health. Unfortunately, in many developing countries dying people are not attended by doctors and vital registration systems are incomplete. In most African countries, for example, less than one-quarter of deaths are recorded in vital registration systems. One increasingly important way to improve knowledge about the patterns of death in developing countries is "verbal autopsy" (VA). Using a standard form, trained personnel ask relatives and caregivers about the symptoms that the deceased had before his/her death and about the circumstances surrounding the death. Physicians then review these forms and assign a specific cause of death from a shortened version of the International Classification of Diseases, a list of codes for hundreds of diseases.
Why Was This Study Done? Physician review of VA forms is time-consuming and expensive. Consequently, computer-based, "probabilistic" models have been developed that process the VA data and provide a likely cause of death. These models are faster and cheaper than physician review of VAs and, because they do not rely on the views of local doctors about the likely causes of death, they are more internally consistent. But are physician review and probabilistic models equally sound ways of interpreting VA data? In this study, the researchers compare and contrast the interpretation of VA data by physician review and by a probabilistic model called the InterVA model by applying these two approaches to the deaths that occurred in Agincourt, a rural region of northeast South Africa, between 1992 and 2005. The Agincourt health and sociodemographic surveillance system is a member of the INDEPTH Network, a global network that is evaluating the health and demographic characteristics (for example, age, gender, and education) of populations in low- and middle-income countries over several years.
What Did the Researchers Do and Find? The researchers applied the InterVA probabilistic model to 6,153 deaths that had been previously reviewed by physicians. They grouped the 250 cause-of-death codes used by the physicians into categories comparable with the 33 cause-of-death codes used by the InterVA model and derived cause-specific mortality fractions (the proportions of the population dying from specific causes) for the whole population and for subgroups (for example, deaths in different age groups and deaths occurring over specific periods of time) from the output of both approaches. The ten highest-ranking causes of death accounted for 83% and 88% of all deaths by physician interpretation and by probabilistic modelling, respectively. Eight of the most frequent causes of death (HIV, tuberculosis, chronic heart conditions, diarrhea, pneumonia/sepsis, transport-related accidents, homicides, and indeterminate) were common to both interpretation methods. Both methods coded about a third of all deaths as indeterminate, often because of incomplete VA data. Generally, there was close agreement between the methods for the five principal causes of death for each age group and for each period of time, although one notable discrepancy was pulmonary (lung) tuberculosis among the elderly (over 65 years), which accounted for 6.4% and 21.3% of deaths in this age group, respectively, according to the physicians and to the model. However, these deaths accounted for only 3.5% of all the deaths.
What Do These Findings Mean? These findings reveal no differences between the cause-specific mortality fractions determined from VA data by physician interpretation and by probabilistic modelling that might have led to substantially different public-health policy programmes being initiated in this population. Importantly, both approaches clearly chart the rise of HIV-related mortality in this South African population between 1992 and 2005 and reach similar findings on other major causes of mortality. The researchers note that, although preparing the amount of VA data considered here for entry into the probabilistic model took several days, the model itself runs very quickly and always gives consistent answers. Given these findings, the researchers conclude that in many settings probabilistic modeling represents the best means of moving from VA data to public-health actions.