Documenting Mortality in Crises: What Keeps Us from Doing Better

Francesco Checchi and Les Roberts discuss how mortality among crisis-affected populations is currently documented, barriers to better documentation, and how these barriers might be overcome.

The effects of crises (manmade or natural disasters) on physical health are ultimately quantifiable as a rise in mortality. Precise and unbiased estimates of mortality rates (deaths per person-time) or excess death tolls (deaths attributable to the presence of the crisis) are critical to grading the severity of a crisis at its onset and over time, and adjusting relief operations accordingly [1,2]. Indeed, the onset of emergencies is commonly defined as a doubling of mortality rate from the pre-crisis baseline, or the crossing of fixed thresholds, typically one death per 10,000 person-days [2]. In reality, because mortality increases only after a crisis has evolved, acute malnutrition may be a better indicator for early crisis detection [3], and data on morbidity and on the coverage of interventions against the main known risk factors for poor health outcomes (e.g., insufficient water and sanitation, lack of preventive and curative health services, etc.) are more useful to target relief programmes and minimise preventable deaths.
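The threshold definitions above can be made concrete with a minimal sketch. The function names and the example figures are hypothetical, and treating "crossing" a threshold as "greater than or equal to" is an assumption of this sketch rather than something the cited guidance specifies.

```python
from typing import Optional


def crude_mortality_rate(deaths: int, person_days: float) -> float:
    """Return deaths per 10,000 person-days."""
    if person_days <= 0:
        raise ValueError("person_days must be positive")
    return deaths / person_days * 10_000


def crosses_emergency_threshold(
    rate: float,
    baseline: Optional[float] = None,
    fixed_threshold: float = 1.0,
) -> bool:
    """True if the rate reaches the fixed threshold or doubles a known baseline.

    Treating 'reaches' as >= is an assumption of this sketch.
    """
    if rate >= fixed_threshold:
        return True
    return baseline is not None and rate >= 2 * baseline


# Hypothetical camp: 54 deaths among 15,000 people followed for 30 days.
rate = crude_mortality_rate(deaths=54, person_days=15_000 * 30)
print(f"{rate:.2f} deaths per 10,000 person-days")
print(crosses_emergency_threshold(rate))
print(crosses_emergency_threshold(0.5, baseline=0.2))
```

The baseline argument is optional because, as discussed later in the article, a reliable pre-crisis rate is often unavailable; in that case only the fixed threshold is applied.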
Mortality data also provide a basis for advocacy, which may be "humanitarian" (calling for appropriate assistance) or "political" (for example, calling for compliance with international humanitarian law [IHL], a set of rules that seek to limit the effects of armed conflict for humanitarian reasons [see http://www.icrc.org/web/eng/siteeng0.nsf/html/humanitarian-lawfactsheet]). As historical documents, mortality data also illuminate the consequences of humanity's failures to resolve conflicts non-violently and to protect vulnerable groups from war or disasters. In Table 1, we outline these two main functions of mortality data: the support of relief operations and evidence-building for advocacy/documentation. However, we believe that both functions can often be served simultaneously. In this article, we attempt to summarise how mortality within crisis-affected populations is documented at present, discuss our perceptions of the barriers to better mortality measurement, and suggest ways by which these barriers might be overcome (see Box 1 for the main suggested actions).

Current Tools for Measuring Mortality
Prospective surveillance. The gold standard tools for monitoring mortality are vital registration systems complemented by frequent census exercises. However, these tools were missing or deficient in all recent high-mortality crises (such as in Darfur, the Democratic Republic of the Congo [DRC], and Angola), with the exception of Bosnia, where vital registration systems continued to function throughout the war [4]. Hence, guidelines for health interventions in crises recommend establishing prospective mortality surveillance as a top priority, either by strengthening existing health information systems or creating new ones [5][6][7][8][9]. Generally, such surveillance involves daily or weekly visits to households by community health workers who also update population figures (in high density communities, monitoring of burials is sometimes an alternative [10]). Real-time analysis of trends enables timely reaction to deteriorating conditions [1]. For example, in 1992, United Nations High Commissioner for Refugees (UNHCR) data from Ghundum II Camp showed that Burmese refugee girls had a higher mortality rate than boys and were less likely to attend clinics. This resulted in the host nation (Bangladesh) waiving treatment fees: within weeks, excess female mortality disappeared [11].
While hard to gauge, the adoption of comprehensive mortality surveillance globally appears very limited. We believe that this results in loss of life and inappropriate resource allocation in at least two ways. Firstly, agencies do not invest in observing mortality prospectively, and secondly, agencies underestimate mortality due to insufficient monitoring of surveillance systems (the sensitivity of ascertaining deaths by surveillance tends to decay unless data collection teams are supervised). Many agencies may simply not realise the indirect connection between timely data and improved health. The refugee camps managed by UNHCR are an exception: there, surveillance is commonly implemented and standardised. But in addition to the 8 million refugees worldwide [12], 25 million people are internally displaced [13] and many more live in crisis conditions; thus, the majority of crisis-affected populations are not under surveillance.
We could only find one recent study that systematically evaluated mortality surveillance in multiple crisis settings [14], and we know of no guidelines on mortality surveillance implementation in crises. Yet surveillance is probably cost-beneficial and feasible in all but the most extreme conditions. We suggest an international initiative to foster its use, define best practices, and harmonise tools, as recently advocated for stable settings [15].

Retrospective surveys. Humanitarian agencies often resort to surveys instead of surveillance (or sometimes they do neither). In retrospective mortality surveys, representative samples of households are interviewed about demographic events within the household (births, deaths, arrivals and departures) over a given period up to the present, yielding mortality rate estimates and confidence intervals.

Table 1 (fragments): [...] the expected range of non-crisis mortality in similar settings (e.g., 0.3 to 0.6 per 10,000 person-days in sub-Saharan Africa); data from the setting itself are however preferable [...] baseline mortality data [...] Sampling requirements [...] over small periods and populations need careful interpretation [80] [...] For benchmarking severity at the outset, the sampling design should enable "reasonably" precise estimates of CMR and U5MR (i.e., enough to infer an abnormal elevation) [...]
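As a rough illustration of how a survey yields a rate estimate and a confidence interval, the sketch below uses a log-scale Poisson approximation and inflates the variance by a design effect to account for cluster sampling. Both the approximation and the design effect value are assumptions of this sketch, not a method prescribed by the article, and the survey figures are hypothetical.

```python
import math
from typing import Tuple


def mortality_rate_ci(
    deaths: int,
    person_days: float,
    design_effect: float = 1.0,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (rate, lower, upper) in deaths per 10,000 person-days.

    Uses SE(log rate) ~ sqrt(design_effect / deaths), a common
    approximation that breaks down when very few deaths are observed.
    """
    if deaths <= 0 or person_days <= 0:
        raise ValueError("deaths and person_days must be positive")
    rate = deaths / person_days * 10_000
    se_log = math.sqrt(design_effect / deaths)
    lower = rate * math.exp(-z * se_log)
    upper = rate * math.exp(z * se_log)
    return rate, lower, upper


# Hypothetical survey: 60 deaths over 450,000 person-days of recall.
simple = mortality_rate_ci(60, 450_000, design_effect=1.0)
clustered = mortality_rate_ci(60, 450_000, design_effect=2.0)
print("no clustering: %.2f (%.2f to %.2f)" % simple)
print("design effect 2: %.2f (%.2f to %.2f)" % clustered)
```

With these hypothetical inputs the point estimate is the same in both cases, but the interval under a design effect of 2 is wide enough to straddle the threshold of one death per 10,000 person-days, which is the kind of ambiguity that limits the operational value of small cluster surveys.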
Most surveys are commissioned for immediate operational purposes [16], but they suffer from a number of weaknesses: (1) analysis and reporting may take weeks; (2) results reflect mortality in the past rather than current mortality rates; (3) confidence intervals often overlap emergency thresholds, and there is usually insufficient power to detect trends in time; (4) very recent periods cannot be looked at without unfeasible sample sizes; and (5) since cluster sampling is almost always used (see below), estimates for different sub-regions within the surveyed area cannot be generated, unless stratification is built into the sampling design a priori [17]. In short, surveys are of limited use for emergency relief operations.
Survey implementation is often haphazard and fraught with biases [18,19], and surveys conducted during complex humanitarian emergencies are prone to several methodological limitations [20]. In most crises, lists of households are non-existent and the residential layout is chaotic, making simple or systematic random sampling difficult. An alternative sampling design that is commonly employed, even though it is less precise and more prone to bias, is multi-stage cluster sampling [20]: the first stages involve random selection of n₁ cluster starting points in the population sampling frame, while the last stage entails sampling of n₂ households around each point.
Allocation of cluster points to communities (e.g., villages, camps) within the sampling universe (target population for the survey) is usually proportional to the population size of the communities, which leads to some bias towards high population density areas. When communities' locations and sizes are unknown, spatial sampling of random global positioning system (GPS) coordinates is an option. This approach is biased towards low density communities, although the bias can be controlled for [21]. There is insufficient evidence on the minimum number of clusters needed to ensure statistical robustness under different mortality conditions (i.e., magnitude of the mortality rate, degree to which deaths are clustered together in space), and on the incremental precision benefit of adding more clusters.
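Allocation of clusters proportional to population size is often done by systematic sampling along a cumulative population list. The sketch below shows one common way of doing this; the function name, community names, and population figures are hypothetical, and real surveys may use a different variant.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, Optional


def allocate_clusters_pps(
    populations: Dict[str, int],
    n_clusters: int,
    seed: Optional[int] = None,
) -> Dict[str, int]:
    """Assign clusters to communities with probability proportional to size.

    Walks along the cumulative population total at a fixed sampling
    interval from a random start; each step lands in one community.
    """
    names = list(populations)
    cumulative = list(accumulate(populations[name] for name in names))
    interval = cumulative[-1] / n_clusters
    start = random.Random(seed).uniform(0, interval)
    counts = {name: 0 for name in names}
    for k in range(n_clusters):
        point = start + k * interval
        # Index of the first community whose cumulative total exceeds the point.
        idx = min(bisect_right(cumulative, point), len(names) - 1)
        counts[names[idx]] += 1
    return counts


# Hypothetical sampling frame: two villages and one large camp.
frame = {"village_a": 1_000, "village_b": 2_000, "camp_c": 27_000}
counts = allocate_clusters_pps(frame, n_clusters=30, seed=1)
print(counts)
```

Because large communities span more of the cumulative list, they receive more clusters, which is the source of the tilt towards populous areas noted above.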
Household sampling within clusters usually relies on a technique known as "spin-the-pen," whereby a household along an imaginary line running from the centre to the edge of the community is sampled first, and further households are selected through a proximity rule (e.g., next closest). This method can entail selection bias and reduce precision; alternatives include sampling random GPS coordinates [22] or mapping a segment of the community and doing systematic sampling therein [17]. The question is mainly one of cost-benefit: how much bias can be eliminated by introducing potentially more resource-intensive sampling designs? Further research, particularly based on mathematical simulation, is needed to optimise sampling choices.
Various questionnaires have been used to elicit information on deaths, but there has been little comparison of their reliability. Next-of-kin reports about non-violent cause of death may be unreliable; full verbal autopsies (interviewing the next-of-kin to try to establish the cause of death based on reported signs and symptoms) [23] may be impracticable in emergency surveys, but simplified versions could be developed to classify causes of death broadly and prioritise health relief interventions [17].
Despite the limitations of surveys, they will remain important for relief operations, as they can establish mortality levels at the outset of an intervention or supplement failing surveillance. However, their commissioning must be more rational and quality needs to improve.
Surveys are also a critical tool for advocacy and documentation [24][25][26][27][28][29][30][31][32]. However, since they are often conceived to address operational questions, they may not investigate person-time (i.e., the total amount of time that a cohort of people are at risk) in which most mortality occurred. During long recall periods in situations of war and displacement, the sampled cohort is dynamic: households undergo fragmentation and re-composition, with migration within and outside; individuals may enter the at-risk group at different times (e.g., if displacement is gradual); and the communities investigated may be remnants of larger groups that were together before the crisis [31]. For example, the Karen and Karenni communities in eastern Burma have split among forced displacement sites, Thai refugee camps, and villages of origin [26,33]. In such cases, well-designed questionnaires are needed to track each individual's time of entry and exit from the at-risk cohort, and the definition of "at risk" needs to be explicit.
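Tracking each individual's entry to and exit from the at-risk cohort amounts to summing person-days within the recall period. The following is a minimal sketch of that bookkeeping; the function name, the dates, and the convention that a missing entry or exit date means "present at the start or end of recall" are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def person_days_at_risk(
    entry: Optional[date],
    exit_: Optional[date],
    recall_start: date,
    recall_end: date,
) -> int:
    """Days one person contributes to the at-risk cohort during recall.

    entry is None if the person was already present at recall_start;
    exit_ is None if the person was still present at recall_end.
    Exit covers death, departure, or any other end of follow-up.
    """
    start = recall_start if entry is None else max(entry, recall_start)
    end = recall_end if exit_ is None else min(exit_, recall_end)
    return max((end - start).days, 0)


recall_start, recall_end = date(2024, 1, 1), date(2024, 3, 31)

# Hypothetical household: (entry, exit) for each member.
members = [
    (None, None),                # present throughout
    (date(2024, 2, 1), None),    # arrived or born during recall
    (None, date(2024, 1, 31)),   # died or left during recall
]
contributions = [
    person_days_at_risk(entry, exit_, recall_start, recall_end)
    for entry, exit_ in members
]
total_person_days = sum(contributions)
print(contributions, total_person_days)
```

Making the entry and exit rules explicit in this way is one means of forcing the definition of "at risk" to be stated rather than left implicit in the questionnaire.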
Box 1. Suggested List of Key Future Actions for Better Mortality Documentation in Crises
[...] simple tools for emergency mortality surveillance implementation and analysis.
[...] surveillance systems as soon as possible after the onset of the crisis.
[...] researchers, policy makers, the media, and civil society to widen understanding of the strengths and limitations of various sources of mortality information in crises.
[...] in emergency surveillance and survey methods, including NGO and UN staff, academics, and local government scientists.
[...] mortality estimation, and, if necessary, do studies to establish the relative validity of various methods (including sampling and questionnaire designs).
[...] data collection (including remote surveys and satellite data analysis) and mortality prediction (including mathematical modelling).
[...] evaluating the performance of donors by monitoring key indicators of population health in their funded projects, including mortality.
[...] charge of collecting mortality data on a systematic basis, especially in under-publicised and under-funded crises.
[...] (possibly housed within the above body) to arbitrate disputes about study validity, review study protocols and reports, and define best practices for mortality data collection in crises.

Other methods. Demographic methods that attempt to analyse the age-sex composition of the population [34] have been used to indirectly estimate excess mortality (see below) by combining census, fertility, child mortality, or sibling and parent survival data with the expected age-sex structures of the population investigated ("model life tables") [35]. Major drawbacks of this approach are that crisis-affected populations often do not fit these expected structures, and that crisis settings violate the assumptions of constant conditions over periods of months or years inherent in many demographic methods.
"Body counts" usually capture only a fraction of actual deaths, ignore indirect mortality (deaths not caused directly by violent trauma in a conflict but by the deterioration in health services and increased risk of disease attributable to the violence), and indicate at best a minimum death toll directly attributable to violence. In nearly all crises, most deaths occur outside health facilities and are not reported to government offices or by media [36]. In Iraq, morgue reports over the first 2.5 years of occupation tallied to about 45 violent deaths per 100,000 per year [37], a rate comparable to that observed in Baltimore and Washington, D.C., lower than in New Orleans [38,39], and implying a less than 10% rise over the baseline despite circumstantial evidence to the contrary [40].
Active identification of victims often requires years of painstaking work, and may never be complete, but it has value for legal documentation. In Bosnia, nearly 100,000 war fatalities were individually ascertained [41]. Where several individual lists of deaths are available, capture-recapture estimation, a method developed in ecology, can infer the total death toll, including deaths not recorded on any lists, by analysing the overlap of different lists, as in Guatemala [42] and Kosovo [43].
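The idea behind capture-recapture can be illustrated with the simplest two-list case. The sketch below uses the Chapman form of the Lincoln-Petersen estimator, which assumes the two lists are independent, that records can be matched without error, and that every death has the same chance of appearing on each list. These assumptions, and the synthetic identifiers, are ours; the Guatemala and Kosovo analyses cited above drew on several lists and more elaborate models.

```python
from typing import Hashable, Iterable


def chapman_estimate(list_a: Iterable[Hashable], list_b: Iterable[Hashable]) -> float:
    """Estimate the total number of deaths from two overlapping lists.

    Chapman's estimator: (n_a + 1) * (n_b + 1) / (m + 1) - 1,
    where m is the number of deaths recorded on both lists.
    """
    a, b = set(list_a), set(list_b)
    if not a or not b:
        raise ValueError("both lists must contain at least one record")
    matched = len(a & b)
    return (len(a) + 1) * (len(b) + 1) / (matched + 1) - 1


# Synthetic victim identifiers: 400 on one list, 300 on another, 120 on both.
list_a = range(0, 400)
list_b = range(280, 580)
estimate = chapman_estimate(list_a, list_b)
documented = len(set(list_a) | set(list_b))
print(f"documented on either list: {documented}")
print(f"estimated total deaths: {estimate:.0f}")
```

The smaller the overlap between lists relative to their sizes, the larger the estimated number of deaths recorded on neither list.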
New technology deserves consideration for improving estimates, especially in hard-to-access populations. High-resolution satellite images of Darfurian villages being attacked or destroyed are now available [44,45]. Satellite photographs could allow for remote burial monitoring. Expanding telecommunications may enable sentinel surveillance (whereby specific sites are selected for prospective monitoring on the assumption that trends at these sites reflect the population experience) or snowball sampling (whereby a given type of respondent, e.g., a member of a household that experienced an attack, refers the researcher to other similar households, and so forth, thus building an iterative sample). One of us (LR) asked Columbia University students to interview a convenience sample of Baghdad residents contacted by telephone or e-mail about violent deaths near their homes since the 2003 invasion; data were then compared to media reports to assess the latter's completeness [46].
Excess mortality estimation. Accumulating evidence shows that in most wars, mortality indirectly attributable to violence (due to disruption of health services, displacement into unsanitary camps, food insecurity, etc.) far exceeds that due to intentional injury [47]. Documenting this total excess mortality entails estimating the rates of all-cause mortality that can be attributed to the crisis, and then projecting them to the entire person-time at risk during the crisis. This requires a baseline rate, namely mortality that would have occurred in the absence of a crisis. However, this baseline rate is immeasurable [48], and so pre-crisis mortality is usually adopted instead [49]. Alternatively, the baseline rate is taken to be mortality in neighbouring "control" populations with similar demographic and epidemiological characteristics that are unaffected by crisis [50]; however, selecting these controls presents obvious difficulties because no two populations are truly the same and because assumptions of similarity cannot be tested once a conflict begins. Available pre-crisis mortality estimates come from census or national health surveys, but they are often imprecise at administrative levels below the national level, or may be outdated, especially in chronic crises (e.g., Burundi, Afghanistan). Sensitivity analysis of different scenarios is prudent in such cases. Alternatively, surveys can establish a baseline by investigating pre-crisis person-time, though long periods may introduce biases with event recall. Evidence suggests that household-level IHL violations are associated with all-cause mortality [51]. Future mortality studies could elucidate these causal links further by explicitly exploring violent exposures as a determinant.
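The projection described above, together with a simple sensitivity analysis over candidate baselines, can be sketched as follows. All figures are hypothetical, and the function name is ours; the point is only to show how strongly the excess death toll depends on the assumed baseline.

```python
def excess_deaths(crisis_rate: float, baseline_rate: float, person_days: float) -> float:
    """Excess deaths given rates in deaths per 10,000 person-days."""
    return (crisis_rate - baseline_rate) / 10_000 * person_days


# Hypothetical crisis: 1,000,000 people at risk for 180 days,
# with an observed all-cause rate of 1.5 per 10,000 person-days.
person_days = 1_000_000 * 180
crisis_rate = 1.5

# Candidate baseline rates to compare (assumed scenarios).
scenarios = {"low": 0.3, "mid": 0.45, "high": 0.6}
results = {
    label: excess_deaths(crisis_rate, baseline, person_days)
    for label, baseline in scenarios.items()
}
for label, value in results.items():
    print(f"baseline {scenarios[label]:.2f}: about {value:,.0f} excess deaths")
```

In this hypothetical case, moving the baseline from the low to the high scenario changes the estimated excess toll by several thousand deaths, which is why reporting a range of scenarios is preferable to a single figure when the baseline is uncertain.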

Commissioning, Interpretation, and Use of Mortality Data
The role of different players. Ideally, local governments should spearhead data collection, but often lack technical capacity or have no interest in documenting the impact of conflicts to which they are parties.
The United Nations (UN) system's track record of documenting mortality and IHL violations is poor. The World Health Organization (WHO) has a mandate [52], but most expertise is confined to headquarters, and though we are unaware of detailed analyses of WHO's work in individual crises, our experience suggests that its decentralised country offices, especially in Africa, are often unwilling to challenge governments [53].
One of the few WHO mortality surveys ever done in a crisis [54] was critical to scaling up relief in Darfur. But the UN has done little mortality measurement in most crisis settings including Chechnya, Angola [55], Burma, Zimbabwe [56], West Africa, and the Central African Republic. We believe that, by not commissioning data collection, the UN may diminish its diplomatic ability to pursue conflict resolution and civilian protection based on evidence. Furthermore, by not upholding existing reliable figures, the UN may undermine the credibility of data collecting agencies and enable parties responsible for conflict to dismiss the data as partisan.
Non-governmental organisations (NGOs), human rights organisations, and academics have partly filled the data collection void [16], but have variable technical expertise [18], are often characterised in the press or by parties to the conflict as being biased and agenda-driven, and can easily be barred from working by host governments. The influence of their findings on policy has probably been minimal in Burma and Iraq, and moderate in the DRC [57] and Darfur [58,59].
Donors should enhance the accountability of the humanitarian projects they fund by encouraging collection of mortality data. However, demonstrating the mortality impact of specific funded interventions (e.g., water provision) generally requires complex study designs [60] and is rarely needed, since evidence on the effectiveness of these interventions (e.g., vaccination; vector control) often already exists. What needs to be documented instead is their coverage among targeted beneficiaries. On the other hand, some assessment of the impact of the relief operation as a whole can be done through mortality surveillance or surveys, provided contextual information is used to interpret findings and correctly attribute causality. However, interagency coordination is needed to implement mortality studies that cover the entire crisis-affected population.
Generally, UN, governmental, and other agencies involved in long-term development have not adapted their toolbox to fast-evolving crises. They have relied instead on bulky, infrequent demographic assessments such as the Multi-Indicator Cluster Surveys or Demographic and Health Surveys (nationwide surveys employing large samples and complex designs that collect data on a wide range of health, nutrition, and household livelihood indicators). While these types of assessments are needed to track development goals, they require up to a year for reporting and generate mortality estimates centred several years in the past. Epidemiologists measure mortality as an incidence rate, while demographers typically rely on previous birth histories and indirect methods; inter-disciplinary dialogue would help to harness the best of both disciplines for crisis conditions specifically.
To improve data commissioning, methods should be streamlined and standardised for the benefit of non-academic agencies, who will likely remain the main data producers. The Standardized Monitoring and Assessment of Relief and Transition (SMART) initiative (see http://www.smartindicators.org/), an interagency initiative to improve monitoring and evaluation of humanitarian assistance interventions, has developed a standard survey protocol for mortality and nutritional anthropometry to support relief operations [61]. The protocol includes automated data management software, which NGOs such as Action Against Hunger are adopting; this tool should be disseminated more widely.
Technical capacity must be scaled up through focused training on emergency epidemiology methods for both researchers and policy makers. The expert pool is overstretched and experiences high turnover, as with most humanitarian professionals [62]. Most public health schools do not teach emergency epidemiology methods in depth, and opportunities for skills handover are mostly on the job; many agencies thus work in isolation and produce invalid data.
Political sensitivities of mortality data. Mortality information can be politically inflammatory and prone to manipulation. Households may hide deaths if they occurred among combatants or victims of persecution, or if the reporting of deaths would result in a smaller aid ration [63]. Relief agencies may dispute mortality findings that suggest minimal impact of their relief efforts, or support them if they attract donor interest. At country or international levels, opposition parties, advocacy organisations, and the media may use findings to further their claims [1]. Combatants will generally reject findings that show their actions have led to massive death or that expose a hidden crisis. The North Korean [64] and Zimbabwean [65] governments initially denied extraordinary food crises. Rwanda rejected mortality findings from areas of the DRC where it had intervened militarily (L. Roberts, personal observation). The Russian government hindered investigations of disappearances in Chechnya [66]. Sudan's president claimed only 9,000 died in Darfur [67], despite various studies that reported a far greater number [58,59,68,69,70]. Leaders of the coalition of countries that invaded Iraq in 2003 dismissed survey estimates of 601,000 deaths due to violent trauma over the first 40 months after the invasion [48]; a lower estimate (151,000) [71] has not met with such scepticism.
A 2005 report from northern Uganda by WHO and partners [72] yielded a typical scenario. The government rejected the report and barred its publication; pro-government media criticised it; an NGO coalition used it to denounce the crisis; and an opposition figure claimed genocide; while the UN country team, including agencies sponsoring the survey, remained silent [73].
From local to crisis-wide estimates. Estimates covering the entire crisis (both chronologically and geographically) are more powerful in policy terms than disparate data from various locations and periods. However, the latter situation is typical: to our knowledge, in the past decade real-time crisis-wide data were only available for the DRC [28] and Iraq [71,74], though estimates were derived for Kosovo [30,43,75] and Timor-Leste [76] after peace broke out.
Crisis-wide syntheses of patchwork data can provide an overall picture, but extrapolation of available findings to person-time not covered by data collection is a treacherous task, subject to fragile assumptions. In Darfur, insufficient data have led to widely divergent extrapolation estimates. For example, projections for September 2003 to February 2004, when violence levels were highest, are based solely on three localised surveys done in West Darfur, which altogether covered less than 15% of person-time at risk during that period. Reliable baseline mortality data for Darfur are unavailable, and different projections adopt baseline levels ranging from 0 to 0.6 deaths per 10,000 person-days. Insecure areas of South Darfur have never been surveyed [70].
Improvements could be explored, such as spatial modelling of mortality risk based on circumstantial information (e.g., reports of attacks, nutritional crises, epidemics, displacement dynamics) and evidence of the statistical association between these factors and mortality. Such spatial modelling is currently being used to analyse food security in Somalia [77]. Extrapolative exercises, however, are only a partial substitute for timely data collection that is representative of all person-time: the latter requires coordination, funding, and, most importantly, access by researchers to affected populations.
Understanding statistics. When done in settings without many other sources of information, studies can become high-profile events, as in the DRC [28] and Iraq [25], attracting non-specialists' attention. Global political and human rights activism, sustained by greater access to media and the Internet, is now a force to be reckoned with. Increased information flow carries ethical responsibilities for those generating and reporting on data. Debate around Iraq mortality estimates flourished not just among experts, but in wider academic circles, generalist media, and particularly the blogosphere. A common critique was that survey estimates of 500 violent deaths per day were implausible, as they implied that the media were largely missing a daily catastrophe [78]. Understandably, journalists and the general public view palpable body counts as being more reliable, and researchers struggle to convey the inherent superiority of estimation based on representative sampling. In African crises, where media coverage is far weaker, excess mortality is largely due to small arms and disease, and thus even more imperceptible, though it may continue for years and affect millions.
Insecurity is an underappreciated constraint to mortality documentation. Most data are generated after fighting subsides and the crisis is no longer under the international spotlight. Studies are often done in areas where locals are not free to report atrocities or corroborate statistical estimates. Wishing to minimise risk of physical harm for interviewers and respondents, researchers tend to design small surveys with wide confidence intervals, giving the data an added aura of unreliability.
Iraq has provided a rare opportunity for policy makers, journalists, and civil society to consider different mortality data sources. However, more dialogue needs to occur among these stakeholders and researchers so as to increase non-specialists' familiarity with statistical estimation and the hierarchy of different types of evidence. Methods that are population-based (i.e., designed to capture or estimate all deaths, including those that occur outside health facilities or are unreported to government systems or journalists) should generally be viewed as superior. Crisis-wide surveys, as in the DRC, are preferable to extrapolation of disparate data, as in Darfur. However, single survey estimates may feature biases that, when projected to the entire population and period (person-time) represented by the sample, can cause serious inaccuracies. Independent replications of surveys or the simultaneous implementation of other population-based methods enhance the strength of inference and reduce the scope for speculation. After the Kosovo war, three studies [30,43,75] yielded remarkably similar estimates of 10,000 to 12,000 killings. In Iraq, the 10-fold difference among violent death toll estimates mainly reflects a comparison between surveys and body counts, as well as heterogeneous analysis periods [48,74].

Conclusion: A Dedicated Body to Monitor Mortality?
How many of the more than 3 million estimated to have died in the DRC because of war [28] might still be alive if credible, crisis-wide mortality estimates had become available sooner, and been used to inform policy? The establishment of a technical, apolitical body dedicated to timely, systematic collection of valid mortality data, especially in the least funded and publicised crises, could help to ensure that the DRC experience is not repeated. Such a body could also independently evaluate mortality study protocols and reports, promote best-practice methods, and train a cadre of researchers to be deployed to emergent crises. Such a body could constitute a resource for relief agencies and improve the quality of press coverage and discussion around ongoing crises. The fledgling Health and Nutrition Tracking Service [79], currently hosted by WHO, proposes to coordinate some of the above tasks. If housed within a UN agency or government, its effectiveness might be stymied by negotiations between UN headquarters, the UN country office, and the host government. Whatever its positioning (we suggest autonomy from all existing institutions; Spiegel proposes ad hoc bodies for each crisis [18]), its independence will be a key success determinant, and could be fostered through the following measures: (1) non-earmarked, long-term funding by a very broad spectrum of donors, with preference for politically neutral ones; (2) ability to pursue projects without consulting donors; (3) involvement of experts based on technical merit alone; and (4) independent review by a contracted firm and/or representatives from civil society, especially from crisis-affected populations. The monitoring of war prisoners by the International Committee of the Red Cross could serve as a model.
The expression "a matter of life and death" must have an equivalent in almost every language. Documenting mortality is itself a matter of life and death, since it can instigate better relief assistance and greater adherence to IHL, and possibly influence humans' future relationship with their kind and the environment. We can do far better: failure to do so will be our collective loss.