Configuring Balanced Scorecards for Measuring Health System Performance: Evidence from 5 Years' Evaluation in Afghanistan

Anbrasi Edward and colleagues report the results of a balanced scorecard performance system used to examine 29 key performance indicators over a 5-year period in Afghanistan, between 2004 and 2008.


Introduction
Emerging from decades of conflict in 2002, Afghanistan had some of the worst health indicators in the world, with more than 30% of its population living below the poverty line. A myriad of environmental and security constraints further exacerbate the deficits in the health system, impeding optimal service delivery. Efforts to reconstruct Afghanistan's health infrastructure have accelerated since then, with total public health spending rising to US$280 million in 2008-2009, most of it (85%) financed by external donor assistance [1]. To address the exceptionally high disease burden, with an under-five mortality rate estimated at 257 deaths per 1,000 live births and a maternal mortality ratio estimated at 1,600 deaths per 100,000 live births [2][3][4][5], the Ministry of Public Health (MOPH) designed a Basic Package of Health Services (BPHS), delivered primarily through contracts with nongovernmental organization (NGO) and MOPH implementing agencies [6]. To measure the performance of the BPHS, the MOPH initiated the National Health Service Performance Assessment (NHSPA) in 2004; it has been conducted annually in every accessible province.
In the current era of increasing demand for improved governance and accountability, policy makers seek comprehensive performance measures that demonstrate the effects of health systems strengthening innovations on service delivery and health outcomes [7]. Among the conceptual frameworks employed for measuring organizational performance, the balanced scorecard (BSC), developed for industry by Kaplan and Norton, has steadily gained momentum as a strategic management tool in the health sector [8][9][10][11][12]. In contrast to traditional performance metrics that measure health outcomes, the scorecard offers an integrated measurement and management system that links the mission and policy of the organization to multiple performance domains through strategic mapping, facilitating benchmarking and fostering a culture of accountability [11,13].
Successful scorecard integration has been reported in the Dutch and Italian public health care systems, but apart from two recent publications, one on a hospital in Pakistan and one on a district-level program of temporary health clinics for displaced populations, applications in national public health systems in low-income countries are limited [14][15][16][17]. In 2004, Afghanistan pioneered a BSC integrating the health sector's vision and strategy to measure the performance of service capacity and delivery against the standards instituted in the BPHS, which prioritizes services addressing the major disease burden in vulnerable population segments (Box 1: Seven Elements of BPHS). The traditional BSC quadrants were modified into six domains with 29 core performance indicators, designed by a multidisciplinary team of government, donor, and NGO stakeholders. The ''Patient and Community Perspectives domain'' focused on patient satisfaction and the engagement of community councils (shura-e-sehies). Workforce capacity, management, and satisfaction were measured by the ''Staff domain,'' followed by a set of indicators examining system preparedness against BPHS standards for staffing, equipment, essential commodities, and infrastructure in the ''Capacity for Service Provision domain.'' Service provision and clinical quality of care were measured through the ''Service Provision domain.'' The ''Financial Systems domain'' covered user fees; it was eliminated in 2008, following the discontinuation of user fees after 2007. The last domain, ''Overall Vision,'' measured equity factors. Earlier publications described the process of scorecard design and some evidence of management decision-making based on scorecard results [13,18].
The purpose of this study is to illustrate the performance trends in delivering the BPHS during the first 5-y period following elections in 2004, and to reflect on the potential and limitations of the scorecard as a performance management tool to measure and improve health service delivery in similar health care contexts.

Methods
The BSC was designed to integrate key performance indicators from the NHSPA conducted annually between 2004 and 2008. Using multistage stratified random sampling each year, a sample of 25 facilities was selected from each province, distributed by facility type as follows: three district hospitals, seven comprehensive health centers, and 15 basic health centers. However, the sampling frame varied each year because (a) new facilities were being constructed; (b) some provinces had fewer functional facilities than the proposed sample, in which case all functional facilities were included; and (c) adverse security incidents prevented surveyors from reaching sampled health facilities in some provinces. Therefore up to 25 facilities were included in each province. Weighting was applied at the provincial level on the basis of the total number of health facilities included at the national level, and national scores were then computed from the weighted scores. The total national sample included in the scorecard represented approximately 50% of the functional health facilities in Afghanistan. The subsequent inclusion of subcenters and mobile clinics necessitated sampling these facilities in the 2009 national assessment; however, these peripheral facilities were not included in the national performance assessments between 2004 and 2008.
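The per-province draw described above can be sketched as a stratified random sample with a fixed quota per facility type. This is a minimal Python illustration under stated assumptions, not the NHSPA's actual procedure: the input format (a dict from facility type to facility IDs), the function name `sample_province`, and the type labels are hypothetical. It reproduces two rules stated in the text: quotas of 3, 7, and 15, and inclusion of every functional facility when a stratum is smaller than its quota, so that a province yields up to 25 facilities. Inaccessible facilities are assumed to be removed from the frame before sampling.

```python
import random

# Quotas per province by facility type (3 + 7 + 15 = 25), from the Methods.
# The type labels are hypothetical names chosen for this sketch.
QUOTAS = {"district_hospital": 3, "comprehensive_hc": 7, "basic_hc": 15}


def sample_province(frame, rng):
    """Stratified random sample of facilities for one province.

    frame: dict mapping facility type -> list of functional, accessible
    facility IDs (hypothetical input format).
    If a stratum has no more facilities than its quota, all are included,
    so a province contributes *up to* 25 facilities.
    """
    chosen = []
    for ftype, quota in QUOTAS.items():
        stratum = frame.get(ftype, [])
        if len(stratum) <= quota:
            chosen.extend(stratum)  # take every facility in a small stratum
        else:
            chosen.extend(rng.sample(stratum, quota))  # simple random sample
    return chosen


rng = random.Random(2004)  # fixed seed so the draw is reproducible
province = {
    "district_hospital": ["DH1", "DH2"],
    "comprehensive_hc": [f"CHC{i}" for i in range(1, 11)],
    "basic_hc": [f"BHC{i}" for i in range(1, 31)],
}
selected = sample_province(province, rng)
print(len(selected))  # 2 + 7 + 15 = 24
```

In this example the province has only two functional district hospitals, so both are taken and the province contributes 24 rather than 25 facilities.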
In this study, we included data from the 28 provinces that were surveyed every year between 2004 and 2008. Six of Afghanistan's 34 provinces were excluded from the trend analysis: Daikundi because it had no functional facilities in 2004, and Helmand, Kandahar, Zabul, and Uruzgan (after 2004) and Farah (in 2008) because worsening security made them inaccessible.
A five-member survey team conducted the assessments in the selected facilities in each province. In each facility, surveyors observed the consultations of five children under 5 y and five patients above 5 y, selected by systematic random sampling with a sampling interval based on utilization rates. This yielded an annual national sample of approximately 5,000 patient observations, 5,000 exit interviews with patients or caretakers of children under 5 y, and 1,500 health provider interviews. The assessment of provider knowledge was modified in 2008 with the introduction of case vignettes to determine the provider's ability to diagnose and treat conditions on the basis of presenting symptoms. The response rate was >99% for both providers and patients interviewed.
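Systematic random sampling of patients can be sketched as follows. The text states only that the sampling interval was based on utilization rates; this sketch assumes, hypothetically, that the interval k is the expected number of eligible patients on the survey day (derived from utilization records) divided by the number of observations needed, with a random start between 1 and k. The function names are illustrative and are not taken from the NHSPA protocol.

```python
import random


def sampling_interval(expected_patients, n_needed):
    """Interval k that spreads n_needed observations over the expected daily
    patient flow (assumed to come from facility utilization rates)."""
    return max(1, expected_patients // n_needed)


def systematic_positions(expected_patients, n_needed, rng):
    """Arrival positions (1-based) of the patients to observe: a random start
    in [1, k], then every k-th eligible patient until n_needed are chosen."""
    k = sampling_interval(expected_patients, n_needed)
    start = rng.randint(1, k)
    return [start + i * k for i in range(n_needed)]


# Example: about 40 children under 5 expected on the survey day, 5 needed.
rng = random.Random(7)
positions = systematic_positions(40, 5, rng)
print(positions)  # five arrival positions spaced 8 apart
```

Because the start is at most k, the last selected position never exceeds n_needed × k, which is at most the expected patient count whenever at least n_needed patients are expected; when fewer are expected (k = 1), the observer simply takes every arriving patient.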
Each of the 29 BSC indicators was converted to a percentage score ranging from 0 to 100. For each province and indicator, a mean score was first calculated by averaging the performance of the sampled facilities in that province, and this mean was then weighted to account for the different proportions of facility types in the total national sample. The national score for each indicator was calculated as the median of the weighted provincial mean scores. Provincial performance in 2004 was used to set the benchmarks: provinces were categorized into quintile groups on the basis of performance, with the top and bottom quintiles defining the upper and lower benchmarks. Fourteen of the 29 indicators were indices, created by aggregating a set of performance indicators.
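The scoring steps above (facility scores averaged within a province, weighted by facility type, summarized nationally by the median, and benchmarked against 2004 quintiles) can be sketched as below. The exact weighting formula is not given here, so the sketch assumes one plausible reading: average the scores within each facility type in a province, then combine the type means using each type's share of the national sample as weights, renormalized over the types present. Benchmark cut points use the default (exclusive) method of Python's `statistics.quantiles`, which may differ from the interpolation used in the original analysis. All function names are hypothetical.

```python
from statistics import mean, median, quantiles


def provincial_score(facility_scores, type_share):
    """Facility-type-weighted mean score (0-100) for one province and indicator.

    facility_scores: list of (facility_type, score) pairs for sampled facilities.
    type_share: facility_type -> share of that type in the national sample
    (assumed weighting; weights are renormalized over the types present).
    """
    by_type = {}
    for ftype, score in facility_scores:
        by_type.setdefault(ftype, []).append(score)
    total_weight = sum(type_share[t] for t in by_type)
    weighted = sum(type_share[t] * mean(s) for t, s in by_type.items())
    return weighted / total_weight


def national_score(provincial_scores):
    """National indicator score: median of the weighted provincial scores."""
    return median(provincial_scores)


def benchmarks(baseline_scores):
    """(lower, upper) benchmarks: the bottom- and top-quintile cut points
    (20th and 80th percentiles) of the 2004 provincial scores."""
    cuts = quantiles(baseline_scores, n=5)
    return cuts[0], cuts[-1]


def quintile_group(score, baseline_scores):
    """Quintile group 1 (lowest) to 5 (highest) relative to 2004 cut points."""
    cuts = quantiles(baseline_scores, n=5)
    return 1 + sum(score > c for c in cuts)


share = {"DH": 3 / 25, "CHC": 7 / 25, "BHC": 15 / 25}
prov = provincial_score([("DH", 80), ("CHC", 60), ("CHC", 70), ("BHC", 50)], share)
print(round(prov, 1))  # 0.12*80 + 0.28*65 + 0.60*50 = 57.8
```

Using the median rather than the mean for the national score limits the influence of a few unusually high- or low-scoring provinces; the quintile helper shows how a later year's provincial score could be placed against the fixed 2004 benchmarks.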
The Wilcoxon matched-pairs signed-rank test was applied to test the difference between 2004 and 2008 scores, using STATA 10 (Stata Corp). We used generalized estimating equations (GEE) with robust standard errors, with time as the predictor, to assess the linear change in each indicator over the 5-y period; this approach accounts for the correlation between repeated observations at the provincial level [19].
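Both analyses can be illustrated in pure Python. This is a hedged sketch, not a reproduction of the Stata analysis. The Wilcoxon function uses the large-sample normal approximation without tie or continuity corrections, so its p-values can differ slightly from Stata's `signrank`. The trend function fits a GEE with an identity link and an independence working correlation; under that assumption the GEE estimate equals the ordinary least-squares slope, and the robust (sandwich) standard error is clustered by province, with no small-sample correction. The working correlation used in the original analysis is not stated here (Stata's `xtgee` defaults to exchangeable), so estimates may differ. Function names, the (province, year, score) input layout, and the example data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before, after):
    """Matched-pairs signed-rank test (normal approximation, two-sided).
    Returns (W+, z, p); zero differences are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    return w_plus, z, math.erfc(abs(z) / math.sqrt(2))


def gee_linear_trend(records):
    """Linear trend of scores over years via GEE (identity link, independence
    working correlation) with province-clustered robust standard errors.
    records: list of (province, year, score). Returns (slope, se, p)."""
    n = len(records)
    xbar = sum(r[1] for r in records) / n
    ybar = sum(r[2] for r in records) / n
    xc = [r[1] - xbar for r in records]  # centered time
    sxx = sum(v * v for v in xc)
    slope = sum(v * (r[2] - ybar) for v, r in zip(xc, records)) / sxx
    # Score contributions x_i * e_i, summed within each province (cluster).
    cluster_sum = {}
    for v, (prov, _, y) in zip(xc, records):
        e = y - ybar - slope * v
        cluster_sum[prov] = cluster_sum.get(prov, 0.0) + v * e
    # With centered x, the slope's sandwich variance is sum_c(S_c^2) / sxx^2.
    se = math.sqrt(sum(s * s for s in cluster_sum.values())) / sxx
    z = slope / se
    return slope, se, math.erfc(abs(z) / math.sqrt(2))


before = [45, 50, 38, 60, 42, 55, 47, 52]  # illustrative 2004 provincial scores
after = [70, 72, 65, 61, 80, 75, 66, 50]   # illustrative 2008 scores
w, z, p = wilcoxon_signed_rank(before, after)
print(w, round(p, 3))

panel = [("A", yr, s) for yr, s in zip(range(2004, 2009), [40, 46, 50, 56, 60])]
panel += [("B", yr, s) for yr, s in zip(range(2004, 2009), [30, 34, 41, 45, 50])]
panel += [("C", yr, s) for yr, s in zip(range(2004, 2009), [50, 55, 59, 66, 70])]
slope, se, p_trend = gee_linear_trend(panel)
print(round(slope, 2), p_trend < 0.05)  # 5.07 points per year, True
```

The working correlation affects only the efficiency of the GEE slope, not its consistency, so the province-clustered sandwich standard error is what protects inference against correlation between a province's repeated annual scores.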
The study was considered exempt research according to the Johns Hopkins University Policy for Human Subjects Research and was approved by the MOPH in Afghanistan. A detailed description of the sampling methodology, survey procedures, scoring system, costs, and data quality measures undertaken in the NHSPA has been reported previously [13,18]. This paper analyzes performance trends in BPHS service delivery in the 28 provinces (of 34) that were included in every year between 2004 and 2008 (Table 1).

Results
The national median performance score for the indicators in each domain of the scorecard between 2004 and 2008 is shown in Table 2. Twenty of the 29 indicators showed significant improvement (p<0.05) since 2004. The domains of service capacity and delivery, which scored poorly in 2004 (median score <50), improved more dramatically over the years than the patient and community domain, which scored high initially (>65).

Patient and Community Domain
Overall, the patient and community domain showed significant improvement from baseline to 2008 (65.3% to 84.5%, p<0.0001). The substantial increase in this domain was driven primarily by greater engagement of village councils (shura-e-sehie) in the management oversight of health facilities, as measured by meeting records at the facilities (<35% in 2004 to 95% in 2008). Client satisfaction and perception of quality, critical determinants of service utilization, were rated high (>70%) in all survey years. The annual increases in patient perception of quality and village council activities estimated by GEE were significant (p = 0.02 and p<0.0001, respectively) (Table 3), although the actual increase in patient-perceived quality over the years was minimal.

Staff Perspectives and Satisfaction Domain
Provider satisfaction was an index combining multiple measures of health system attributes and support factors, including training, professional development opportunities, availability of essential equipment and medicines according to BPHS guidelines, and working relationships with various stakeholders. This index increased slightly but significantly (+6%), to about 70% in 2008. In 2008, the median score for providers reporting timely salary payments was 20% higher than at baseline. Financial incentives and regular payments are important contributors to provider motivation and retention, and this finding suggests improved management by implementing agencies.

Capacity for Service Provision
Compared with other domains, the most dramatic improvements were evident in the capacity for service provision domain (47.4 to 76.4, p<0.0001). All system preparedness factors for service provision (essential medicines, equipment, clinical guidelines, laboratory functionality, health provider adequacy, skill mix, and provider knowledge and training) showed significant improvements since 2004. These gains reflect the investments made to meet BPHS standards for capacity for service provision, a critical prerequisite for optimal quality of care and health outcomes. The annual increase estimated by GEE was significant for all indicators in this domain (Table 3).

Service Provision
Significant improvements were evident in adherence to clinical standards for patient history and physical examination, as well as for counseling, although counseling remained at suboptimal levels in 2008 (national median score of 48). Average new patient visits per month increased progressively, peaking in 2008 (median score of 85). Scores for provision of both antenatal care (+36%, p<0.0001) and delivery care (+46%, p<0.0001) were significantly higher than baseline levels. Despite these improvements, average patient consultation time decreased over the years, possibly because of higher service utilization following the removal of user fees. Regression modeling demonstrated a strong statistical association between capacity for service provision and quality of service provision in 2008 (coefficient 0.45, p<0.0001). In the GEE models, adherence to standards for patient history taking and patient counseling increased significantly (p<0.0001) over the 5-y period.

Financial Systems
Modest improvements were evident in the existence of patient user fee guidelines (+6.5% by 2007). All sampled facilities provided user fee exemptions for poor patients, raising the national median score to 100 in 2007, a 17.5% increase from baseline levels (p<0.0001). Discussions are underway to reconfigure this domain to include other aspects of health facility and hospital financial management.

Overall Vision for Pro-poor and Pro-female Services
One of the primary goals of the BPHS was to ensure equitable service delivery to the poor and to optimize utilization of outpatient services by females. A concentration index of 50 would indicate no difference in service utilization, and a higher index would indicate greater utilization by females or poorer patients. This objective was partially achieved: female patients constituted an increasing proportion of new outpatient visits (+4.9%, p<0.0001), but there was minimal improvement over the years in the concentration-index scores for equity of utilization or patient satisfaction.

Discussion
Despite the inherent challenges of the health environment and worsening security, the government of Afghanistan has made impressive gains in strengthening the capacity of the health system to improve service delivery, as evidenced by the upward trends in various domains. Since its inception in 2004, the scorecard has been adopted by the MOPH as a key performance management tool, demonstrating effective stewardship, providing evidence of the results of investments, and facilitating policy changes. Furthermore, it has helped create a learning organization with a more evidence-based decision-making culture [18,20]. The following sections present evidence on the management utility and limitations of the BSC for managing health service delivery, along with recommendations for optimizing the tool as a comprehensive health system performance measure.

Leadership and Transparency
Lack of leadership support has often been cited as a barrier to successful scorecard implementation [10]. In Afghanistan, however, the support of the executive leadership in policy and planning was a key enabling factor for the significant gains in national performance scores. The BSC served as a strategic management tool to ensure capacity, identify factors contributing to performance deficiencies, and determine policy and resource innovations to promote good governance [15]. Despite some initial skepticism and reluctance to adopt the tool, over the years several ''champions'' have leveraged the interest and support of other investors in the health sector to sustain the application of the BSC. In a postconflict environment heavily reliant on external donor assistance, the BSC has been instrumental in enhancing transparency in decision-making and in creating a culture of accountability among policy leaders managing performance-based contracts. Improved performance trends helped demonstrate positive results to the public and rebuild trust in the government health care system [13,18].

Evidence of Improved Health Service Capacity and Delivery
As a performance assessment tool, the BSC has provided a comprehensive framework of multiple service delivery and system elements, showing impressive progress toward the BPHS standards for optimal service provision. The trends provide evidence of investments to enhance health infrastructure, staffing, and resource capacity, and of their subsequent impact on service provision. The customer domains of patient and provider satisfaction, and the financial domain (through 2007), also showed significant improvements during the 5 y. However, efforts must be made to motivate providers to improve counseling quality by giving patients appropriate instructions on home care, conditions requiring immediate return to the facility, and correct administration of prescriptions. Time spent on patient consultation and counseling, a significant predictor of quality of care [18,21], showed a decreasing trend over the years, returning to baseline levels in 2008. Without additional investment in health system resources, complemented by patient triaging, improving counseling quality will remain a challenge.
We believe that the apparent dramatic improvements in some indicators and minimal changes in others over the 5-y period have three plausible explanations: (1) The median performance scores for the service capacity and service provision domains were much lower in 2004 (47 and 40, respectively) than the scores for overall patient satisfaction (>80 in 2004) and patient perception of quality (>75% in 2004); improvements were apparently more dramatic for domains that had low scores in 2004 than for those already at higher levels. (2) Presentation of the scorecard, highlighting poorly performing domains, together with ongoing donor priorities to establish health infrastructure and resource capacity, led to additional performance improvement strategies targeting these indicators. (3) A courtesy bias may have produced high levels of patient satisfaction throughout, a possibility discussed in a later section.

BSC Facilitated Performance Benchmarking
In addition to enhancing improvements in service capacity and delivery, the BSC facilitated historical and provincial benchmarking and helped determine the effectiveness of various contracting mechanisms [20]. Benchmarking performance has been shown to be an effective strategy for fostering healthy competition and creating a culture of evidence to determine factors of success or failure, particularly in decentralized health systems. It is still considered relevant in the Afghan context, despite provincial variations in the health care environment, particularly in security and geographic access [22,23].

Remaining Challenges
Despite the progress in health system performance, Afghanistan continues to face challenges in ensuring equitable service delivery. Service utilization by female patients still depends on the availability of female providers. Although accelerated midwifery training programs have tried to meet the increasing demand for female providers, the national assessments do not provide information on the national health workforce because of the sampling strategy employed. The human resource database indicates that female providers currently constitute only 24% of the workforce, with considerable variation across provinces. The overall shortage of health providers, estimated at 39% relative to BPHS standards, remains a severe deficit, compounded by worsening security [1]. Poor provider motivation can lead to suboptimal quality through nonadherence to clinical guidelines, harmful practices, and poor provider-patient interactions [24]. Results-based financing mechanisms, similar to those implemented in Rwanda [25], have recently been introduced to incentivize providers and enhance motivation and performance. Though external assistance and public health spending have increased substantially in the last few years, better targeting of services is critical to ensure equity.

Limitations in the Design and Execution of the BSC
It is also important to highlight some limitations in scorecard design and execution so that policy makers can weigh the utility of the Afghanistan scorecard for measuring health service delivery. Aside from the classic Hawthorne effect, in which performance improves because providers know they are being observed, the quality of care assessments had no gold standard against which the accuracy of diagnosis or treatment could be ascertained. High levels of patient satisfaction in this context may also reflect a courtesy bias, as patients with limited physical and financial access to services may fear that negative feedback will result in termination of services [26]. Leveraging the support of village health councils to ensure community feedback may alleviate some of this fear and enhance community trust in the health system. Another limitation of the scorecard design is the lack of measures to assess the impact of investments at the population level, as the assessments are currently organized around health facilities and the users of services. Although the BSC was not designed to measure important health impacts such as mortality rates or to evaluate the causal relationships linking to such measures of population health, future strategies will try to link household surveys to the facility-based assessments to examine causal associations between facility interventions and observed changes in population-based outcomes. These could include effectiveness and efficiency measures to demonstrate the overall performance of the health system.
To ensure equitable access to services, the establishment of subcenters, mobile clinics, and health posts has accelerated in recent years with more than 22,000 community health workers trained and deployed nationally to provide health promotion and first level of care in rural communities. Efforts are underway to reconfigure the scorecards to integrate community provider performance measures and assess investments made to the base of the health system pyramid to demonstrate results as communities and patients engage to become coproducers of health [27,28].
A commonly cited barrier to scorecard implementation is the difficulty of progressing from a measurement tool to a performance management tool for continuous quality improvement, one that ensures social accountability and balances other financial considerations [29]. It is hypothesized that these quality improvement strategies may offset some of the costs of BSC reconfigurations necessitated by the changing health care environment [9]. The national quality assurance committee in Afghanistan has been instrumental in prioritizing performance deficiencies identified from routine health management information and supervision systems, as well as from the NHSPA, in order to identify and address management constraints [30]. We recognize the importance of rigorous survey designs for assigning cause and effect to the improvements instituted, as they provide a record of strategic inflection points when policies changed and resources were appropriated to improve systems and performance. Deficient performance, such as inadequate provider knowledge or stock-outs, led to intensified refresher training, efforts to ensure adequate stocks of essential drugs, and review of NGO performance when issuing new contracts [18].

Cascading Scorecard Results to Frontline Providers
Translating the scorecard strategy into easily comprehensible performance measures, to demystify the scorecard at the peripheral levels, is important for frontline providers to understand the value of the measurement strategy [11]. In Afghanistan, dissemination of scorecard results primarily targeted policy leaders and the contracting and implementing agencies responsible for oversight of the health system at the national and provincial levels, but there was minimal evidence of its use at the facility level.
The importance of packaging and cascading scorecard results to frontline constituents cannot be overstated, as this is critical for giving peripheral units the autonomy to address context-specific performance deficiencies through rapid quality improvement cycles [9,11].

Impediments in the Health Care Environment
The lack of explicit systems and policies, minimal infrastructure, high corruption levels, low literacy rates, and resistance to measurement, identified as barriers to successful scorecard execution, are also inherent in Afghanistan's health care environment [31]. However, the major challenge will be sustaining the momentum of BSC performance assessments amid worsening security; the long-term effectiveness of the current scorecard system therefore warrants further scrutiny, particularly if donor assistance diminishes in the future. Further analysis using multivariate statistical models that control for confounders will be required to determine the major contributors to improved health service delivery in this context [32].

Future Scorecard Reconfigurations
Transitions in the health system landscape of leadership and political priorities, changes in the health system architecture of financing and resource structures, guidelines and policies, epidemiological transitions, and security constraints will necessitate periodic reconfigurations to ensure the scorecard's relevance and effectiveness. The Ministry and its stakeholders are considering the integration of indicators to determine effectiveness of financial management operations and performance to replace the user fee indicators formerly in the financial management domain. As all provinces have achieved the 2004 performance benchmarks, new targets will be required in the future, as well as the inclusion of indicators for emerging health priorities to modify the scorecard.
The costs for conducting the annual national performance assessment to generate the scorecards are estimated at 2.5% of the overall cost of delivering the BPHS. As the government becomes more self-reliant, the scorecard may be scaled down to a few key performance indicators that can be derived from data obtained from health information systems supplemented by routine rapid facility assessments to minimize survey costs.

Enhanced Organizational Learning Promoting a Culture of Measurement
Few national performance measures were available when the BSC was first introduced for the primary health care sector. As the health system matured, several parallel measurement systems were standardized using the BSC prototype, including scorecards for the essential package of hospital services and, more recently, routine assessments of the health management information systems. The scorecard provides a single organizational measure and reduces the burden of managing fragmented performance assessments that each focus on a few health service delivery elements. All surveys and measures have been conducted nationally, covering all secure provinces to address vertical scale-up, and mechanisms for assessing insecure provinces are under consideration.

Conclusion
Emerging from decades of war and continued insecurity, Afghanistan has successfully pioneered the integration of the BSC at the national and provincial levels, to improve the delivery of basic health services. The BSC provides a dashboard of indicators integrating the building blocks of health systems, promoting effective governance and leadership, facilitating good stewardship and managerial autonomy for timely decisions on policy and service delivery [33]. Despite the promising results so far, the successful execution of the BSC will depend on its adaptive ability and sustained efforts of the MOPH leadership to accommodate dynamic and complex changes in the health care environment. As the global momentum for health systems strengthening accelerates, the balanced scorecard offers a promising measure of comprehensive health system performance to examine the effectiveness of strategies and innovations executed, particularly in resource constrained environments.

Editors' Summary
Background. Traditionally, the performance of a health system (the complete network of health care agencies, facilities, and providers in a defined geographical region) has been measured in terms of health outcomes: how many people have been treated, how many got better, and how many died. But, nowadays, with increased demand for improved governance and accountability, policy makers are seeking comprehensive performance measures that show in detail how innovations designed to strengthen health systems are affecting service delivery and health outcomes. One such performance measure is the ''balanced scorecard,'' an integrated management and measurement tool that enables organizations to clarify their vision and strategy and translate them into action. The balanced scorecard (essentially a list of key performance indicators and performance benchmarks in several domains) was originally developed for industry but is now becoming a popular strategic management tool in the health sector. For example, balanced scorecards have been successfully integrated into the Dutch and Italian public health care systems.

Why Was This Study Done?
Little is known about the use of balanced scorecards in the national public health care systems of developing countries but the introduction of performance management into health system reform in fragile states in particular (developing countries where the state fails to perform the fundamental functions necessary to meet its citizens' basic needs and expectations) could help to promote governance and leadership, and facilitate essential policy changes. One fragile state that has introduced the balanced scorecard system for public health care management is Afghanistan, which emerged from decades of conflict in 2002 with some of the world's worst health indicators. To deal with an extremely high burden of disease, the Ministry of Public Health (MOPH) designed a Basic Package of Health Services (BPHS), which is delivered by nongovernmental organizations and MOPH agencies. In 2004, the MOPH introduced the National Health Service Performance Assessment (NHSPA), an annual country-wide assessment of service provision and patient satisfaction and pioneered a balanced scorecard, which uses data collected in the NHSPA, to manage the delivery of primary health care services. In this study, the researchers examine the trends between 2004 and 2008 of the 29 key performance indicators in six domains included in this balanced scorecard, and consider the potential and limitations of the scorecard as a management tool to measure and improve health service delivery in Afghanistan and other similar countries.
What Did the Researchers Do and Find?
Each year of the study, a random sample of up to 25 facilities (district hospitals and comprehensive and basic health centers) was chosen in each of 28 of Afghanistan's 34 provinces (one province did not have functional facilities in 2004, and the other five missing provinces were inaccessible because of ongoing conflict). In each facility, NHSPA surveyors observed consultations involving five children under 5 years old and five patients over 5 years old; each year they collected approximately 5,000 patient observations, 5,000 exit interviews with patients or their caregivers, and 1,500 health provider interviews. The researchers then used this information to evaluate the key performance indicators in the balanced scorecard, and used a statistical method called generalized estimating equation modeling to assess trends in these indicators. They report a progressive improvement in national median scores in all six domains (patient and community satisfaction with services, provider satisfaction, capacity for service provision, quality of services, overall vision for pro-poor and pro-female health services, and financial systems) between 2004 and 2008.
What Do These Findings Mean?
These findings suggest that the balanced scorecard was successfully used to improve health system capacity and service delivery through performance benchmarking over the 5-year study period. Importantly, the use of the balanced scorecard helped to show the effects of investments, facilitate policy change, and create a more evidence-based decision-making culture in Afghanistan's primary health care system. However, the researchers warn that the continuing success of the balanced scorecard in Afghanistan will depend on its ability to accommodate changes in health systems policy. Furthermore, reconfigurations of the scorecard are needed to include measures of the overall effectiveness and efficiency of the health system such as mortality rates. More generally, the researchers conclude that the balanced scorecard offers a promising measure of health system performance that could be used to examine the effectiveness of health care strategies and innovations in other fragile and developing countries.