Development of Standard Indicators to Assess Use of Electronic Health Record Systems Implemented in Low- and Medium-Income Countries

Background: Electronic Health Record Systems (EHRs) are being rolled out nationally in many low- and middle-income countries (LMICs), yet assessing actual system usage remains a challenge. We employed a nominal group technique (NGT) process to systematically develop high-quality indicators for evaluating actual usage of EHRs in LMICs. Methods: An initial set of 14 candidate indicators was developed by the study team by adapting the format of the HIV Monitoring, Evaluation, and Reporting (MER) indicators. A multidisciplinary team of 10 experts was convened in a two-day NGT workshop in Kenya to systematically evaluate, rate (using Specific, Measurable, Achievable, Relevant, and Time-Bound (SMART) criteria), prioritize, refine, and identify new indicators. NGT steps included introduction to candidate indicators, silent indicator ranking, round-robin indicator rating, and silent generation of new indicators. Results: Candidate indicators were rated highly on SMART criteria (4.05/5). NGT participants settled on 15 final indicators, categorized as system use (4), data quality (3), system interoperability (3), and reporting (5). Data entry statistics, system uptime, and EHRs variable concordance indicators were rated highest. Conclusions: This study describes a systematic approach to developing and validating quality indicators for determining EHRs use and provides LMICs with a multidimensional means of assessing EHRs implementations.

satisfaction, and system use" (26-33). Among numerous IS success frameworks and models, "system use" is considered an important measure in evaluating IS success, IS usage being "the utilization of information technology (IT) within users' processes either individually, or within groups or organizations" (28,30). There are several proposed measures for system use, such as frequency of use, extent of use, and number of system accesses, but these tend to differ between models. The system use measures are either self-reported (subjective) or computer-recorded (objective) (21,28,29).
There is compelling evidence that IS success models need to be carefully specified for a given context (33). EHRs implementations within LMICs have unique considerations; hence, system use measures need to be defined in a way that ensures they are relevant and meet EHRs monitoring needs while not being too burdensome to collect accurately. Carefully developed EHRs use indicators and metrics are needed to regularly monitor the status of EHRs implementations, in order to identify and rectify challenges and advance effective use. A common set of EHRs indicators and metrics would allow for standardized aggregation of the performance of implementations across locations and countries. This is similar to the systems currently in use for monitoring the success of HIV care and treatment through a standard set of HIV Monitoring, Evaluation, and Reporting (MER) indicators (34).
All care settings providing HIV care through the PEPFAR program, across all countries, are required to report the HIV indicators per the MER indicator definitions. An approach that develops EHRs indicators along the same lines and in the same format as the HIV MER indicators ensures that the developed EHRs system use indicators are in a format familiar to most care settings within LMICs. This approach reduces the learning curve for understanding and applying the developed indicators. In this paper, we present the development and validation of a detailed set of EHRs use indicators that follows the HIV MER format, using the nominal group technique (NGT) and group validation. Although developed for Kenya, the approach is applicable to other LMICs and similar contexts.

Identification of Candidate Set of EHRs Use Indicators
Using desk review, literature review, and discussions with subject matter experts, the study team (PN, MW, JK, XS, AB) identified an initial set of 14 candidate indicators for EHRs use (36-38). The candidate set of indicators was structured around four main thematic areas, namely: system use, data quality, interoperability, and reporting. The system use and data quality dimensions broadly reflect IS system use aspects contained in the DeLone and McLean IS success model, while the interoperability and reporting dimensions enhance system availability and use (38). The focus was to come up with practical indicators that were specific, measurable, achievable, relevant, and time-bound (SMART) (39). This would allow the developed indicators to be collected easily, reliably, accurately, and in a timely fashion within the resource constraints of the clinical settings where the information systems are implemented.
Each of the 14 candidate indicators was developed to clearly outline the description of the indicator, the data elements constituting the numerator and denominator, how the indicator data should be collected, and what data sources would be used for the indicator. These details were developed using a template adapted from the HIV MER 2.0 indicator reference guide, given that information systems users in most of these implementation settings were already familiar with this template (Appendix A) (34). For those unfamiliar with the format, only a short training time should be required, given its simplicity.
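The reference-sheet structure described above (description, numerator, denominator, collection method, data sources) can be sketched as a small data structure. This is an illustrative model only; the field names and the example values are assumptions, not the actual MER 2.0 template fields.

```python
from dataclasses import dataclass, field

@dataclass
class IndicatorSheet:
    """One EHRs-use indicator, loosely mirroring a MER-style reference sheet."""
    name: str
    description: str
    numerator: str                         # what is counted in the numerator
    denominator: str                       # what is counted in the denominator
    data_sources: list[str] = field(default_factory=list)
    collection_method: str = ""
    reporting_frequency: str = "monthly"   # assumed default, country-adaptable

# Hypothetical example of a filled-in sheet for a data entry indicator
data_entry = IndicatorSheet(
    name="Data Entry Statistics",
    description="Volume of clinical encounters entered into the EHRs",
    numerator="Encounters entered into the EHRs during the reporting period",
    denominator="Total encounters documented at the facility during the period",
    data_sources=["EHRs database", "facility registers"],
    collection_method="Automated query plus register review",
)
```

Holding every indicator in one consistent structure like this makes it straightforward to generate printable reference sheets and ranking forms from the same source.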

Nominal Group Technique (NGT)
NGT is a ranking method that enables a controlled group of nine or ten subject matter experts to generate and prioritize a large number of issues within a structure that gives the participants an equal voice (40). The NGT involves several steps, namely: 1) silent, written generation of responses to a specific question; 2) round-robin recording of ideas; 3) serial discussion for clarification; and 4) voting on item importance. It allows for equal participation of members and generates data that are quantitative, objective, and prioritized (41,42).
The NGT was used in this study to reach consensus on the final set of indicators for monitoring EHRs use.

NGT Participants
Indicator development requires consultation with a broad range of subject matter experts with knowledge of the development, implementation, and use of EHRs. With guidance from the Kenya Ministry of Health (MoH), a heterogeneous group of 10 experts was invited to a two-day workshop led by two of the researchers (M.W. and P.N.) and a qualitative researcher (V.N.). Inclusion in the NGT team was based on the ability of the participant to inform the conversation around EHRs usage metrics and indicators, with an emphasis on ensuring that multiple perspectives were represented in the deliberations. The NGT participants included: the researchers acting as facilitators; a qualitative researcher; MoH representatives from the Division of Health Informatics and M&E; a System Development Partners (SDPs) representative; healthcare facility clinical services representatives; a CDC funding agency representative; and representatives from the EHRs implementing partners (Palladium and the International Training and Education Center for Health (I-TECH)), who have been involved in the EHRs implementations and who selected sites for EHRs implementations (44,45). The study participants were consenting adults, and participation in the group discussion was voluntary. Discussions were conducted in English, with which all participants were conversant. For analysis and reporting purposes, demographic data and roles of participants were collected, but no personal identifiers were captured. The study was approved by the Institutional Review and Ethics Committee at Moi University, Eldoret (MU/MTRH-IREC approval number FAN:0003348).

Nominal Group Technique (NGT) Process
The NGT exercise was conducted on April 8-9, 2019, in Naivasha, Kenya. After providing informed consent, the NGT participants were informed about the purpose of the session through a central theme question: "How can we determine the actual use of EHRs implemented in our healthcare facilities?" Participants were given an overview of the NGT methodology and how it has been used in the past. Given that candidate indicators had already been defined in a separate process, we did not include the first stage of silent generation of ideas. Ten NGT participants (excluding research team members) evaluated the quality of the candidate indicators using the SMART criteria, rating each of the five quality components on a 5-point Likert scale. The NGT exercise was conducted using the following five specific steps:

Step 1: Clarification of indicators. For each of the 14 candidate indicators, the facilitator took five minutes to introduce and clarify details of the candidate indicator to ensure all participants understood what each indicator was meant to measure and how it would be generated. Where needed, participants asked questions and facilitators provided clarifications.
Step 2: Silent indicator rating. The participants were given 10 minutes per indicator and were asked to: (1) individually and anonymously rate each candidate indicator on each of the SMART dimensions using a 5-point Likert scale, where 1 = very low, 2 = low, 3 = neutral, 4 = high, and 5 = very high level of quality; (2) provide an overall rating of each indicator on a scale from 1 to 10, with 10 being the highest overall rating; (3) indicate whether the indicator should be included in the final list of indicators or removed from consideration; and (4) provide written comments on any aspect of the indicator and their rating process. To help with this process, a printed standardized indicator ranking form was provided (Appendix B), and the indicator details were projected on a screen.
Step 3: Round-robin recording of indicator ratings. Each participant in turn was asked to give their overall rating of each indicator, and these were recorded in a frequency table. No discussions, questions, or comments were allowed until all the participants had given their ratings. At the end of the round-robin, each participant in turn elucidated his/her criteria for the overall rating score. At this stage, open discussions, questions, and comments on the indicator were allowed. The discussions were recorded verbatim. The participants were not allowed to revise their individual rating scores after the discussion.
Step 4: Silent generation of new indicators. After steps 2 and 3 were repeated for all 14 candidate indicators, the participants were given ten minutes to think about and write down any missing indicators in line with the central theme question. The new indicator ideas were shared in a round-robin without repeating what had been shared by other participants. These newly proposed indicators were written on a flip chart and discussed to ensure all participants understood and approved any new indicator suggestions. The facilitator ensured that all participants were given an opportunity to contribute. From this exercise, new indicators were generated and their details defined collectively by the team.
Step 5: Ranking and sequencing the indicators. After Step 4, with the exclusion of some of the original candidate indicators and the addition of new ones based on team discussions, a final list of 15 indicators was generated. Each participant was asked to individually and anonymously rank the final list of 15 indicators in order of importance, with rank 1 being the most important and rank 15 the least important. The participants were also asked to group the 15 indicators by implementation priority and sequence into Phase 1 or Phase 2. Phase 1 indicators would be those deemed not to require much work to collect, while Phase 2 indicators would require more human input and resources to collect.

Selection Of Final Indicators
All the individual rankings for each indicator were summed across participants, and the final list of prioritized consensus-based EHRs use indicators was derived from the rank order based on the average scores. The ranked indicator list was shared for final discussion and approval by the full team of NGT participants. The relevant indicator reference sheets were also updated based on discussions from the NGT exercise. No fixed threshold was used to select indicators for inclusion. Finally, the indicator details (including indicator definitions, how data elements are collected, and how each indicator is calculated) were reviewed as guided by the NGT session discussions, resulting in the final consensus-based EHRs use reference sheets with details for each indicator.
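The sum-and-average ranking procedure described above can be sketched in a few lines. The indicator names and the individual ranks below are hypothetical; in the study there were 15 indicators and nine raters.

```python
# Each participant ranks every indicator (1 = most important).
# Ranks are summed and averaged; a lower average means higher priority.
rankings = {  # hypothetical ranks from three participants
    "Data Entry Statistics": [1, 2, 1],
    "System Uptime":         [2, 1, 3],
    "Reporting Rate":        [3, 3, 2],
}

averages = {ind: sum(r) / len(r) for ind, r in rankings.items()}
priority = sorted(averages, key=averages.get)  # ascending average score

print(priority)
# ['Data Entry Statistics', 'System Uptime', 'Reporting Rate']
```

Summing ranks in this way gives every participant equal weight, which matches the NGT goal of an equal voice for all panel members.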

Data Analysis
Descriptive statistics were computed to investigate differences in the ratings of the 14 candidate indicators among the participants. A Chi-square test was used to determine whether there were statistically significant differences in the rating of indicators across each of the SMART dimensions. The rating totals per SMART dimension from the crosstabs analysis output were summarized in a table (Table 1), indicating the p-value generated from the Chi-square output for each dimension. The totals include the rating count and its percentage. The weighted mean for each SMART dimension across all 14 indicators was calculated to identify how the participants rated the various candidate indicators. For the final indicator list, descriptive statistics were computed to determine the average rank score for each indicator and to assign priority numbers from the lowest average score to the highest. As such, the indicator with the lowest average score was considered the most important per the participants' consensus. All analyses were performed in SPSS version 25 (IBM, https://www.ibm.com/analytics/spss-statistics-software). The indicators were also grouped according to the implementation phase number assigned by the participants (either 1 or 2) to form the implementation order phases.
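The weighted mean of Likert ratings per SMART dimension can be computed directly from the rating counts. The counts below are illustrative only, not the study's actual Table 1 values.

```python
# counts[k] = number of participant ratings equal to Likert score k (1-5)
def weighted_mean(counts: dict[int, int]) -> float:
    """Mean Likert rating, weighting each score by its frequency."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total

# Hypothetical rating counts for one SMART dimension across all indicators
measurable = {1: 2, 2: 5, 3: 18, 4: 60, 5: 55}
print(round(weighted_mean(measurable), 2))  # 4.15
```

The same calculation, repeated per dimension, yields the dimension-level means reported alongside the Chi-square p-values (in the study, via SPSS crosstabs rather than code like this).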

SMART Criteria Rating for Candidate Indicators
The participants rated the collective set of 14 candidate indicators highly (i.e., 4 or 5) across all the SMART dimensions (Table 1). However, variation in the totals across the SMART components was due to some participants not rating some of the components.

Individual Indicator Ratings

Table 2 shows the participants' overall ratings for each of the 14 candidate indicators on a scale of 1 to 10, reflecting lowest to highest rating respectively. Generally, the participants rated the candidate set of indicators highly, with an overall mean rating of 6.6. Data concordance and automatic reports were rated highest, each with a mean above 8.0. However, the participants rated the observations indicator low, with a mean of 3.8, while the staff system use, system uptime, and report completeness indicators were moderately rated, with means of 4.4, 5.9, and 5.8 respectively. The individual indicator ratings and the ratings against SMART criteria served as a validation metric for the candidate indicators.

The NGT team reached consensus to include all 14 candidate indicators in the final list and added one additional indicator, report concordance, for a total of 15 EHRs usage indicators. The final set of indicators fell into four categories (Fig. 1 and Table 3):

1. System Use - these indicators identify how actively the EHRs is being used, based on the amount of data, the number of staff using the system, and system uptime.

2. Data Quality - these indicators highlight the proportion and timeliness of relevant clinical data entered into the EHRs. They also capture how well EHRs data reflect an accurate clinical picture of the patient.

3. Interoperability - given that a major perceived role of EHRs is to improve sharing of health data, these indicators measure the maturity level of implemented EHRs in supporting interoperability.

4. Reporting - aggregation and submission of reports is a major goal of the implemented EHRs, and these indicators capture how well the EHRs are actively used to support the various reporting needs.
As part of the NGT exercise, the details of each indicator were also refined. Appendix C provides the detailed EHRs MER document, with the agreed details for each indicator. In this document, we also highlight the changes that were suggested for each indicator as part of the NGT discussions.

Indicator Ranking
The score-and-rank procedure generated a prioritized consensus-based list of EHRs use indicators ranked from 1 (highest rated) to 15 (lowest rated). As such, a low average score meant that the particular indicator was on average rated higher by the NGT participants. Table 4 shows the ordered ranking list for the indicators as rated by nine of the NGT participants; one participant was absent during this NGT activity. The Data Entry Statistics and System Uptime indicators were considered the most relevant in determining EHRs usage, while the Reporting Rate indicator was rated least relevant.

Indicator Implementation Sequence

Nine of the 15 indicators were recommended for implementation in the first phase of the indicator tool rollout, while the other six indicators were recommended for Phase 2 rollout (Table 5). The implementation sequence largely aligns with the indicator priority ranking by the participants (Table 4). The indicators proposed for Phase 1 implementation are a blend from the four indicator categories but are dominated by the System Use category.

Discussion
To the best of our knowledge, this is the first set of systematically developed indicators to evaluate the actual status of EHRs usage once an implementation is in place within LMIC settings. At the completion of the modified NGT process, we identified 15 potential indicators for monitoring and evaluating the status of actual EHRs use. These indicators take into consideration constraints within LMIC settings, such as system availability, human resource constraints, and infrastructure needs. Ideally, an IS implementation is considered successful if the system is available to the users whenever and wherever it is needed (45). Clear measures of system availability, use, data quality, and reporting capabilities will ensure that decision makers have clear and early visibility into the successes and challenges facing system use. Further, the developed indicators allow for aggregation of usage indicators to evaluate the performance of systems by system type, region, facility level, and implementing partner.
An important consideration for these indicators is the source of measure data. Most published studies on evaluating the success of information systems focus on IS use indicators or variables such as ease of use, frequency of use, extent of use, and ease of learning, mostly evaluated by means of self-reporting tools (questionnaires and interviews) (18,38,46). As such, the resulting data can be subjective and prone to bias. We tailored our indicators to ensure that most can be computer-generated through queries, hence incorporating objectivity into the measurement. However, a few of these indicators, such as data entry statistics and those on concordance (variable concordance and report concordance), derive measure data from facility records in addition to computer log data.
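A computer-generated measure of this kind amounts to a query over the EHRs database. The sketch below illustrates the idea with an in-memory SQLite table; the `encounter` table and its columns are assumptions for illustration, since real EHRs schemas differ and queries must be customized per system.

```python
import sqlite3

# Assumed minimal schema: one row per encounter entered into the EHRs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounter (id INTEGER, entered_on TEXT)")
conn.executemany("INSERT INTO encounter VALUES (?, ?)", [
    (1, "2019-04-01"), (2, "2019-04-15"), (3, "2019-05-02"),
])

# Data entry statistics: count of encounters entered in the reporting period.
(entered,) = conn.execute(
    "SELECT COUNT(*) FROM encounter WHERE entered_on BETWEEN ? AND ?",
    ("2019-04-01", "2019-04-30"),
).fetchone()
print(entered)  # 2
```

Because the measure comes from the system's own records rather than self-report, it avoids the recall and social-desirability biases of questionnaire-based use measures.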
Although the NGT expert panel was national, we are convinced the emerging results are of global interest. First, we developed the indicators in line with the internationally recognized PEPFAR Monitoring, Evaluation, and Reporting (MER) indicators reference guide (34). Second, the development process was based mainly on methodological criteria that are valid everywhere (47,48). Furthermore, the indicators are not system-specific and hence can be used to evaluate usage of other types of EHRs, including other clinical information system implementations such as laboratory, radiology, and pharmacy systems. However, we recognize that differences exist in systems' database structures; hence, the queries to determine indicator measure data from within each system will need to be customized and system-specific. It is also important to point out that these indicators are not based on real-time measures and can be applied both for point-of-care and non-point-of-care systems.
The selected set of indicators has a high potential to determine the status of EHRs implementations, considering that the study participants rated all five SMART dimensions highly (over 70%) across all the indicators. Further, the indicators reference guide provides details on how to collect the data and the sources of measure data for each indicator (Appendix C). This diminishes the level of ambiguity regarding the measurability of the indicators. Nonetheless, some of the indicators require countries to define their own thresholds and reporting frequencies. For instance, a country would need to define the acceptable duration within which a clinical encounter should be entered into the EHRs for that encounter to be considered as having been entered in a timely fashion. As such, the indicators and reference guide need to be adapted to the specific country and use context.
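A country-defined timeliness threshold of the kind described above plugs directly into the indicator computation. The 7-day threshold and the encounter records below are assumptions for illustration; the actual threshold is a policy decision for each country.

```python
from datetime import date

# Country-defined threshold (assumption): an encounter is "timely" if it is
# entered into the EHRs within 7 days of the encounter date.
THRESHOLD_DAYS = 7

encounters = [  # (encounter_date, date_entered) -- hypothetical records
    (date(2019, 4, 1), date(2019, 4, 3)),
    (date(2019, 4, 2), date(2019, 4, 20)),
    (date(2019, 4, 5), date(2019, 4, 10)),
]

timely = sum((entered - seen).days <= THRESHOLD_DAYS
             for seen, entered in encounters)
timeliness = timely / len(encounters)
print(f"{timeliness:.0%}")  # 67%
```

Changing `THRESHOLD_DAYS` is the only adaptation needed when a country adopts a different definition of timely entry, which keeps the indicator comparable across settings once thresholds are published.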
This study has several limitations. It was based on a multidisciplinary panel of 10 experts, which is adequate for most NGT exercises but still represents a limited number of individuals who might not reflect all perspectives; on average, 5-15 participants per group are recommended, depending on the nature of the study (49,50). The low ranking of the Data Exchange and Standardized Terminologies indicators suggests that the participants might have limited knowledge or appreciation of certain domains and their role in enhancing system use. Further, all participants were drawn from one country. Nevertheless, a notable strength was the incorporation of participants from more than one EHRs (the KenyaEMR and IQCare systems) and a diverse set of expertise.
A next step in our research is to conduct an evaluation of actual system use status for an information system rolled out nationally, using the developed set of indicators. We will also evaluate the real-world challenges of implementing the indicators and refine them based on the findings. We also anticipate sharing these indicators with a global audience for input, validation, and evaluation. We are cognizant that the indicators and reference guides are living documents bound to evolve over time, given the changing nature of the IS field and the maturity of EHRs implementations.

Availability of Data and Materials
The data analyzed in this study are in the custody of the researchers and are available on request.

Competing Interests
All authors report no competing interests to declare.