What is meant by validity in maternal and newborn health measurement? A conceptual framework for understanding indicator validation


Background
Rigorous monitoring supports progress in achieving maternal and newborn mortality and morbidity reductions. Recent work to strengthen measurement for maternal and newborn health highlights the existence of a large number of indicators being used for this purpose. The definitions and data sources used to produce indicator estimates vary, and challenges exist with completeness, accuracy, transparency, and timeliness of data. This paper presents a conceptual overview of how indicator validity is defined and understood by those who develop and use maternal and newborn health indicators.

Methods
A conceptual framework of validity was developed using mixed methods. We were guided by principles for conceptual frameworks and by a review of the literature and key maternal and newborn health indicator guidance documents. We also conducted qualitative semi-structured interviews with 32 key informants chosen through purposive sampling.

Results
We categorised indicator validity into three main types: criterion, convergent, and construct. Criterion or diagnostic validity, comparing a measure with a gold standard, has predominantly been used to assess indicators of care coverage and content. Studies assessing convergent validity quantify the extent to which two or more indicator measurement approaches, none of which is a gold standard, relate to one another. Key informants considered construct validity, or the accuracy of the operationalisation of a concept or phenomenon, a critical part of the overall assessment of indicator validity.

Conclusion
Given concerns about the large number of maternal and newborn health indicators currently in use, a more consistent understanding of validity can help guide prioritization of key indicators and inform development of new indicators. All three types of validity are relevant for evaluating the performance of maternal and newborn health indicators. We highlight the need to establish a common language and understanding of indicator validity among the various global and local stakeholders working within maternal and newborn health.

We categorised indicator validity into three main types: criterion, convergent, and construct. Criterion [...] Table). These types broadly map onto the social [...]

Sensitivity: The percentage of individuals with the outcome/characteristic of interest who were correctly classified as such.

Specificity: The percentage of individuals without the outcome/characteristic of interest correctly classified as such.

Percent agreement or accuracy: The percentage of individuals who were correctly classified, i.e. for whom the outcome/characteristic of interest being measured is a match to the gold-standard comparison.

Positive predictive value: The probability that an individual who reported having an outcome/characteristic of interest truly had it.

Negative predictive value: The probability that an individual who did not report having an outcome/characteristic of interest truly did not have it.

Area under the receiver operating characteristic curve (AUC): The area under the plot of sensitivity versus 1 - specificity. A value of 1 means a perfect match, 0.5 a random guess. For binary measures, this is the average of sensitivity and specificity.

Other, less commonly used, measures include likelihood ratio [31] and efficiency [30].

Population-level validity

Inflation factor (IF) or Test to Actual Positive (TAP) ratio: Ratio of the population prevalence estimated by the measure being assessed to the true prevalence based on the gold standard [43]. This measure expresses the extent to which the true population prevalence of the indicator is under- or overestimated, given the sensitivity and specificity of the measure under consideration and the true population prevalence. It is possible for an indicator to show low individual-level accuracy but good population-level accuracy.
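As an illustration of the metrics defined above, the following minimal sketch computes them from a two-by-two comparison of a measure (for example, a woman's self-report) against a gold standard (for example, direct observation). The names `TwoByTwo`, `criterion_validity_metrics`, and `inflation_factor` are hypothetical, and the counts are invented for illustration; they are not taken from any validation study. The sketch assumes a binary measure and non-empty row and column totals. It relies on the standard identity that the measured prevalence equals sensitivity x P + (1 - specificity) x (1 - P), where P is the true prevalence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container: counts of a measure versus a gold standard."""
    tp: int  # measure positive, gold standard positive
    fp: int  # measure positive, gold standard negative
    fn: int  # measure negative, gold standard positive
    tn: int  # measure negative, gold standard negative


def criterion_validity_metrics(t: TwoByTwo) -> dict:
    """Individual- and population-level metrics (assumes no empty margins)."""
    n = t.tp + t.fp + t.fn + t.tn
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    measured_prevalence = (t.tp + t.fp) / n  # prevalence according to the measure
    true_prevalence = (t.tp + t.fn) / n      # prevalence according to the gold standard
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (t.tp + t.tn) / n,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # For a binary measure, AUC is the mean of sensitivity and specificity.
        "auc": (sensitivity + specificity) / 2,
        "inflation_factor": measured_prevalence / true_prevalence,
    }


def inflation_factor(sensitivity: float, specificity: float, prevalence: float) -> float:
    """IF from sensitivity, specificity, and the true prevalence."""
    measured = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return measured / prevalence


if __name__ == "__main__":
    # Invented counts: modest individual-level accuracy, unbiased prevalence.
    balanced = criterion_validity_metrics(TwoByTwo(tp=70, fp=30, fn=30, tn=70))
    # Invented counts: high individual-level accuracy, inflated prevalence.
    rare = criterion_validity_metrics(TwoByTwo(tp=90, fp=90, fn=10, tn=810))
    for label, m in (("balanced", balanced), ("rare outcome", rare)):
        print(f"{label}: accuracy={m['accuracy']:.2f}, IF={m['inflation_factor']:.2f}")
```

In the first invented example, only 70% of individuals are classified correctly, yet false positives and false negatives cancel out and the inflation factor is 1. In the second, 90% are classified correctly, but because the outcome is rare the false positives outweigh the false negatives and the measure overestimates prevalence by a factor of about 1.8.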

Some of the limitations of these predominantly facility-based criterion validity studies include limited [...] emphasized that the process of assessing whether an indicator is "valid" should start with an understanding of not only the construct or phenomenon an indicator intends to measure, but also for whom and why. This includes a consideration of whether the underlying phenomenon itself is meaningful, that is, whether its purpose is important to maternal and newborn health and clearly understood by all stakeholders (S1 Table). Yet, despite the importance attributed to construct validity

There is a growing concern with the large number of maternal and newborn health indicators used

We also acknowledge that while the key informants included measurement experts and authors of many of the recently conducted validation studies on maternal and newborn health indicators, our sample of key informants included only English-speaking respondents working predominantly at the global level and did not include many country-level experts and stakeholders. We did not aim to summarize the findings of all validation studies for individual indicators; however, such systematic reviews and meta-analyses could be a useful next step for summarising the available evidence.

Indicator validation is a part of a continuous process of building and synthesising evidence on indicator performance. We found that in the maternal and newborn health literature and among measurement experts, the term validity is used broadly to capture a variety of indicator performance assessments.

Some of the current challenges related to harmonization and coordination of maternal and newborn health indicators stem from a heterogeneity of definitions of indicator validity, often held by stakeholders from various disciplinary backgrounds. We recommend that the language used to describe validation research should be more precise as to the specific type(s) of validation assessed and the related findings (e.g. a description of an indicator as "valid" or "validated" should be nuanced and time- and context-specific).

In addition to the three most common types of maternal and newborn health indicator validity identified, we highlight the fact that any appraisal of an indicator's validity requires clarity about the construct that the indicator is intending to measure. We therefore recommend that future initiatives to coordinate