Measuring Coverage in MNCH: A Validation Study Linking Population Survey Derived Coverage to Maternal, Newborn, and Child Health Care Records in Rural China

Background Accurate data on coverage of key maternal, newborn, and child health (MNCH) interventions are crucial for monitoring progress toward the Millennium Development Goals 4 and 5. Coverage estimates are primarily obtained from routine population surveys through self-reporting, the validity of which is not well understood. We aimed to examine the validity of the coverage of selected MNCH interventions in Gongcheng County, China. Method and Findings We conducted a validation study by comparing women’s self-reported coverage of MNCH interventions relating to antenatal and postnatal care, mode of delivery, and child vaccinations in a community survey with their paper- and electronic-based health care records, treating the health care records as the reference standard. Of 936 women recruited, 914 (97.6%) completed the survey. Results show that self-reported coverage of these interventions had moderate to high sensitivity (0.57 [95% confidence interval (CI): 0.50–0.63] to 0.99 [95% CI: 0.98–1.00]) and low to high specificity (0 to 0.83 [95% CI: 0.80–0.86]). Despite varying overall validity, with the area under the receiver operating characteristic curve (AUC) ranging between 0.49 [95% CI: 0.39–0.57] and 0.90 [95% CI: 0.88–0.92], bias in the coverage estimates at the population level was small to moderate, with the test to actual positive (TAP) ratio ranging between 0.8 and 1.5 for 24 of the 28 indicators examined. Our ability to accurately estimate validity was affected by several caveats associated with the reference standard. Caution should be exercised when generalizing the results to other settings. Conclusions The overall validity of self-reported coverage was moderate across selected MNCH indicators. However, at the population level, self-reported coverage appears to have small to moderate degree of bias. Accuracy of the coverage was particularly high for indicators with high recorded coverage or low recorded coverage but high specificity. 
The study provides insights into the accuracy of self-reports based on a population survey in low- and middle-income countries. Similar studies applying an improved reference standard are warranted in the future.


Introduction
Accurate data on coverage of key maternal, newborn, and child health (MNCH) interventions are crucial for monitoring progress toward the Millennium Development Goals 4 and 5 and ending preventable child deaths in a generation [1,2]. Recognizing its significance, the Child Health Epidemiology Reference Group (CHERG) for WHO and UNICEF has made it a priority to improve coverage measurement for proven MNCH interventions. This paper is part of the PLOS Medicine "Measuring Coverage in MNCH" collection organized by CHERG for this purpose. Coverage estimates are generally obtained from routine population-based household surveys, such as the Demographic and Health Surveys (DHS) and the Multiple Indicator Cluster Surveys (MICS), primarily through self-reporting [3]. However, little is known about the validity of self-reported coverage derived from these population-based surveys.
Previous validation studies comparing health care records with respondents' self-reports were mostly conducted in facility-based settings in high-income countries and found varying and often moderate validity across the MNCH indicators studied. For example, an Australian study comparing medical records with women's reports of delivery interventions found that women's self-reports and medical records were both subject to errors [4]. In another recent population-based survey conducted in the UK, self-reported delivery mode was found to be highly reliable [5]. Two studies done in American hospitals showed generally unsatisfactory validity of self-reported medical interventions in the pregnancy, delivery, and postnatal periods [6,7], while one study among mothers of children with cancer in Canada and the US reported moderate to high validity for similar indicators [8]. According to one other US study, parents' reports of children's immunizations were of unsatisfactory validity, mainly due to poor initial encoding of the events [9].
Results from high-income countries may have limited generalizability when applied to low- and middle-income countries (LMICs) due to different levels of coverage, intensities of service provision counseling, and degrees of recall bias that may be associated with the education level of respondents [10,11]. However, similar studies are sparse in the LMIC setting; most of them have focused on obstetrical complications rather than routine interventions [12][13][14]. To our knowledge, the current study and other validation studies in this collection are the only ones that aim to evaluate the accuracy of coverage of MNCH interventions in LMICs [15][16][17][18].
Most of the validation studies reviewed are facility-based, and therefore subject to selection biases, because the study sample is often not representative of the general population. The fact that such a facility-based study design is widely adopted is perhaps because validation studies based on population surveys are more methodologically challenging. This is particularly true in LMICs, because health-care recordkeeping systems are rarely complete or of adequate quality to be used as the reference standard. In this study, we sought to examine the validity of self-reported coverage of selected MNCH interventions in a relatively less developed rural area in China, a setting selected to increase the extent to which the study results would be generalizable to other LMICs. In addition, we attempted to minimize selection bias associated with facility-based validation studies by collecting the study sample through a population-based survey.

Study Site
We conducted the validation study in Gongcheng County, which is located in Guangxi Province in southwestern China. The county contains nine townships and 125 villages [19], with a total population of 285,058 based on a 2010 census, of which 59% were of Yao and 39% were of Han ethnicity [20]. Among the population aged 15 and above, 1.2% were illiterate. The majority of the population were fruit farmers engaged in citrus and persimmon production. In 2006, the county GDP per capita was reported to be around $1,500 [19].
In 2006-2010, the under-five mortality rate in Gongcheng County decreased from 15.2 to 8.1 per 1,000 live births, and the infant mortality rate declined from 13.4 to 6.8 per 1,000 live births [21]. During the same period, the maternal mortality ratio was on average 31.8 per 100,000 live births [21]. Most MNCH services are provided in county- and township-level hospitals and village clinics [22]. Coverage of antenatal care and institutional delivery has been close to universal in the past five years [21]. Information on MNCH services is routinely recorded by service providers in a number of booklets, including, for example, the antenatal care booklet, the maternal and child health booklet, the child care booklet, and the vaccination booklet. The antenatal and child care booklets have been in use since 2003. These booklets are usually kept by women. In January 2007, an electronic MNCH information system was launched as part of the Guangxi Province MNCH information system. The system digitized key information collected from the booklets.

Data Collection
Women aged 18 to 49 years who lived in Gongcheng County during the fieldwork and had delivered at least one live birth in the county in the five years preceding the survey (i.e., between 01 November 2006 and 01 November 2011) were eligible to participate in the study. Participants were selected via multi-stage stratified sampling with a target of interviewing mothers of 1,000 live births. The target sample size was determined based on the following considerations. The study was originally designed to also evaluate whether the validity of women's self-reports was worse based on a five-year compared to a two-year recall period. The study sample size needed to be sufficient to distinguish a ten percentage point difference in validity (e.g., sensitivity) when comparing the two recall periods. A difference of ten percentage points is considered to be programmatically important. Since no prior information was available on the sensitivity of the measurement of any indicators studied, 50% sensitivity was assumed for the five-year recall period to yield the most conservative sample size. Based on a ten percentage point difference, 60% sensitivity was assumed for the two-year recall. Assuming constant fertility, the number of live births in the past two years is two-fifths of the number in the past five years. To ensure that the sample size was conservative, a continuity correction was applied to improve the normal approximation to the binomial distribution [23]. With a significance level of 0.05 and 80% power, the sample size calculation for a two-sample comparison of proportions using Stata 10 produced a total sample size of 714 live births in the past five years and 286 live births in the past two years [24]. We assumed that coverage for 20% of the study sample could not be validated due to the lack of the reference standard. Taking into account a 10% non-response rate, a total of 992, or approximately 1,000, live births were needed. Information on 900 live births was anticipated to be actually collected.
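The two-sample calculation described above can be sketched as follows. This is a minimal illustration, assuming the Fleiss continuity-corrected formula for unequal group sizes; the text only states that Stata 10 was used, so the exact formula is an assumption, and the function name two_sample_sizes is hypothetical. With the stated inputs (50% versus 60% sensitivity, a two-to-five allocation ratio, a significance level of 0.05, and 80% power), the sketch gives the same 714 and 286 as reported.

```python
import math
from statistics import NormalDist


def two_sample_sizes(p1, p2, ratio, alpha=0.05, power=0.80):
    """Continuity-corrected sample sizes for comparing two proportions.

    ratio is n2 / n1. Returns (n1, n2), each rounded up.
    Assumption: Fleiss-type formula; not confirmed as the authors' exact method.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_b = NormalDist().inv_cdf(power)
    diff = abs(p1 - p2)
    p_bar = (p1 + ratio * p2) / (1 + ratio)  # pooled proportion under H0

    # Uncorrected size of group 1 from the normal approximation.
    numerator = (
        z_a * math.sqrt((1 + ratio) * p_bar * (1 - p_bar))
        + z_b * math.sqrt(ratio * p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    n1_raw = numerator / (ratio * diff ** 2)

    # Continuity correction inflates the uncorrected size.
    n1 = n1_raw / 4 * (1 + math.sqrt(1 + 2 * (1 + ratio) / (n1_raw * ratio * diff))) ** 2
    n1 = math.ceil(n1)
    return n1, math.ceil(ratio * n1)


# Five-year recall: 50% sensitivity; two-year recall: 60% sensitivity.
n_five_year, n_two_year = two_sample_sizes(0.50, 0.60, ratio=2 / 5)
print(n_five_year, n_two_year)
```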
Study women were selected via a multi-stage stratified sampling design. In the first stage, the nine townships were divided into three strata based on the population size as a proxy of the level of economic development, and one township was selected in each stratum. In the second stage, villages were divided into four groups according to their geographic location (east, west, north, and south), and one village was sampled from each geographic group. In the third stage, participants were recruited for interview by the village doctors based on availability by going through the vaccination roster, which is considered to have enlisted all children under five years of age in the villages. Recruitment stopped once the desired sample size in each township was reached. During the recruitment process, women were also asked to bring their MNCH booklets to the community interview for abstraction.
The study period overlapped with the persimmon harvesting season, which made it difficult to recruit participants according to our original plan. However, the fieldwork happened to fall on child immunization days in two of the three sampled townships, during which young children were brought to the township hospitals for immunization and well-baby checkups. In the end, a fraction of the study sample was recruited on the immunization days. As a result, children younger than two years were over-represented in the study sample.
Community-based face-to-face interviews were conducted in village centers with reasonable privacy after obtaining written informed consent. The survey instrument was adapted from the DHS and MICS questionnaires to suit the local context [25]. It was used to solicit information on coverage of selected MNCH interventions, including those routinely collected in the DHS or MICS or of local relevance. Some additional modifications were made in the wording of a number of questions, including those on child vaccination, in an effort to reduce the length of the questionnaire. The exact wording of the questions used in the survey in comparison with those used in the DHS and MICS questionnaires is provided in Table S1. The questionnaires were designed in English, translated into Chinese, and verified through back-translation. One questionnaire was administered to each eligible woman to collect information on household characteristics and her socio-demographic background. A second questionnaire was administered for each eligible live birth to collect information on services received during the antenatal, delivery, newborn, postnatal, and child care periods.
We abstracted relevant records from available booklets after completion of the interview using a structured template. We also extracted relevant records from the electronic system for all women residing in Gongcheng who delivered locally between the initiation of the electronic system (01 January 2007) and 01 October 2011. Because study women may not have all booklets available for abstraction, the electronic system was not in operation for the first few months of the study reference period, and a small set of indicators were only available in the booklets, we combined records from booklet abstraction and the electronic systems. The study reference standard was created by giving preference to the electronic system and is referred to as Gongcheng MNCH Information System (GMNCHIS). Only indicators for which information was available from GMNCHIS were included in the validation. A complete list of validated indicators can be found in Table S1.

Data Analyses
We cleaned and matched data collected through the community survey and those abstracted from the GMNCHIS. A number of databases were exported from the electronic system, including pregnant women's general information, early antenatal care, other antenatal care, high-risk pregnancy, antenatal screening tests, delivery, postnatal, and child care databases. When processing the exported data, record of results of a test or examination was treated as the evidence of the receipt of the test or examination, and the lack of such a record was treated as evidence of not receiving the service. Records in different databases were matched using the maternal and child health identification unique for each pregnancy. Record of receipt of a service in any of the databases was considered evidence of the receipt of the service.
To identify potentially duplicated records for the same pregnancy in the electronic system, we first identified records for the same woman, defined as records with either the same national identification or the same name and village, and less than four years' difference in reported age. We then identified the records for the same pregnancy, defined as records of the same woman that had first antenatal visits fewer than 30 days apart, or last menstrual periods fewer than 30 days apart, or delivery records fewer than 150 days apart, so that a pregnancy loss before another pregnancy was not identified as the same pregnancy. Lastly, we collapsed all the duplicated records for the same pregnancy. A total of 15,189 unique women with 16,049 unique pregnancies for the whole county were identified in the electronic system.
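The duplicate-detection rules above can be sketched as two predicates. This is an illustrative sketch only: the record layout and field names (national_id, name, village, age, first_anc_visit, last_menstrual_period, delivery_date) are hypothetical, and we assume the age condition applies to both identification routes, which the text leaves open.

```python
from datetime import date


def same_woman(a, b):
    """Same national ID, or same name and village; ages under four years apart."""
    same_id = a.get("national_id") is not None and a.get("national_id") == b.get("national_id")
    same_name_village = a["name"] == b["name"] and a["village"] == b["village"]
    return (same_id or same_name_village) and abs(a["age"] - b["age"]) < 4


def days_apart(d1, d2):
    """Absolute gap in days, or None when either date is missing."""
    if d1 is None or d2 is None:
        return None
    return abs((d1 - d2).days)


def same_pregnancy(a, b):
    """Apply the 30/30/150-day rules to two records of the same woman."""
    if not same_woman(a, b):
        return False
    limits = (
        ("first_anc_visit", 30),
        ("last_menstrual_period", 30),
        ("delivery_date", 150),
    )
    for field, limit in limits:
        gap = days_apart(a.get(field), b.get(field))
        if gap is not None and gap < limit:
            return True
    return False


# Hypothetical records: same woman, deliveries 10 days apart versus 2 years apart.
rec1 = {"national_id": "ID-001", "name": "A", "village": "V1", "age": 27,
        "delivery_date": date(2009, 3, 1)}
rec2 = {"national_id": "ID-001", "name": "A", "village": "V1", "age": 28,
        "delivery_date": date(2009, 3, 11)}
rec3 = {"national_id": "ID-001", "name": "A", "village": "V1", "age": 29,
        "delivery_date": date(2011, 3, 1)}
print(same_pregnancy(rec1, rec2), same_pregnancy(rec1, rec3))
```

Records flagged as the same pregnancy would then be collapsed into one; how conflicting fields were reconciled is not described in the text and is left out of the sketch.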
To match women's recall in the community-based survey with those collected through the GMNCHIS, we first matched survey data with those exported from the electronic system using the following combinations of information: (1) maternal and child health identification and women's names, or (2) first 14 digits of the national identification which includes an area code, date of birth, women's names, and children's date of birth, or (3) women's names, village names, and children's date of birth. Then we used data abstracted from the booklets as the reference standard for those women and indicators which were not matched using data exported from the electronic system. Our unit of analysis was live birth.
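The three-step matching cascade can be sketched as an ordered list of composite keys tried in turn. The field names and the helper functions match_keys and link are hypothetical, and exact string equality is assumed; the text does not say whether any fuzzy matching of names was used.

```python
def match_keys(rec):
    """Build the three composite keys, in the priority order described above."""
    nid = rec.get("national_id")
    return [
        ("mch_id", rec.get("mch_id"), rec.get("name")),
        ("nid14", nid[:14] if nid else None, rec.get("name"), rec.get("child_dob")),
        ("village", rec.get("name"), rec.get("village"), rec.get("child_dob")),
    ]


def link(survey_rec, electronic_recs):
    """Return (matched record, rule label), or (None, None) if nothing matches."""
    for i, key in enumerate(match_keys(survey_rec)):
        if None in key:  # skip a rule when any of its components is missing
            continue
        for candidate in electronic_recs:
            if match_keys(candidate)[i] == key:
                return candidate, key[0]
    return None, None


# Hypothetical example: no MCH ID in the survey, so the second rule is used.
survey = {"mch_id": None, "national_id": "12345619850101AAAA", "name": "A",
          "village": "V1", "child_dob": "2010-05-01"}
electronic = [{"mch_id": "M-9", "national_id": "12345619850101BBBB", "name": "A",
               "village": "V2", "child_dob": "2010-05-01"}]
matched, rule = link(survey, electronic)
print(rule)
```

Live births left unmatched by this cascade would fall back to the booklet abstraction as the reference standard, as described above.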
We grouped the results into four categories for presentation, including antenatal care, delivery care, postnatal care, and child vaccination services. Antenatal care includes routine antenatal care, blood screening for sexually transmitted diseases, and blood screening for congenital abnormalities. During the routine antenatal care, the first ultrasound scan was done on the first antenatal visit, usually around 10-14 weeks. It provided information on gestational age and examined fetal conditions, including measuring nuchal translucency to screen for Down's syndrome. Normally, a few antenatal tests are done during each antenatal visit, including weight/height/blood pressure measurements, urine test, and fetal heart monitoring. For these repeated tests, we only measured whether they were received at least one time. Pregnant women are screened for thalassemia by a combination of mean cell volume count, erythrocyte osmotic fragility test, and hemoglobin electrophoresis [26,27].
We calculated sensitivity and specificity of the self-reported coverage. We graphed the overall validity in a receiver operating characteristic (ROC) plot with true positive rate, or sensitivity, plotted against false positive rate, or one minus specificity. We also quantified the overall validity by the area under the ROC curve (AUC) [28]. The uncertainty associated with validity, as represented by the 95% confidence interval (CI), was estimated assuming a binomial distribution.
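As an illustration of these measures, the sketch below computes sensitivity, specificity, and AUC from a two-by-two table. Two points are assumptions rather than the authors' stated method: the confidence interval uses the normal (Wald) approximation to the binomial, and the AUC of a single binary report is taken as the area under the ROC curve through one point, which equals the mean of sensitivity and specificity. The counts are hypothetical.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Proportion with a Wald 95% CI (an assumed interval type)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def validity(tp, fp, fn, tn):
    """Sensitivity, specificity, and single-point AUC from a 2x2 table.

    tp: reported and recorded; fp: reported, not recorded;
    fn: not reported, recorded; tn: neither reported nor recorded.
    """
    sensitivity = proportion_ci(tp, tp + fn)
    specificity = proportion_ci(tn, tn + fp)
    # ROC through (0,0), (1 - Sp, Se), (1,1): area = (Se + Sp) / 2.
    auc = (sensitivity[0] + specificity[0]) / 2
    return {"sensitivity": sensitivity, "specificity": specificity, "auc": auc}


# Hypothetical counts, not taken from the study.
result = validity(tp=80, fp=30, fn=20, tn=70)
print(result)
```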
Population-level accuracy of the coverage estimates was also examined, as measured by the test to actual positive (TAP) ratio, that is, the ratio of reported to recorded coverage [29]. It can be demonstrated mathematically that the TAP ratio is determined by the validity of self-reported coverage in combination with the actual (or recorded) coverage [29]. If the recorded coverage is high, the TAP ratio approximately equals sensitivity and is independent of specificity [29]. A combination of low recorded coverage and low to moderate specificity results in a high TAP ratio [29,30]. We also investigated and discussed the complex mathematical relationship observed between the TAP ratio, validity, and recorded coverage.
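The dependence of the TAP ratio on validity and coverage follows from writing reported coverage as true positives plus false positives. A minimal sketch, with a hypothetical function name and hypothetical inputs:

```python
def tap_ratio(sensitivity, specificity, coverage):
    """Reported coverage divided by actual (recorded) coverage.

    Reported coverage = Se * p + (1 - Sp) * (1 - p), where p is actual coverage,
    so TAP = Se + (1 - Sp) * (1 - p) / p.
    """
    reported = sensitivity * coverage + (1 - specificity) * (1 - coverage)
    return reported / coverage


# High coverage: the specificity term is weighted by (1 - p) / p, which is small,
# so the ratio stays near sensitivity even when specificity is poor.
high = tap_ratio(sensitivity=0.90, specificity=0.20, coverage=0.95)

# Low coverage with moderate specificity: false positives dominate.
low = tap_ratio(sensitivity=0.80, specificity=0.50, coverage=0.20)
print(high, low)
```

The second call shows why indicators with low recorded coverage are prone to overestimation: the false-positive term is multiplied by (1 - p) / p, which grows quickly as coverage falls.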
For the purpose of describing the study results, we categorized coverage, sensitivity, and specificity as low, moderate, and high, applying two cut-off points at 0.33 and 0.66. We also considered the overall validity high if the AUC was at or above 0.67 and moderate and low otherwise. We qualitatively defined the population-level bias based on the TAP ratio as small (0.8 < TAP ratio < 1.2), moderate (0.5 < TAP ratio < 1.5), and large (TAP ratio < 0.5 or TAP ratio > 1.5). We also conducted two sensitivity analyses to treat information solely abstracted from the booklets as the reference standard and to limit the study sample to women who gave birth in the past two years.
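These descriptive categories can be expressed as two small functions. The handling of values that fall exactly on a cut-off is an assumption, since the text does not specify it, and the function names are hypothetical.

```python
def level(value, low_cut=0.33, high_cut=0.66):
    """Classify coverage, sensitivity, or specificity as low, moderate, or high."""
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "moderate"  # assumption: cut-off values themselves count as moderate


def bias_level(tap):
    """Classify population-level bias from the TAP ratio."""
    if 0.8 < tap < 1.2:
        return "small"
    if 0.5 <= tap <= 1.5:  # assumption: boundaries at 0.5 and 1.5 count as moderate
        return "moderate"
    return "large"


print(level(0.57), level(0.72), bias_level(1.1), bias_level(1.4), bias_level(2.7))
```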

Ethical Review
The study protocol was approved by the Institutional Review Boards of the Johns Hopkins Bloomberg School of Public Health and Peking University.

Characteristics of the Study Sample
Nine hundred and thirty-six women were recruited between 10 and 22 November 2011. Among them, 914 agreed to participate in the survey and delivered 994 eligible live births. Interviews on 961 eligible live births were completed, among whom mothers of 431, 115, 343, and 793 live births brought the antenatal, maternal and child health, child care, and vaccination booklets to the community survey, respectively. Seven hundred and twelve live births were matched using electronic information. Another 196 live births were matched using information from at least one booklet, yielding a total of 908 matched live births. The remaining 53 live births did not have any matched indicators and could not be validated.
The socio-demographic characteristics of the surveyed live births by matching status are presented in Table 1. Overall, mothers of almost 60% of live births were aged between 25 and 34 years, and mothers of more than half had secondary or higher education. Similar to Gongcheng's general population, mothers of 59% of live births were of Yao ethnicity. The majority (84%) of the sampled live births were the only one born to their mother in the past five years, and 42% were under one year old. More than half of the live births were in households with an annual income per capita ranging between 1,000 and 5,000 Yuan, or 158 and 791 dollars.
Mother's age, education, and household annual income per capita were not significantly different between the matched and unmatched live births. However, the matched live births were more likely to have been born to mothers of Yao ethnicity, to be the only live birth in the last five years, and to be younger than 24 months.

Validity of the MNCH Indicators
Reported coverage derived from the community survey, recorded coverage derived from the GMNCHIS, and the TAP ratio are reported in Table 2. Sensitivity, specificity, AUC, and their corresponding 95% CI are presented in Table 3. The reported coverage of the routine antenatal care indicators was high (>81%). Their recorded coverage was also high, with the exception of the first antenatal visit before 12 weeks of gestational age. Self-reported coverage of routine antenatal interventions had sensitivity close to 0.90 and specificity below 0.25. Recorded coverage of the antenatal HIV and hepatitis B antibody (HBsAb) tests was similar, although their reported coverage differed greatly. Among the postnatal care indicators, coverage of occurrence of at least one postnatal visit was reported to be higher than the recorded value, with a moderate sensitivity (0.57 [95% CI: 0.50-0.63]) and a high specificity (0.72 [95% CI: 0.68-0.76]). The rest of the postnatal indicators had high reported and recorded coverage, with moderate to high sensitivity (0.66-0.93) and low to moderate specificity (0.21-0.35). Reported and recorded coverage of vaccination was consistently high, with the exception of measles vaccine. Self-reported coverage of vaccination also had high sensitivity (>0.86) and a wide range of specificity (0.02-0.70).
The AUC estimates reported in Table 3 and the ROC plot shown in Figure 1 demonstrate the overall validity of self-reported coverage by indicator. Self-reported coverage of cesarean section had the highest overall validity when compared to the reference standard, with the AUC being 0.90 [95% CI: 0.88-0.92]. Diphtheria-pertussis-tetanus (DPT) vaccine ranked second, with the AUC being 0.80 [95% CI: 0.75-0.84]. Self-reported coverage of thalassemia screening and measles vaccine also had high overall validity (AUC > 0.69). The remaining indicators had either moderate or low overall validity, with the AUC of a number of them not significantly different from 0.5, indicating validity equivalent to a random guess.
Despite varying overall validity, the TAP ratios ranged between 0.8 and 1.5 for self-reported coverage of 24 of the 28 indicators examined, suggesting a mostly small to moderate degree of bias at the population level (Figure 2). However, the bias was particularly large for four indicators: measles vaccine (TAP ratio = 2.0), first antenatal visit before 12 weeks of gestational age (TAP ratio = 2.7), screening for neural tube defect (TAP ratio = 2.7), and screening for Down's syndrome (TAP ratio = 3.2). Both sensitivity analyses, using the booklets as the reference standard and limiting the sample to women who gave birth in the last two years, gave quantitatively similar results (not shown).

Discussion
To our knowledge, this is the first study to validate self-reported coverage of a range of MNCH indicators by systematically comparing women's self-reports solicited from a population-based survey with MNCH care records in an LMIC. We found that across the indicators examined, self-reported coverage had moderate to high sensitivity and low to moderate specificity. The overall validity is high for the self-reported coverage of a few indicators, including cesarean section, diphtheria-pertussis-tetanus vaccine, measles vaccine, and screening for thalassemia, yet moderate to low for the remaining indicators. The finding of moderate levels of overall validity is not unexpected, as similar results have been reported in previous studies in high-income countries [6,9].
The variation in validity across indicators seems to suggest that the more distinctive the experiences women had while receiving certain interventions, the better was the validity of the self-reported coverage. The positive association between event distinctiveness and recall accuracy is supported by the psychology literature [31]. The variation could also be the result of the social desirability bias associated with self-reports. That is, when women perceived that it was socially desirable to receive a certain service, they were more likely to report receipt of the service regardless of whether they had actually received it or not. An example illustrating the potential social desirability bias can be drawn from the coverage and validity of the HIV and HBsAb tests. The two tests had similar levels of recorded coverage, yet widely different levels of reported coverage. We hypothesize that women may be less willing to report receipt of an HIV test than an HBsAb test, as the former is less socially desirable.
Despite varying validity, self-reported coverage of the majority of the examined indicators had only a small to moderate degree of population-level bias. At least two reasons can perhaps explain this. The first reason is based on the mathematical relationship between bias, validity, and the recorded coverage. It can be demonstrated that when recorded coverage is high, the TAP ratio approximately equals sensitivity and is independent of specificity [29]. Because a large proportion of the indicators had high recorded coverage and high sensitivity, their TAP ratio did not deviate greatly from 1. Of note, although high coverage may have limited our power to accurately estimate specificity, the accuracy of coverage is not much affected, as specificity is almost irrelevant to population-level bias when coverage is high.
The second reason is better recall and recognition of certain interventions due to better community knowledge associated with high coverage. We observed that the higher the recorded coverage is, the higher the sensitivity is, and the correlation is marginally significant (p = 0.06). This should not be the case under normal circumstances, as sensitivity and specificity are in theory intrinsic to the estimation of self-reported coverage, and are usually independent of the actual coverage [32]. As a result, high coverage is also likely associated with high sensitivity.
On the other hand, the self-reported coverage of a few indicators had large bias, including screening for Down's syndrome and neural tube defects, first antenatal visit before 12 weeks of gestational age, and measles vaccine. This is likely due to a similar mathematical relationship: a combination of low recorded coverage and low to moderate specificity results in a high TAP ratio [29,30]. In fact, for the current recorded coverage and sensitivity of screening for Down's syndrome, for example, specificity needs to be as high as 0.96, compared to the current 0.46, to yield a TAP ratio of 1 [29]. For a combination of low prevalence and low specificity, coverage derived from self-reports in population-based surveys always overestimates the actual coverage. The degree of overestimation increases as the actual coverage and specificity decrease.
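The specificity needed for an unbiased estimate follows from setting reported coverage equal to actual coverage. The sketch below solves for it, using hypothetical inputs rather than the study's recorded values, which are reported in Tables 2 and 3 and not reproduced here.

```python
def tap_ratio(sensitivity, specificity, coverage):
    """Reported coverage over actual coverage."""
    reported = sensitivity * coverage + (1 - specificity) * (1 - coverage)
    return reported / coverage


def specificity_for_unbiased(sensitivity, coverage):
    """Specificity at which the TAP ratio equals 1.

    Setting Se * p + (1 - Sp) * (1 - p) = p gives
    Sp = 1 - (1 - Se) * p / (1 - p).
    """
    return 1 - (1 - sensitivity) * coverage / (1 - coverage)


# Hypothetical low-coverage indicator.
needed = specificity_for_unbiased(sensitivity=0.80, coverage=0.20)
print(needed, tap_ratio(0.80, needed, 0.20))
```

At low coverage the required specificity sits close to 1, which is why even moderately specific self-reports overestimate rarely delivered services.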
In summary, despite moderate and varying validity, the population-level bias in coverage estimates was mostly small to moderate in this study, particularly for indicators with high recorded coverage or low recorded coverage but high specificity. Of note, although the bias may not be large at the population level, the degree of misclassification at the individual level could still be large due to unsatisfactory validity of some indicators.
Our study is subject to a number of limitations. First, our reference standard has some caveats. The fact that the self-reported coverage of a number of indicators had low or lower than expected specificity, including that of cesarean section, which would normally have close to 100% specificity, suggests that the quality of the GMNCHIS is perhaps not optimal. For a distinctive event like cesarean section, self-reporting might even be more reliable. MNCH services received outside the study county may not be recorded in the GMNCHIS, although one had to deliver in Gongcheng to be included in the study. Anecdotal evidence also suggests that the completeness of the electronic system has improved over time since its initiation in 2007. This was further supported by the finding that live births were significantly less likely to be matched using the electronic system in 2007-2009 compared to more recent years (p < 0.001). However, our sensitivity analysis shows that the validity is the same between women who gave birth in the past five years and those in the past two years. This suggests that although the completeness may be lower in 2007-2009, among women recorded in the electronic system, the validity of their self-reports in the past five years is similar to that in the past two years. If, however, self-reports of women who were not captured by the electronic system in its early stage had very different validity from those captured later, the limited completeness may introduce bias to our findings, although the direction is difficult to determine. Second, our study sample may not necessarily be representative of the whole county. The study participants were drawn from village vaccination rosters, which may have missed children born outside of the national family planning policy. However, this bias is likely to be small, as Gongcheng is a minority-concentrated area and such women could usually have up to two children.
In addition, within the primary sampling units, study participants were recruited based on availability by going through the vaccination rosters until the desired sample size was reached. This process may not be completely random, which could introduce selection bias, although we do not have reason to believe that the bias is systematic. The fact that our matched and unmatched study sample differed in mothers' ethnicity, number of live births in the past five years, and children's age may also introduce bias if these characteristics are associated with recall accuracy. However, the lack of representativeness does not affect the validation results, but may limit the study generalizability. Third, there are other factors that may limit our generalizability. The study was conducted in a setting where coverage of selected MNCH indicators was in general higher than that in countries where DHS and MICS surveys are normally conducted. In addition, we conducted this study in an area that is relatively more developed than those where most DHS and MICS surveys are usually conducted. If socio-economic development factors, such as education, are associated with validity as found previously [10,11], the study results could not be directly applied to other settings with different development levels. Despite these limitations, our study results could be subject to fewer selection biases and be more generalizable than other facility-based studies in this Collection, although they would have a higher-quality reference standard based on direct observation [15][16][17][18].
Factors associated with the design and implementation of the survey may also affect the external validity of the study. We interviewed women in a central location in the community rather than in their households, which may affect the validity of certain indicators. However, we speculate that this influence is likely to be small for the indicators studied, most of which are not sensitive at all in this context. Validity or reliability of questions included in the survey instruments could also affect the study's internal and external validity. For instance, we failed to include the age limit of the measles vaccine in the questionnaire, which is 8 months or older in China [33]. As a result, the coverage of measles vaccine had a high false positive rate and large bias. This is illustrated by the fact that children older than 8 months constituted only 60% of the matched live births, whereas the measles vaccine coverage rate was reported to be 70%, which is unlikely to be true.
In conclusion, more population-based validation studies are warranted with an improved reference standard and survey instruments. Future research should further examine the generalizability of observed validity to other LMIC settings. Nevertheless, the current study contributes to our understanding of the validity of self-reported coverage of a range of MNCH interventions. It provides insights into the population-level accuracy of self-reports based on a population survey in LMICs.

Supporting Information
Table S1 List of the 28 matched indicators and the corresponding questions used in the community questionnaire, in comparison with those used in the DHS and MICS. (DOCX)