The Malawi Developmental Assessment Tool (MDAT): The Creation, Validation, and Reliability of a Tool to Assess Child Development in Rural African Settings

Melissa Gladstone and colleagues evaluate the reliability and validity of an assessment tool for evaluating child development in rural African settings.


Introduction
Worldwide, poverty, poor health and nutrition are responsible for more than 200 million children under 5 y of age failing to reach their developmental potential [1]. We know that such outcomes could be prevented if early intervention programmes were available for these children [2]. However, the implementation of these internationally funded programmes is critically dependent on tools to assess child development, and there is a dearth of such tools for use in non-Western settings. Programmes and studies using development as an outcome measure in resource-limited countries have tended to use Western assessment tools [3]. Many are simply translated [4] or adapted [5], with limited validation [6] before use. This approach may enable some comparison between groups, but it will not provide robust outcome measures because these tools contain many items alien to children of a non-Western culture [7]. More recently, some tools have been adapted and validated, and normal reference ranges or scores for ages to assess attainment have been developed. These tools have been created for children of a limited age range [8], have been based solely on urban children [9], or have excluded important domains of development such as language and social skills [10].
The aim of this study was to create a culturally appropriate developmental assessment tool, the Malawi Developmental Assessment Tool (MDAT), for use in rural Africa. In a preliminary study we evaluated the use of Western developmental items in a rural Malawian setting [11]. We discovered that a high proportion of gross motor (33/34; 97%), language (32/35; 91%), and fine motor (27/34; 79%) items were reliable and showed a good fit with logistic regression. The social items (18/35; 51%), however, performed less well and many were judged to be culturally inappropriate. This stimulated us to conduct a qualitative study addressing concepts and ideas of child development with ten focus groups of villagers and two focus groups of professionals in Malawi [12]. While all domains were discussed, gross motor and social milestones were the main domains of interest. Concepts and ideas from this study were then used to generate new items and modify items from the preliminary study. Examples of concepts used were "carrying items on head," "body healthy and flexible," "carrying out duties and chores," "sharing," and "taking up leadership roles." All items, once created or modified from the preliminary tool, were tested in a large community study and normal reference ranges were found for each item. Final items were subsequently selected at a consensus meeting. By these methods we have created the MDAT, a simple to use, reliable, valid, and easily accessible tool for use by community health workers and researchers looking at developmental outcomes of children in sub-Saharan Africa.
Figure 1. Stages in creation of final MDAT tool. Draft MDAT I created out of 110 items from the preliminary study with the addition of 52 items from the qualitative study, as well as the modification of some items. Draft MDAT II created after face and content validity with addition of 13 items and eight items removed as well as the modification of some items. Draft MDAT III created after piloting where nine gross motor, six fine motor, nine language, and four social items were added or modified, and one gross motor, five language, and three social items were removed. The Final MDAT tool consisted of 136 items with 34 in each domain having had eight gross motor, nine fine motor, 23 language, and nine social items removed. doi:10.1371/journal.pmed.1000273.g001

Creation of a Culturally Appropriate Developmental Assessment Tool (Pilot Phase)
As shown in Figure 1, at the start of this study, MDAT Draft I contained 162 items. This draft was created from items in the preliminary study as well as from the qualitative study [11,12]. We ensured consistency and clarity of items by translating and back translating the tool with the help of a language expert from the University of Malawi. Many items were then illustrated with a picture drawn by a Malawian artist (CZ) (Figure 2). We prepared a small basket of props to be used with the questionnaire (Figure S1). We then assessed face validity (where items were reviewed by untrained judges to see whether they thought the items looked acceptable) and content validity (the subjective measurement of the comprehensiveness to which an instrument appears logically to examine the characteristics or domains it is intended to measure) [13] through group discussions with six research midwives and ten Malawian medical students. In assessing face validity, individual discussions were also carried out with two of the investigators (EU, MN) and a language expert. These individuals commented on each item and whether the items were understandable and relevant to the Malawian population. At this phase of validation, some items were removed and some added, producing MDAT Draft II (Figure 1).
MDAT Draft II was then piloted on 80 children in two stages over a 6-wk period. Pilot assessments were observed by three investigators (MG, EU, and MN) and there were group discussions every 2 wk with the research midwives. The three investigators met three times during piloting; some items were added to improve clarity or precision, and other items were removed either because they were not felt to be discriminatory enough in assessing child development or because they were difficult to carry out in the field [14]. At this stage MDAT Draft III was produced; any new items added underwent face and content validation and were repiloted. An example of the gross motor domain is shown in Figure 2.
The study protocol complied with the principles of the Helsinki Declaration [15]. The research midwives explained the purpose of the developmental assessment to each child's parent or carer and obtained their informed consent to participation in the study. The study received ethical approval from the College of Medicine

Assessing the Performance of Items and Establishing Normal Reference Ranges in a Large Sample
To test the performance of MDAT Draft III, we recruited and assessed 1,513 children from four sites in the Southern region of Malawi. These were three rural and one semi-urban site (Namitambo, Mikolongwe, Nguludi, and Bangwe), which were all taking part in an antenatal trial with the same research midwife team [16]. Assessments occurred over a 1-y period from June 2006 until July 2007 using the team of six research midwives in local antenatal clinics in each of these areas. Normal healthy children of mothers attending clinic (one per family) between the ages of 0 and 6 y were included. Those with significant malnutrition (weight for height Z score < -2 using WHO criteria [17]), significant medical problems, prematurity of 32 wk or less (reported or measured on antenatal ultrasound), or significant neurodisability were excluded. In all cases, we ensured that they were receiving appropriate medical support. A decision was made to exclude these children from the "normal population" as the aim was to create a developmental assessment tool that identified children with developmental delay. We gathered sociodemographic characteristics using the same questions as the Malawi Demographic Health Survey (MDHS) [18]. We recruited children by asking one in every three mothers in clinic to bring one child to their next appointment. We used a quota sampling technique similar to that used by the Denver II [19] where target numbers of children for 34 age groups were sought (Table S1). Children's ages were determined from available birth data or the "health passport" that mothers in Malawi carry with them for all health appointments. Once we had recruited enough children of a particular age range, no more children of that age range were invited to participate. We then targeted ages where there were inadequate numbers by asking mothers to only bring children of those ages. We approached 1,657 families (Figure 3). Of these, 82 families refused and 62 children were ineligible due to serious medical problems as listed above, resulting in 1,513 children in the final assessment. Of these, 67 (4.4%) were then excluded prior to analysis (Figure 3), leaving 1,446 children in the final analysis. A subsample from this population was recruited for reliability testing.
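The recruitment numbers above can be checked with a few lines of arithmetic. This is only a sketch using the counts quoted in the text; the variable names are illustrative and not taken from the study.

```python
# Counts as reported in the text and Figure 3.
approached = 1657               # families approached
refused = 82                    # families who refused
ineligible = 62                 # children ineligible for medical reasons
excluded_before_analysis = 67   # assessed but excluded prior to analysis

assessed = approached - refused - ineligible
analysed = assessed - excluded_before_analysis
excluded_pct = round(100 * excluded_before_analysis / assessed, 1)

print(assessed, analysed, excluded_pct)
```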
The assessment using the new tool (MDAT Draft III with 185 items) took approximately 35 min in a quiet location, often outdoors. Five to seven children were assessed in a morning session by two to three research midwives at two of the four different sites each day. Where possible, items were directly observed, but items were accepted on report if the mother was very clear that the child could do the item and there was no doubt when assessing associated areas of development. We scored items as pass or fail, and if the child was uncooperative or unwell, items were scored as "don't know." Items were assessed until the child failed seven consecutive items [20,21]. The data for each item were then fitted using logistic regression and normal reference ranges were established (see statistical analysis section).
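The stopping rule described above (assess items until seven consecutive failures) could be sketched as follows. This is a minimal illustration, not the study's procedure as implemented in the field: the function name, the result labels, and the assumption that a "don't know" score resets the run of failures are all hypothetical, since the text does not say how "don't know" interacts with the rule.

```python
def administer(items, respond, stop_after=7):
    """Score items in order until `stop_after` consecutive failures.

    items   -- items of one domain, ordered from easiest to hardest
    respond -- callable returning "pass", "fail" or "dont_know" for an item
    """
    scores = {}
    consecutive_fails = 0
    for item in items:
        result = respond(item)
        scores[item] = result
        if result == "fail":
            consecutive_fails += 1
            if consecutive_fails == stop_after:
                break  # discontinue testing in this domain
        else:
            # Assumption: any non-fail result resets the run of failures.
            consecutive_fails = 0
    return scores


# Hypothetical child who passes the first five items and fails the rest.
demo_scores = administer(list(range(20)), lambda i: "pass" if i < 5 else "fail")
```

In this example, testing stops after the twelfth item (five passes followed by seven failures), so the remaining items are never administered.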

Reliability
Children were invited to participate in reliability testing as follows. The first child on the testing day was assessed for interobserver immediate reliability, the second child for interobserver delayed reliability, and the third child for intra-observer delayed reliability. We measured interobserver immediate reliability by assessing the same child independently on the same occasion by two observers (56 children). Interobserver delayed reliability was measured by observing the same child independently on the same day at different times by two observers (52 children). We measured intra-observer delayed reliability by the same observer assessing the same child 2 wk apart (124 children). Reliability testing was carried out on all 185 items in the Draft MDAT III.

Final Evaluation of Items by Consensus
An expert panel consisting of two Malawian paediatricians, two British paediatricians, and a statistician (MN, Mac Mallewa, MG, RLS, and GAL) reviewed the results and decided which items should remain, which should be further modified, and which removed as previously described [11]. Items were evaluated at these meetings in terms of their fit in a logistic regression, their reliability, subjective ratings, and the effect of gender in the logistic regression. We wanted (as much as possible) items with a good fit, good to excellent reliability (kappa > 0.6), few problems when rated subjectively, and no effect of gender. As there were some items where the age ranges for attainment were exactly the same, the consensus meeting used this forum to also choose only one of these items in any one domain. The selection procedure through consensus has been described elsewhere in more detail [11].

Validity
Once the final set of items was chosen, children were then scored in two ways. Firstly, a score was generated by a categorical pass or fail assessment, and each score was used to validate the tool in a series of tests. All items relevant to the age of testing were scored in a similar way to the Denver II screening test [19]. If the child failed two items or more in any one domain at the chronological age at which 90% of the normal reference population would be expected to pass, then they failed the test. Secondly, a continuous score was obtained by adding up the total number of items passed by the child per domain and in total. These scores varied with the age of the child. Both sets of scores were then used to validate the tool by comparing firstly with a group of children with neurodisability. We recruited 80 children up to 6 y of age with known neurodisabilities from the "Feed the Children" centre for children with disabilities (previously Cheshire Homes) in Blantyre [22]. Exclusions from this group were children unwell at time of examination, those with severe malnutrition (as previously defined), and any blind or deaf children. A second comparison group was 120 children up to 6 y of age with marasmus (height/weight <80% expected), as there is good evidence that these children often have moderate developmental delay [23,24]. Within this group, children with fevers or other illnesses (including HIV sero-positivity) were excluded. HIV testing was routinely performed in the malnutrition unit. Each of these groups was compared with a subset of age- and sex-matched children from the normal study population. This sample was chosen because of practicality issues and time constraints. To avoid bias, the comparison group was selected randomly (within those of the same sex and age to one decimal place) by a computer-generated random number list.
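The two scoring approaches described above can be illustrated with a short sketch. It assumes a lookup of the age at which 90% of the reference population pass each item; the item names, ages, and function name below are hypothetical and are not taken from the MDAT reference tables.

```python
from collections import Counter


def score_child(age_years, results, age_at_90):
    """Return the categorical and continuous scores for one child.

    results   -- {(domain, item): True if passed, False if failed}
    age_at_90 -- {(domain, item): age at which 90% of the reference
                  population pass the item} (hypothetical values here)
    """
    failures = Counter()  # failures on items the child should pass by now
    passed = Counter()    # continuous score: items passed per domain
    for (domain, item), ok in results.items():
        if ok:
            passed[domain] += 1
        elif age_at_90[(domain, item)] <= age_years:
            failures[domain] += 1
    return {
        # Fail the test if two or more such failures occur in any one domain.
        "fails_test": any(n >= 2 for n in failures.values()),
        "domain_scores": dict(passed),
        "total_score": sum(passed.values()),
    }


# Hypothetical items and 90% ages, for illustration only.
age_at_90 = {
    ("gross motor", "walks well"): 1.4,
    ("gross motor", "runs"): 1.9,
    ("gross motor", "jumps"): 2.8,
    ("language", "two words"): 1.8,
}
results = {
    ("gross motor", "walks well"): True,
    ("gross motor", "runs"): False,
    ("gross motor", "jumps"): False,
    ("language", "two words"): False,
}
at_two = score_child(2.0, results, age_at_90)
at_three = score_child(3.0, results, age_at_90)
```

With these made-up values, the same set of responses passes the categorical test at age 2.0 (one relevant failure per domain) but fails it at age 3.0 (two relevant gross motor failures), while the continuous score is unchanged.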

Data Entry and Statistical Analysis
All data were double entered by a data entry team with any discrepancies and outlying results reviewed. Data were analysed using Microsoft Access version 7.0, SPSS for Windows version 12, StatsDirect, STATA version 8, and Epi Info. We measured socioeconomic status in quintiles through principal components analysis of multiple assets following methods from the World Bank [25-27]. We determined height and weight for age (HAZ and WAZ) through Epi Info using US Centers for Disease Control reference data [28,29].
We constructed normal reference ranges for the children passing items using logistic regression analysis with decimal age as the explanatory variable. A logistic regression analysis is one where a prediction is made about the probability of an event taking place by fitting the data to a logistic curve. In this case, this would be the probability of carrying out a certain item of development, e.g., "walks well," at certain decimal ages. The fitted values from the model for each item were plotted against the observed data and graphs were drawn for each item. To determine whether or not the fitted curve was a sufficiently good representation of the data, it was assessed both visually for each graph and statistically. The goodness-of-fit statistic was calculated for each fitted curve and, for any item where the fit was significantly poor at the 5% significance level [30], refitting was done using triple split spline regression [31,32]. To do this, the ages corresponding to the 35th and 65th percentiles were calculated from the original fit to determine the cut points, and three logistic curves were then fitted, one for each region. This calculation is described in more detail in a previous paper [11]. Using the predicted probabilities found from the logistic regression analyses, the ages corresponding to 25%, 50%, 75%, and 90% of the children passing were determined for each item. These numbers were then used to plot the age norms of achievement of each milestone in a box-type representation in graphs similar to the procedure described for the Denver II (see Figures 4-7). In a further exploratory analysis, we added other explanatory variables (sex, socioeconomic status, and height for age [HAZ] and weight for age [WAZ] Z scores) to assess their effect on the probability of passing an item.
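As an illustration of how ages of attainment follow from a fitted logistic curve, the sketch below fits logit(p) = b0 + b1 × age by Newton-Raphson and inverts the curve to obtain the ages at which 25%, 50%, 75%, and 90% of children pass. It is not the authors' analysis code (the study used statistical packages), the data are synthetic, the function names are hypothetical, and the goodness-of-fit check and spline refitting are not shown.

```python
import math


def fit_logistic(ages, passed, iterations=25):
    """Maximum-likelihood fit of logit(p) = b0 + b1 * age (Newton-Raphson)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(ages, passed):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def age_at(prob, b0, b1):
    """Age at which the fitted probability of passing equals `prob`."""
    return (math.log(prob / (1.0 - prob)) - b0) / b1


# Synthetic data for one hypothetical item: 100 children at each age from
# 0.0 to 2.0 y, with pass rates following a logistic curve centred on 1.0 y.
ages, passed = [], []
for i in range(21):
    x = i / 10
    n_pass = round(100 / (1.0 + math.exp(-(-4.0 + 4.0 * x))))
    ages += [x] * 100
    passed += [1] * n_pass + [0] * (100 - n_pass)

b0, b1 = fit_logistic(ages, passed)
norms = {q: age_at(q, b0, b1) for q in (0.25, 0.50, 0.75, 0.90)}
```

The four values in `norms` correspond to the edges of the box-type representation: the box for an item would span the 25% to 90% ages, with the 50% and 75% ages marked inside it.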
Reliability was measured using kappa (κ) statistics as well as percentage agreement to assess observer agreement for each question. Values of 0 to <0.2 indicate poor agreement, >0.2 to 0.4 fair agreement, >0.4 to 0.6 moderate agreement, >0.6 to 0.8 good agreement, and >0.8 to 1 very good agreement [33].
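For a single item scored by two observers, percentage agreement and kappa could be computed as in the sketch below. It assumes Cohen's kappa for two raters, which the text does not state explicitly, and the ratings shown are invented for illustration.

```python
def percent_agreement(r1, r2):
    """Proportion of children on whom the two ratings agree."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa: agreement corrected for agreement expected by chance.

    Undefined (division by zero) when both raters give every child the
    same single rating, because chance agreement is then 1.
    """
    n = len(r1)
    observed = percent_agreement(r1, r2)
    categories = set(r1) | set(r2)
    expected = sum((r1.count(c) / n) * (r2.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)


# Invented ratings of one item for ten children (P = pass, F = fail).
observer_1 = list("PPPPFFFFPF")
observer_2 = list("PPPPFFFFFP")
agreement = percent_agreement(observer_1, observer_2)
kappa = cohen_kappa(observer_1, observer_2)
```

Here the observers agree on 8 of 10 children; because half of that agreement would be expected by chance, kappa is lower than the raw agreement, at about 0.6.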
To compare statistically the differences in the numbers of passes and fails achieved by the different groups in the construct validity assessment, a paired McNemar's test was used. We used paired t-tests to compare the numerical scores. Sensitivity and specificity were calculated for children with neurodisabilities in comparison to normal children, as by definition, children with neurodisabilities clearly should fail a test assessing normal development.
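McNemar's test uses only the discordant matched pairs, that is, pairs in which the case and the matched control have different pass/fail results. The sketch below implements the exact (binomial) two-sided version; which variant the authors used is not stated, so this is an assumption, and the counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b -- pairs where the case failed and the matched control passed
    c -- pairs where the case passed and the matched control failed
    Under the null hypothesis each discordant pair is equally likely
    to fall in either group, so min(b, c) ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from the study.
p_value = mcnemar_exact(10, 2)
```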

Characteristics of Population for MDAT
Demographic data (Table 1) demonstrate that the MDAT population was very similar in socioeconomic status to the national average, although the MDAT population had a higher number of mothers with some secondary education (23% versus 10%) and a lower number with no education (11% versus 25%). The MDAT population was nutritionally less stunted than the national average, with a lower proportion of children whose HAZ scores were 2 or 3 standard deviations (SDs) below the norm (< -2 SD) (38% compared to 48%); the same was true for WAZ scores (15% compared to 21%).

Face and Content Validity and Piloting
Initial validation of the Draft MDAT I demonstrated good content and face validity (Figure 1). At this stage, after comments from discussants, 13 items were added to the gross motor, language, and social domains as it was felt there were too few items for certain age ranges. Eight items were also removed in the fine motor and gross motor domains as they were not felt to be culturally appropriate or suitable for testing. The MDAT appeared to assess development in children in ways that were felt to be important. Discussants were happy that the questionnaire examined the various domains of development in a comprehensive and logical fashion and that it was representative and relevant to developmental milestones of children in a Malawian setting.
After face and content validation, the tool was piloted. At this stage, nine language items were added or modified from the previous version for clarity and consistency of items. Nine gross motor items of increasing difficulty were added as it was found that many of the older children were able to do all items in the gross motor section earlier than expected. This was also the case with four social items. Six fine motor items were also added at this stage; often these were items that could be tested differently at different ages and were therefore separated into subsections, and consequently into different questions, to decrease ambiguity on testing. For example, the item "puts pegs into board" was subdivided as "puts pegs into board in up to 30 secs" and "puts pegs into board in up to 2 minutes."

Performance of Items and Normal Population Reference Ranges
Information regarding the final items and how they performed in terms of logistic regression, as well as with the additional explanatory variables, is shown in Table 2. There were no items in the gross motor domain that had poor goodness of fit in the logistic regression analysis, whereas 50% of items in the social domain needed refitting using splines. A few items (eight) showed gender differences in the analysis but were kept in the tool after discussion at the consensus meeting. Five of these were in the social domain and were considered relevant and useful in the Malawian setting. These items are shown in Table S2. Socioeconomic status had a significant effect in the logistic regression analysis in up to 26% of items in some domains, and nutritional status had a similar effect in the analysis and attainment of milestones in all developmental domains (HAZ score in 47%-65% of items and WAZ in 38%-56% of items).
Figures 4-7 show the normal population reference ranges displayed as graphs of age ranges of attainment of milestones. There is one graph for each domain of development.

Reliability
Overall, reliability was excellent (κ > 0.75) for 99% (134/136) of items for interobserver immediate reliability (Table 3), for 89% (121/136) of items for interobserver delayed reliability, and for 71% (96/136) of items for intra-observer delayed 2-wk assessments. The remaining assessments had fair-to-very good reliability (κ > 0.4), with only two items having poor reliability (κ < 0.4) in the interobserver immediate category. In terms of the developmental domains, gross motor, fine motor, and social items had good kappa values for reliability, whereas in the language domain there were more moderate-to-good agreements. Delayed intra-observer reliability performed less well than the other forms of reliability in all the domains, with excellent agreement in only 47%-88% of items, depending upon the domain.
In the social domain, only 12/34 items of the final tool remained from the preliminary version in their original or modified form, and 22/34 new items were created, most of these (18/24) being newly created from the qualitative study described elsewhere [12].

Validity
The MDAT correctly identified almost all of the children with neurodisabilities, with 97% failing compared with 18% of normal age-matched controls. Sensitivity was therefore very high (97%), and specificity was 82%. When we compared the children's scores, those with neurodisabilities had average scores 63.9 points lower than age- and sex-matched controls, with highly significant differences in scores in all domains (Table 5).
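The relationship between these percentages and sensitivity and specificity can be made explicit with a short sketch. The counts below are hypothetical (scaled to 100 children per group purely to reproduce the reported percentages) and are not the study's actual counts; the function name is also illustrative.

```python
def sensitivity_specificity(true_pos, false_neg, true_neg, false_pos):
    """Sensitivity and specificity from a 2x2 table of test results.

    true_pos  -- children with neurodisability who fail the tool
    false_neg -- children with neurodisability who pass the tool
    true_neg  -- control children who pass the tool
    false_pos -- control children who fail the tool
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical counts per 100 children in each group.
sens, spec = sensitivity_specificity(true_pos=97, false_neg=3,
                                     true_neg=82, false_pos=18)
```

Specificity is one minus the proportion of controls who fail, which is why 18% of controls failing corresponds to a specificity of 82%.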
When comparing the children with marasmus to controls, 72% failed the MDAT compared with 6% of controls. Children with marasmus had overall average scores 14.9 points lower than controls (Table 5), with scores significantly different in all domains except social development. Differences in scores were 5.1 points in fine motor but only 1.8 points in social development.

Discussion
We have managed to develop a tool with normal reference values to assess childhood development up to the age of 6 y for a rural setting in Africa. We have demonstrated its sensitivity in the detection of neurodisability but also more subtle neurodevelopmental delay as seen in children with malnutrition. We have demonstrated good face and content validity of the tool. This instrument is therefore culturally appropriate for the rural sub-Saharan African setting of Malawi, and is likely to be applicable in other similar settings. The tool is easy to use, has good reliability, only requires a small basket of props, and takes approximately 30 min to administer. It also has clear pictorial representations of many of the items in the tool, making it understandable to all who use it. The MDAT could be used by local health workers with little training as well as by researchers needing a tool to use as an outcome measure when assessing development of children in these settings.
There is much evidence that the large scale problem of disability and developmental delay in resource-poor settings has a high total cost to societies and contributes to continuing cycles of poverty, preventing improvements in children's achievement in these settings [1]. The benefits of preventative measures and integrated programmes to improve child development have been shown; however, few robust developmental tools are available to assess the outcome of these programmes [2]. The MDAT has demonstrated good sensitivity in detecting children with neurodisabilities as well as the more subtle differences in development that would be expected between children with marasmus and normal age-matched controls [23]. To be able to use tools such as this to identify disability and developmental delay is an exciting prospect when there are few robust instruments for detection of disability, especially for those children under 2 y and where tools such as the "ten question disability screen" are inadequate [34].
We have been fortunate to have access to a large population of normal rural African children through antenatal clinics, allowing us the opportunity to create normal reference values for a typical Malawian child population. The MDAT population is very similar in economic status to the Malawian childhood population. The percentages of children with stunting and malnutrition in the MDAT population were a little lower than those seen in the MDHS population, partly due to the fact that we excluded any children who were severely malnourished (< -2 SD weight for height), but also because our population had more semi-urban children in it than the national average. We wanted a tool that reflected the normal population of Malawi; however, we also wanted to reflect a population that was clinically well. Although these conditions were difficult to achieve and the population used was not an "ideal" population (one in which health and development would be at its most ideal), it was a population that we felt reflected the normal population, but not including those with severe medical problems and in need of specific support.
Previous literature makes it clear that malnutrition will affect the achievement of developmental milestones [1,35]. We have found that height for age and weight for age did affect the normal reference values in approximately half of the items in the tool, demonstrating that many of the developmental items are sensitive to differences in nutritional status between children. Furthermore, as expected, socioeconomic status within the groups studied does seem to also play a role in attainment of some items, particularly in the social domain. In Malawi, 85% of children live in rural areas [18] and half of children are stunted; we would therefore argue that a developmental tool should be appropriate for use in this type of population. The normal reference ranges have therefore not been adjusted for height for age, weight for age, or socioeconomic status.
We have developed a robust methodology for creating developmental assessment tools that can be applied in any setting and that could therefore be used in many different cultures worldwide. This includes a systematic series of initial qualitative studies, piloting, and translation to create a more culturally accessible tool that can then be tested and analysed item by item to attain reference values through logistic regression as well as to determine reliability. Before validation, a final consensus meeting with an appropriate group of assessors can select items for the final tool. We have found in our construct validity studies that the MDAT is identifying 18% false positives. Our figures are, however, based on a case control method of sampling that may influence our results for sensitivity and specificity [36]. Although the tool is sensitive enough to pick up children with known neurodisabilities using the pass/fail scoring system that we have implemented, we still need to determine how well it can identify those with more subtle developmental delay. We have found that the MDAT can identify the developmental delay present in a subgroup of children with malnutrition. We identified 72% of children in this group with a delay in one or more areas of development and with average scores 14.9 points lower than the normal controls. This finding is consistent with evidence demonstrating that children with malnutrition have moderate developmental delay with overall DQ (developmental quotients) 20 to 30 points lower than normal children [23,24,35]. Despite these results, further research into scoring of the tool, as well as validation in groups of children with more subtle developmental delay, is necessary to provide further evidence of how the tool works.
The MDAT has broad applications both as a clinical tool in early identification of neurodevelopmental problems and as an outcome measure, for example in clinical trials of perinatal interventions. It is clear that settings such as Malawi have limited services to support this population and at present this tool may be more useful as an outcome measurement tool for research practice. However, by being able to identify children with neurodevelopmental delay, scarce government resources as well as international intervention programmes can be directed most effectively. Furthermore, without measures such as this, there will be no evidence as to whether interventions to improve outcomes in early childhood are effective in these settings.

Supporting Information
The World Health Organization has information on disability, prevention, and management in children and adults worldwide
UNICEF has a site on early childhood and in particular, provides information on programming experiences for early child intervention programs worldwide
Disability World is a website for international views and perspectives on disability worldwide. It provides information and links about the worldwide state of disability in children and adults in developing countries
Source, the International Information Support Centre has a good website of information about disability, inclusion, and development in children with links to many other sources of information
Wikipedia has a page on child development (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
The US Centers for Disease Control and Prevention provides information on developmental screening and on developmental milestones
The American Academy of Pediatrics also provides information on developmental stages and on developmental milestones
The UK National Health Service Choices site has an interactive guide to child development
MedlinePlus has links to further resources on infant and toddler development (in English and Spanish)

Figure 3. Flow diagram of the recruitment of families and children for the MDAT study. doi:10.1371/journal.pmed.1000273.g003

Figure 6. Normal reference values for language milestones. doi:10.1371/journal.pmed.1000273.g006

After consensus, from the draft tool of 185 items, we created a final version of the tool with 136 items, 34 in each domain of development (see Figures S2-S5 for this final questionnaire). Items removed at consensus and the reasons for this are outlined in Table 4. In the gross motor domain, most items in the final tool (27/34) were retained or modified from the preliminary tool.

Table 2. Number (%) of items in each domain of development that had poor goodness of fit and where gender, socioeconomic status, HAZ, or WAZ were significant effects in logistic regression. doi:10.1371/journal.pmed.1000273.t002

Table 3. Reliability by area of development for final items in MDAT.

Table 4. Reasons for removal of items in the consensus meeting within each domain of development.

Table 5. Comparison of scores for children with neurodisabilities or malnutrition and their age-matched controls using the MDAT.
a Passing in each domain = one or no failures in any one domain (gross motor, fine motor, language, social) for the age of the child. b Passing for all domains = no failures in any domain of development. doi:10.1371/journal.pmed.1000273.t005

GAL EK NRvdB RLS. Analyzed the data: MG GAL EU EK RLS. Collected data/did experiments for the study: EU MN EK. Enrolled patients: EU MN EK. Wrote the first draft of the paper: MG GAL RLS. Contributed to the writing of the paper: MG GAL EU NRvdB RLS. Helped in data collection and transcription and clean-up of the data: EK. Worked with MG throughout the study from the piloting up to the end: EK. Technical oversight and ensured integrity of study: NRvdB.

Babies can do very little when they are first born. But, gradually, over the first few years of life, they learn to walk and run (gross motor skills), they learn to manipulate objects with their hands (fine motor skills), they learn to communicate with words and gestures (language skills), and they learn how to interact with other people (social skills).
What Did the Researchers Do and Find? The researchers assessed the "face validity" (do the items look acceptable to untrained judges?) and "content validity" (does the tool examine all the domains it is meant to measure?) of MDAT Draft I and modified it to produce MDAT Draft II. After piloting this version on 80 children in rural Malawi, they modified it further to produce MDAT Draft III, which was used to assess 1,426 normal children aged 0-6 years from rural Malawi and to derive age-standardized norms for each item. After statistically analyzing the performance of each item in MDAT Draft III, all the items were considered at a consensus meeting, and items that were badly performing, unnecessary, and difficult to administer were removed, leaving 136 items (MDAT). The researchers then validated MDAT by using it to assess children with neurodisabilities (disorders of the nervous system that impair normal functioning) and children with delayed development because of malnutrition. The tool was reliable (different testers got similar results for individual children and individual testers got similar results when they retested specific children), sensitive (it correctly identified most children with a neurodisability or delayed development), and specific (it correctly identified most children who were developing normally; that is, it did not give false-positive results).
What Do These Findings Mean? These findings show that MDAT is a culturally relevant assessment tool that reliably identifies children with neurodisabilities and delayed development in rural Malawi. Importantly, they also provide a detailed illustration of how to create and validate a culturally relevant assessment tool. Although MDAT is likely to be applicable in other similar settings, further research is needed to test its generalizability and to test whether it will work in children with more subtle developmental problems. MDAT, the researchers note, should be useful as a clinical tool for the early identification of neurodisabilities and as an outcome measure in clinical trials of interventions designed to improve child development. However, they stress, because developing countries have limited resources available for screening and for helping children whose development is delayed or disrupted, for now tools like MDAT are more likely to be used for research studies than for routine developmental assessments in Malawi and other African countries.
Additional Information. Please access these Web sites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1000273.