Use of artificial intelligence on Electroencephalogram (EEG) waveforms to predict failure in early school grades in children from a rural cohort in Pakistan

Universal primary education is critical for individual academic growth and for the overall adult productivity of nations. Estimates indicate that 25% of the 59 million primary-age out-of-school children drop out, and early grade failure is one of the factors. An objective and feasible screening measure to identify at-risk children in the early grades can help in designing appropriate interventions. The objective of this study was to use a machine learning algorithm to evaluate the power of Electroencephalogram (EEG) data collected at age 4 in predicting academic achievement at age 8 among rural children in Pakistan. Demographic and EEG data from 96 children of a cohort, along with their academic achievement in grades 1–2, measured using an academic achievement test of Math and language at the age of 7–8 years, were used to develop the machine learning algorithm. A K-Nearest Neighbor (KNN) classifier was used on different model combinations of EEG, sociodemographic and home environment variables. The KNN model was evaluated using 5 stratified folds, based on sensitivity and specificity. In the current dataset, 55% and 74% of children failed the mathematics and language tests respectively. The mean sensitivity and specificity were calculated on the testing data across the folds. Sensitivity was similar when EEG variables were combined with sociodemographic and home environment variables (Math = 58.7%, Language = 66.3%), but specificity improved (Math = 43.4% to 50.6% and Language = 32% to 60%). The model requires further validation before EEG can be used as a screening measure with adequate sensitivity and specificity to identify children of preschool age who may be at high risk of failure in early grades.


Introduction
Universal primary education is regarded as the key to the successful development and prosperity of future generations. The United Nations Educational, Scientific and Cultural Organization (UNESCO) states that literacy skills are fundamental to informed decision-making, personal However, a major hurdle in the use of EEG in developing countries, which also carry the greater burden of out-of-school children, is that it is resource intensive and requires highly trained personnel for interpretation, thereby limiting its use for mass screening. Automating the interpretation of these waves using artificial intelligence can help overcome the lack of trained individuals and increase the availability of EEG-based screening in such contexts.
Machine learning (ML), a subset of artificial intelligence (AI), is an effective tool for the efficient analysis of large data sets, especially those with recognizable patterns [15]. It has also been documented to be advantageous for the analysis of EEG signals because of its lower probability of bias and high sensitivity in pattern recognition [16]. Several studies have applied ML models to students' existing performance scores to predict school performance with an accuracy of 70-80% and dropout with an accuracy of 63-83% [17,18].
Combining the reduced probability of human bias when recording EEG signals with the ability of an ML algorithm to classify data provides a new avenue for identifying children at risk of failure in early grades. The objective of this study was to determine the potential of an ML algorithm to evaluate the power of EEG data collected at age 4 in predicting academic achievement (Math and Language) at age 8 among rural children in Pakistan.

Study setting
The study sample included 219 children whose EEG data were collected in an earlier study examining the links between EEG gamma power and cognition at age 4 years [13]. These 219 children were a sub-sample of a large trial birth cohort (n = 1489) exposed to early interventions between birth and 2 years of age [19]. The sub-sample had been randomly selected from the full cohort, stratified by the respective intervention groups, at a prospective follow-up at age 4 years [13]. In the previous study, EEG at 4 years was associated with executive functioning at the same age; in the current study we sought to examine whether a similar association continued with academic achievement at age 8 years. An analysis of the included sample indicated no significant differences in academic achievement scores when compared to the full sample at 4 years. The included sample had slightly higher IQ (t = 6.1, p = 0.013) and was younger (t = 6, p = 0.014) compared to the rest of the sample. EEG was recorded using a 64-channel high-density Geodesic sensor net (Electrical Geodesics, Inc.; Eugene, OR) and a Net Amps 300 high input amplifier. Continuous EEG was recorded in four blocks of one minute each, for a total of four minutes. A central fixation cross was presented on a gray background, and a brief silent video with bubbles popping up was played between blocks to keep the child engaged [13].

Data collection
Detailed procedures for the data collected at 4 years (sociodemographic data, intelligence test scores and EEG) have been described in an earlier paper [13]. The demographic measures included questions on household income, number of family members, parental education, occupation, household assets and food insecurity. Intelligence test scores were assessed using an adapted version of the Wechsler Preschool and Primary Scales of Intelligence (WPPSI) III, while home environment was measured using the Home Observation for Measurement of Environment (HOME) Inventory. EEG data were collected by trained research associates. Trained community-based teams obtained data at 8 years of age. Academic achievement was measured at 8 years using an academic achievement test developed according to the USAID framework for measurement of Mathematics [20] and Language [21]. The test items were aligned with the provincial curriculum for grade 2 and had sections on English, Mathematics, and the local language (Sindhi). For the current analysis, we included only Mathematics and the local language. English was excluded as it is not the mother tongue of the participants and additional factors may contribute to failure on that section. Children attaining scores of at least 40% were considered to have passed the test. Ethics approval was obtained from the Ethics Review Committee of the Aga Khan University in Pakistan. The primary caregivers provided written consent (or a thumb impression) for the assessments at 8 years and could withdraw from the study at any time. The data received for analysis were fully anonymized.

EEG feature extraction
The 64-channel EEG data were used as the input source and exported in European Data Format (EDF). Data were sampled at 500 Hz, with one second per segment. Out of the 64 channels, 6 channels, i.e. frontal (6, 12, 60) and parietal (28, 34, 42), were selected based on the literature available on EEG waveforms in young children [13] and used for preprocessing. The EEG signal was four minutes long and consisted of four baseline events, each 60 seconds in length. The start time of each event was identified. The first baseline event was selected for further analysis as it was present in the majority of the recorded EEG data files. For the selected baseline event, the 60-second epoch was analyzed, and any recording with bad channel data within the selected channels and epoch was discarded. Afterward, the data were segmented into an array and the power spectral density (PSD) was calculated using the Welch method [22]. Each segment was 1 second long, and a Hann window was used with an appropriate overlap between segments. After calculating the PSD, the mean PSD within the gamma frequency band (21-45 Hz) was extracted for all 6 channels, and its log-transformed values were used as features in the machine learning algorithm. Feature extraction and preprocessing of EEG were done using the MNE library [23] available in Python [24]. Python's SciPy library [25] was used to calculate the PSD using the Welch method.
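The gamma-power feature described above can be sketched as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the natural logarithm, the segment overlap (left at SciPy's default of half a segment), and the function name `log_gamma_power` are assumptions, since the text does not specify them.

```python
import numpy as np
from scipy.signal import welch

FS = 500                    # sampling rate in Hz, as stated in the text
GAMMA_BAND = (21.0, 45.0)   # gamma band in Hz, as stated in the text


def log_gamma_power(eeg, fs=FS, band=GAMMA_BAND):
    """Return the log of the mean PSD inside `band` for each channel.

    eeg: array of shape (n_channels, n_samples).
    """
    # Welch estimate with 1-second Hann-windowed segments.
    # The overlap is SciPy's default (assumption, not a reported setting).
    freqs, psd = welch(eeg, fs=fs, window="hann", nperseg=fs, axis=-1)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Natural log is an assumption; the text only says "log transformed".
    return np.log(psd[:, in_band].mean(axis=1))


# Synthetic stand-in for one 60-second baseline event on 6 channels:
# white noise plus a 30 Hz oscillation (made-up data, not study data).
rng = np.random.default_rng(0)
t = np.arange(60 * FS) / FS
eeg = rng.normal(size=(6, t.size)) + 2.0 * np.sin(2 * np.pi * 30.0 * t)

features = log_gamma_power(eeg)
print(features.shape)  # one feature per channel
```

In practice the array passed to `log_gamma_power` would hold the six selected channels of the first baseline event after bad-channel screening.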

Proposed machine learning methodology
Classification is the method of identifying patterns or learning concepts from a dataset and predicting the label (class) of each sample. To predict performance, classification was conducted separately for the mathematics and language outcomes, based on the following model combinations: i) EEG features; ii) IQ feature; iii) sociodemographic (socioeconomic status, household food security, parent education and parent occupation), home environment and IQ features combined; and iv) sociodemographic, home environment and EEG features. We sought to examine the predictive power with and without sociodemographic and home environment variables, and to compare it with that of traditional IQ testing scores.
We used K Nearest Neighbors (KNN) for classification. KNN is a non-parametric technique used to classify new instances based on the similarity (distance metric) with its neighbors [26,27]. The KNN classifier was trained using the default parameters as defined in the scikit-learn implementation of the algorithm [28]. The default settings are n_neighbors = 5, weights = 'uniform', leaf_size = 30, p = 2, metric = 'minkowski'. The parameter details can be reviewed from the official documentation [28].
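To make the listed defaults concrete, the following from-scratch sketch shows what they compute: the 5 nearest training points under the Minkowski distance with p = 2 (Euclidean) each cast one equal vote. It is an illustration on made-up values, not the scikit-learn implementation; `knn_predict` is a hypothetical helper, and its tie-breaking may differ from the library's. The `leaf_size` setting only affects the speed of tree-based neighbour search, so it has no counterpart here.

```python
from collections import Counter


def knn_predict(train_X, train_y, query, k=5, p=2):
    """Classify `query` by majority vote among its k nearest training points."""

    def dist(a, b):
        # Minkowski distance; p=2 gives the Euclidean distance.
        return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: dist(train_X[i], query))[:k]
    # Uniform weights: every neighbour contributes exactly one vote.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy feature vectors (made up): label 1 = fail, label 0 = pass.
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
           [1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [1.1, 1.2]]
train_y = [0, 0, 0, 1, 1, 1, 1]

print(knn_predict(train_X, train_y, [1.0, 1.0]))  # -> 1
print(knn_predict(train_X, train_y, [0.1, 0.1]))  # -> 0
```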

Validation methods
Supervised machine learning techniques require a considerable amount of data to learn meaningful relationships and validate the results. When the dataset is small and imbalanced, advanced techniques such as resampling are required to model the available features and extract useful insights. A simple train-test split in such cases may also not provide an accurate understanding of model performance. To overcome this problem, we used 5 stratified folds [29] to validate the performance of our machine learning classifier across different distributions of the data. For each fold, the Synthetic Minority Oversampling Technique (SMOTE) [30] was applied to the training data to balance the classes. SMOTE is used to avoid overfitting of the ML model on skewed classes. The model trained on each fold was then tested on the remaining unseen data for that fold. The performance metrics used were sensitivity and specificity. For both metrics, the average over the testing data of the folds was reported. Fig 1 describes the process of our implemented methodology.
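The validation scheme can be sketched in plain Python as below. This is a simplified illustration on made-up data, not the study's code: `stratified_folds` and `smote_like` are hypothetical helpers that mimic stratified splitting and SMOTE-style interpolation, and the class counts (71 fail, 25 pass) only approximate the reported language split. The key point it shows is that oversampling is applied to the training part of each fold only, so no synthetic sample reaches the test data.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Split indices into n_folds test sets with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of each class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds


def smote_like(X, y, minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    pts = [x for x, lab in zip(X, y) if lab == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        others = sorted((q for q in pts if q is not base),
                        key=lambda q: sum((a - b) ** 2 for a, b in zip(base, q)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up data: 96 rows, 71 labelled fail (1) and 25 labelled pass (0).
data_rng = random.Random(1)
y = [1] * 71 + [0] * 25
X = [[data_rng.gauss(label, 1.0), data_rng.gauss(0.0, 1.0)] for label in y]

for test_idx in stratified_folds(y):
    held_out = set(test_idx)
    train_X = [X[i] for i in range(len(y)) if i not in held_out]
    train_y = [y[i] for i in range(len(y)) if i not in held_out]
    # Oversample the minority class in the training part only.
    n_new = train_y.count(1) - train_y.count(0)
    train_X += smote_like(train_X, train_y, minority=0, n_new=n_new)
    train_y += [0] * n_new
    # A classifier would be fitted on (train_X, train_y) here and
    # evaluated on the untouched rows in test_idx.
    print(len(test_idx), train_y.count(1), train_y.count(0))
```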

Statistical analysis
Data were examined for normality prior to analysis. Data on the demographic characteristics of the study participants were reported as percentages, means ± standard deviations, or frequencies as appropriate. To test the difference between participants by outcome, the t-test was applied for continuous variables, and the chi-square test, adjusted using the Fisher exact test, was used for categorical data. Table 1 indicates the two metrics used to evaluate the performance of the machine learning algorithms.
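As a small worked illustration of the two performance metrics, with failure treated as the class of interest, the sketch below computes sensitivity and specificity from true and predicted labels. The labels are made up for the example, and `sensitivity_specificity` is a hypothetical helper rather than code from the study.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels: 1 = fail (class of interest), 0 = pass.
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), spec)  # 4 of 6 failures caught, 3 of 4 passes recognized
```

Averaging these two values over the test data of the five folds gives the kind of summary reported in the Results.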

Results
Out of 219 children, 96 tracings were used in the ML model. Reasons for exclusion included loss to follow-up at age 8 years (n = 11), data not available due to corrupt EEG files or no data for baseline event 1 (n = 34), and bad channel data (n = 78). Analysis of the outcomes in the final sample of 96 children indicated poorer performance on the language test, with ~74% failing compared to 55% on the mathematics test. In the final sample of 96 children, 66.7% of mothers and 24% of fathers were illiterate. Mothers were predominantly housewives (74%) and fathers were mainly skilled workers (75.3%). Demographics and academic performance characteristics at the 7 to 8-year follow-up are shown in Table 2. The table shows the differences between the groups (passed or failed on the mathematics and language achievement tests) by sociodemographic characteristics. There were significant differences between the groups on paternal education (χ² = 4.3, p = 0.039 for Math; χ² = 7.4, p = 0.006 for language), maternal education (χ² = 11.1, p = 0.001 for Math; χ² = 10.8, p = 0.001 for language), socioeconomic status
Mean gamma power spectral density was calculated for the groups using the above methodology [Figs 2 and 3].
The four datasets corresponding to the different model combinations were each combined with the math pass/fail labels and, separately, with the language pass/fail labels. This resulted in eight experiments, whose results are shown in Table 3. The results indicated that the sensitivity of the EEG-only model was similar to that of the model with SES and home environment variables added, for both Math (58.7% and 57.8%) and Language (66.3% and 62.1%), but specificity improved for the EEG model with the addition of sociodemographic variables for both scores: Math (from 43.6% to 50.6%) and language (from 32% to 60%). The models with IQ scores and sociodemographic scores had greater sensitivity (64.4% and 67.8%) and specificity (55.6% and 76%) than the EEG models for Math and language respectively.

Discussion
The objective of this study was to use ML techniques as a tool to examine the power of EEG data to predict failure in early years of school. The study found that the EEG data alone were not sufficient to use as a screening tool given very low specificity. However, its combination with sociodemographic and home environment variables considerably increased the specificity. A similar pattern was observed with the models using IQ score as a feature. One reason could be class imbalance that resulted in a low predictive accuracy [31]. While the findings may indicate the EEG alone may not be an adequate screening tool, further validation to examine the predictive power of the combined features model with minimal loss of validity compared to standardized IQ tests may be an interesting avenue to explore.
Factors that predict school performance have been an area of key interest for educationists and policy makers [32]. To improve school performance outcomes, identification of high-risk students is critical so that appropriate interventions can be planned to decrease the school dropout rate. In this regard, several standardized intelligence tests, mostly developed and normed in the US, have been used to assess physical, cognitive, communication, social, emotional, and/or adaptive development in children. A significant effort is required not just to adapt a test culturally but also to run psychometric analyses to interpret the scores [33]. The IQ test used here was the WPPSI III [34], which went through extensive adaptation efforts [35]. Another challenge is that, although useful, these measures of cognition for preschoolers have shown limited predictive validity regarding academic performance [34]. This can be a major limitation for the use of these cognitive tests in low resource settings. An EEG can also be resource intensive, as the recording takes at least 20 minutes and requires a lot of post-processing time. However, the advantage is that EEG could be a more objective biomarker than tests, because psychological tests might be biased by human factors such as the subjectivity of the assessor or the compliance and motivation of the child, whereas resting EEG is only biased by artefacts. Additionally, we believe automation of EEG interpretation using artificial intelligence will be a reality soon, thus alleviating the need for highly skilled individuals to interpret EEG waveform abnormalities.
Predicting students' academic performance at an early age is critical, as appropriate interventions can be designed for those who may be struggling with school requirements, especially when the government has competing priorities for resource allocation. ML algorithms have been used in the past to predict the educational performance of students. Marquez-Vera et al. reported that, using scores in humanities, language and mathematics, their algorithm was able to accurately identify 98.7% of high school student failures [36]. Ibrahim et al. used the students' demographic profile and the grade point average for the first semester of undergraduate studies to predict students' academic performance in the enrolled program and found that an artificial neural network (ANN) appeared to be the best approach, with an accuracy of 80% [37]. Similar work has been done on university students, where pre-university and/or first semester grade point average was used to accurately identify 80% of freshman dropouts [38]. It is important to note that all of this work has been done on high school or college students in high income countries and used school grade scores for prediction.

Table 1. Metrics used to evaluate performance of machine learning.

| Metric | Explanation | Formula |
| --- | --- | --- |
| Sensitivity | The ratio of correctly predicted positive samples to all the samples of the class of interest (failure). | $\text{Sensitivity} = \frac{tp}{tp + fn}$ |
| Specificity | The ratio of correctly predicted negative samples to all the samples of the class not of interest (pass). | $\text{Specificity} = \frac{tn}{tn + fp}$ |

A few studies in North America and China have looked at correlations between EEG and the neuropsychological or academic status of younger children, indicating that the EEG spectrum is sensitive to attention-deficit hyperactivity disorder [39,40] and, in another study, predictive of emergent math skills in preschoolers [41]. To the best of our knowledge, our study is the first in a developing country to apply ML to EEG data collected from children at the age of 4 years to predict failure in early primary grades.
Children under 5 in developing countries are exposed to multiple risks, including poverty, malnutrition, and unconducive home environments, which affect their cognitive development [42]. Children not meeting their developmental potential are not well prepared for school and are at high risk of failure in early grades and subsequent school dropout. The proposed model could have important implications for the millions of children who may be at risk of failure in early grades and subsequent dropout. This early developmental period may offer a unique window of opportunity to intervene early and hence ensure that human capital is used to its utmost potential. These interventions include, but may not be limited to, counseling and family support on maximizing the child's developmental potential, one-on-one attention, and individual educational plans suited to the child's unique pace and learning abilities [43].
In Pakistan, a large portion of children who do not attain primary education are from rural parts of the country where interventions and screening are limited. In far flung and resource constrained areas, EEG administration by community health workers (CHWs) with decision making by the AI model may assist in screening all pre-primary school age children in the region. Once identified, these high-risk children can receive intervention through trained early childhood educators also enhancing much needed coordination between health and education sectors to help vulnerable children achieve optimal early childhood development outcomes.
This study has strengths and limitations. To the best of our knowledge, there are no existing published studies that have used ML algorithms to assess failure in early school grades, especially in LMICs. A major limitation of our study was the small sample size with an imbalance of the classes, especially for the language test scores, where about three-quarters of children had failed. Although techniques like SMOTE were adopted to deal with this imbalanced data, such techniques do not eliminate the risk of overfitting or of bias of the model towards the dominant class. Apart from the small sample size, the loss of a major proportion of data due to poor data quality, and the complete case analysis approach used to handle missing data when drawing inferences, are also limitations. ML models improve in accuracy and predictive capability as the size of the data pool increases. The current study was a proof of concept, and a larger study with prospective data collection specifically targeting validation of the algorithm is needed. Further, EEG recordings need to be fairly clean for use in ML, which may be a challenge in real world situations, especially in resource constrained settings.
The findings of this study may not generalize to countries that, unlike Pakistan, have low or no failure rates among primary school students. Despite its limitations, the study suggests that EEG, in combination with other variables, has potential to predict early grade failure, allowing early targeted intervention for high-risk children. In the future, AI systems may need to incorporate functional imaging findings alongside EEG findings. One such neuroimaging technique that can be applied to children is functional near infrared spectroscopy (fNIRS) [44]. Khoe et al. (2020) used fNIRS in neurobiological feedback training that could translate to better educational outcomes, such as measures of the learning curve [45]. Furthermore, fNIRS can assess hemodynamic changes in the brain while a subject performs cognitive tasks [46].