Machine learning algorithms’ application to predict childhood vaccination among children aged 12–23 months in Ethiopia: Evidence from the 2016 Ethiopian Demographic and Health Survey dataset

Introduction Childhood vaccination is a cost-effective public health intervention to reduce child mortality and morbidity. However, vaccination coverage remains low, and previous similar studies have not used machine learning algorithms to predict childhood vaccination. As a result, knowledge extraction, association rule formulation, and the discovery of insights from hidden patterns in vaccination data are limited. Therefore, this study aimed to predict childhood vaccination among children aged 12–23 months using the best-performing machine learning algorithm. Methods A cross-sectional study design with a two-stage sampling technique was used. A total of 1617 samples of living children aged 12–23 months were drawn from the 2016 Ethiopian Demographic and Health Survey dataset. The data were pre-processed, and 70% and 30% of the observations were used for training and evaluating the models, respectively. Eight machine learning algorithms were considered for model building and comparison. All the included algorithms were evaluated using confusion matrix elements. The synthetic minority oversampling technique was used to manage imbalanced data. Information gain values were used to select important attributes to predict childhood vaccination. If/then logical associations were used to generate rules based on relationships among attributes, and Weka version 3.8.6 software was used to perform all the prediction analyses. Results PART was the best machine learning algorithm to predict childhood vaccination, with 95.53% accuracy. J48, multilayer perceptron, and random forest were the next best machine learning algorithms to predict childhood vaccination, with 89.24%, 87.20%, and 82.37% accuracy, respectively. ANC visits, institutional delivery, health facility visits, higher education, and being rich were the top five attributes to predict childhood vaccination. 
A total of seven rules were generated that could jointly determine the magnitude of childhood vaccination. Of these, if wealth status = 3 (Rich), adequate ANC visits = 1 (yes), and residency = 2 (Urban), then the probability of childhood vaccination would be 86.73%. Conclusions The PART, J48, multilayer perceptron, and random forest algorithms were important algorithms for predicting childhood vaccination. The findings would provide insight into childhood vaccination and serve as a framework for further studies. Strengthening mothers’ ANC visits, institutional delivery, improving maternal education, and creating income opportunities for mothers could be important interventions to enhance childhood vaccination.


The author(s) received no specific funding for this work.
This manuscript may help address the information and research gap on predicting childhood vaccination. The findings may be useful to health policymakers working to improve immunization coverage and reduce vaccine dropout. This manuscript is in line with the journal's scope and objectives.
The manuscript has not been submitted to, and is not under consideration by, another journal.
If anything is not well addressed, please let us know; we welcome the opportunity to learn from the journal editor and reviewers. For further communication, please contact the corresponding authors via addisalemworkie599@gmail.com.

Introduction
Globally, nearly 44% of child deaths occur within 28 days of birth [1]. Around 75% of child deaths occur within 12 months of birth, and an estimated 4.1 million infants died in 2017 [2]. Child mortality rates are highest in developing countries [2,3]. An estimated 1.2 million children died in Africa within the first 28 days of birth [4], and nearly 49% of child deaths are estimated to have occurred in Sub-Saharan countries [3]. According to the World Health Organization (WHO), more than half of child deaths are caused by infectious diseases that are easily preventable and treatable through simple and affordable interventions [5]. Worldwide, major causes of childhood mortality and morbidity include tuberculosis, diphtheria, pertussis, tetanus, polio, and measles [6].
Child deaths due to diphtheria, pertussis, tetanus, polio, and measles are easily preventable through vaccines. Childhood vaccination is one of the most successful and cost-effective public health interventions against common childhood illnesses such as pneumonia, diphtheria, tetanus, whooping cough, and measles [7]. Vaccination currently prevents nearly 3 million child deaths due to diphtheria, tetanus, whooping cough, and measles [8]. Over the past decade, immunization programs have saved the lives of more than 1,000,000 children, and infectious and communicable diseases have been controlled through childhood vaccination [9]. Nonetheless, 12.9 million children worldwide did not receive the recommended vaccines [8].
A substantial number of children do not complete their immunization schedule because of various challenges and barriers [10]. Nearly 21 million children have been projected to miss out on vaccines, and two-thirds of the missed vaccinations occurred in developing regions due to outbreaks of new cases [11,12]. In Ethiopia, infant vaccination doses were frequently delayed: 63.8% of Diphtheria Pertussis Tetanus (DTP) dose 1, 63.1% of polio dose 1, and 68.5% of measles doses were delivered after the recommended date [13]. According to the Ethiopia Demographic and Health Survey (EDHS), data on vaccination coverage among children aged 12-23 months who received specific vaccines at any time before the survey showed that only four out of ten children (43%) had received all basic vaccinations [14]. According to the WHO, the mean dropout rates of Bacillus Calmette-Guérin (BCG) and measles are 34.6% and 28.6%, respectively [15].
Reports based on traditional and multilevel logistic regression analyses have identified several factors that could affect child vaccination and immunization coverage. Maternal education, mothers' knowledge of vaccines and their schedule, maternal age, fear of side effects, antenatal care (ANC) visits, and giving birth at a health institution are some of the maternal factors that affect childhood vaccination [16-19]. Additionally, the availability of vaccines, migration of caregivers, household income level, and sex of the household head affect childhood vaccination status [20,21]. Moreover, the sex and age of children, birth interval and birth order, multiple births, a mother's media exposure, rural residence, and distance to health facilities are also associated with childhood vaccination [22,23]. However, the odds ratios and relative risks from traditional and multilevel logistic regression do not classify attributes meaningfully or uncover new insights [24].
Despite government efforts to improve child vaccination, increase vaccination coverage, and reduce vaccine dropout rates, vaccine providers and health programmers lack on-site information-handling tools to target children at high risk of vaccine dropout and late or incomplete vaccination [15]. Therefore, low-income countries would benefit from modeling and visualizing childhood vaccination risks on large datasets to identify attributes associated with childhood vaccination and to target children at high risk of dropping out or delaying the next vaccine dose.
Massive amounts of biomedical and public health data are categorized and analyzed using a variety of predictive algorithms to gain new knowledge and reveal hidden relationships and trends [25]. Multidimensional data mining techniques have been used to forecast future immunization outcomes from existing data and to predict features of typical childhood immunization schedules [26]. Predictive analytics tools are powerful and widely applicable. Earlier studies have reportedly used numerous machine learning algorithms to predict disease prevalence, healthcare service use, vaccination uptake [27], routine immunization [15], and childhood vaccination and mortality [28,29]. Machine learning algorithms are essential for automated detection, identifying non-linear relationships, and finding significant patterns in data [30]. Specifically, random forest, logistic regression, J48, LogitBoost, and AdaBoost algorithms have been used to predict under-five and neonatal mortality [29,31], the undernutrition status of children [32], and malnutrition among children [33,34]. Additionally, Naïve Bayes and PART algorithms have been used to classify text documents [35]. However, studies that predict childhood vaccination with machine learning techniques remain scarce. Massive amounts of data are currently being generated, and these must be analyzed with the best available data analysis tools. Policymakers and stakeholders need accurate predictions on various aspects of immunization and other health parameters to take effective action. Researchers therefore need to test and compare prediction and classification algorithms for childhood vaccination.
Therefore, this study aimed to: (1) evaluate different machine learning algorithms using model evaluation metrics; (2) identify important attributes for childhood vaccination based on the best-performing algorithm; and (3) generate association rules that jointly determine the vaccination of children aged 12-23 months in Ethiopia.

Study design and setting
A cross-sectional study was conducted across the nine regions of Ethiopia. Ethiopia is located in the Horn of Africa and is bordered by Eritrea to the north, Djibouti and Somalia to the east, Sudan and South Sudan to the west, and Kenya to the south. Ethiopia has nine regional states and two administrative cities, which are subdivided into administrative zones, 817 Woredas, and 16,253 Kebeles.

Data source
The 2016 Ethiopian Demographic and Health Survey (EDHS) dataset was obtained from the DHS Program website (https://dhsprogram.com). The survey was conducted by the Ethiopian Public Health Institute (EPHI) in collaboration with the Central Statistical Agency (CSA). Data were collected from January 18, 2016, to June 27, 2016.

Sampling techniques and procedures
The sampling frame for the 2016 EDHS was a frame of all census enumeration areas (EAs) created for the 2007 Ethiopia Population and Housing Census (EPHC), conducted by the Central Statistical Agency (CSA). The census frame is a complete list of 84,915 EAs, each covering an average of 181 households. The 2016 EDHS sample was designed to provide estimates of key indicators for the country as a whole, for urban and rural areas separately, and for each of the nine regions and the two administrative cities. A two-stage stratified cluster sampling design was used, with each region stratified into urban and rural areas. In the first stage, samples of EAs were selected independently in each stratum, with implicit stratification and proportional allocation. In the second stage, a household listing operation was carried out in the selected EAs, and the resulting lists served as the sampling frame from which a fixed number of households per cluster was selected.

Study populations
In this study, all living children aged 12-23 months were the source population, and all sampled living children aged 12-23 months living with their mothers were the study population. Details about the data source methodology, sampling procedure, and source population are presented in the 2016 EDHS report [36].

Dependent variable
Childhood vaccination among children aged 12-23 months.

Independent variables
Socio-demographic, maternal, and health service-related characteristics were used as independent attributes to predict childhood vaccination among children aged 12-23 months in Ethiopia. These were wealth status, maternal educational status, maternal age, region, residence, sex and age of children, birth interval, birth order, sex of the household head, ANC visits, place of delivery, working status, health facility visits, and media exposure.

Childhood vaccination
Childhood vaccination among children aged 12-23 months was assessed using one dose of BCG, three doses of polio vaccine, three doses of DPT vaccine, and one dose of measles vaccine.
Accordingly, a child was classified as having received basic childhood vaccination if the child had received at least one dose of BCG vaccine, three doses of polio vaccine, three doses of DPT vaccine, and one dose of measles vaccine. Otherwise, the child was classified as not having received basic childhood vaccination. Information on basic childhood vaccination status was obtained from (1) written vaccination cards, (2) mothers' verbal reports, and (3) health facility records [36].
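The outcome definition above reduces to a rule over dose counts. The following minimal Python sketch assumes each child's doses are already available as integer counts. The `VaccinationRecord` fields and the `has_basic_vaccination` function are hypothetical names, not EDHS variable names. The sketch does not model how vaccination cards, verbal reports, and facility records are reconciled.

```python
from dataclasses import dataclass

# Minimum number of doses per antigen for "basic" vaccination, as defined above.
REQUIRED_DOSES = {"bcg": 1, "polio": 3, "dpt": 3, "measles": 1}


@dataclass
class VaccinationRecord:
    """Doses received by one child aged 12-23 months (hypothetical field names)."""
    bcg: int
    polio: int
    dpt: int
    measles: int


def has_basic_vaccination(record: VaccinationRecord) -> bool:
    """True only if every antigen meets its minimum dose count."""
    return all(
        getattr(record, antigen) >= minimum
        for antigen, minimum in REQUIRED_DOSES.items()
    )


# A child with all four antigens in full is classified as vaccinated;
# a single missing measles dose is enough to fail the definition.
print(has_basic_vaccination(VaccinationRecord(bcg=1, polio=3, dpt=3, measles=1)))  # True
print(has_basic_vaccination(VaccinationRecord(bcg=1, polio=3, dpt=3, measles=0)))  # False
```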

Birth interval
Birth interval is the period between two successive live births. In this study, an interval of less than 33 months between two consecutive live births was classified as a short birth interval, whereas an interval of 33 months or more was classified as an optimum birth interval [37].

ANC visit
An ANC visit is a pregnant woman's visit to a health facility for antenatal care services during her pregnancy.
A woman was classified as having adequate ANC visits if she visited a health facility at least four times for ANC services. Otherwise, her ANC visits were classified as inadequate [38,39].

Media exposure
Mothers who had access to radio, television, or both were classified as exposed to media. Otherwise, they were classified as unexposed [40].
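The three operational definitions above (birth interval, ANC visits, media exposure) are simple recoding rules. A minimal Python sketch of that recoding follows, assuming the raw values are already extracted as months, visit counts, and yes/no flags. The function names are hypothetical. Returning `None` for first births, which have no preceding interval, is an assumption the text does not state.

```python
from typing import Optional

SHORT_INTERVAL_CUTOFF_MONTHS = 33  # < 33 months = short, per the definition above
ADEQUATE_ANC_MIN_VISITS = 4        # >= 4 visits = adequate ANC


def birth_interval_category(months: Optional[float]) -> Optional[str]:
    """Classify the interval since the preceding live birth.

    Assumption: first births have no preceding interval and return None.
    """
    if months is None:
        return None
    return "short" if months < SHORT_INTERVAL_CUTOFF_MONTHS else "optimum"


def anc_category(visits: int) -> str:
    """Adequate if the mother made at least four ANC visits."""
    return "adequate" if visits >= ADEQUATE_ANC_MIN_VISITS else "inadequate"


def media_exposure(has_radio: bool, has_television: bool) -> str:
    """Exposed if the mother has access to radio, television, or both."""
    return "exposed" if (has_radio or has_television) else "unexposed"


print(birth_interval_category(24), anc_category(4), media_exposure(False, True))
# short adequate exposed
```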

Data management and statistical analysis
Data cleaning and labelling were performed using STATA version 15 software to prepare the data for analysis, and variables were recoded into the desired categories. Sampling weights were applied during the analysis to ensure the representativeness of the survey results at the national level [41]. STATA version 15 was also used for logistic regression analysis. Weka version 3.8.6 software was used for data pre-processing, selection of the important attributes that could predict childhood vaccination, and generation of rules associated with childhood vaccination.
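The paragraph above states only that sampling weights were applied. As an illustration, the following Python sketch assumes the usual DHS convention that the sample weight (for example `v005`) is stored as an integer with six implied decimals. In a proportion, the division by 1,000,000 cancels out. It matters when weighted counts are reported. This sketch gives point estimates only and ignores the cluster and stratum design needed for correct standard errors.

```python
from typing import List, Sequence

DHS_WEIGHT_SCALE = 1_000_000  # assumed: weights stored with six implied decimals


def normalized_weights(raw_weights: Sequence[int]) -> List[float]:
    return [w / DHS_WEIGHT_SCALE for w in raw_weights]


def weighted_count(outcomes: Sequence[int], raw_weights: Sequence[int]) -> float:
    """Weighted number of children with outcome == 1."""
    return sum(w for y, w in zip(outcomes, normalized_weights(raw_weights)) if y == 1)


def weighted_proportion(outcomes: Sequence[int], raw_weights: Sequence[int]) -> float:
    """Weighted share of children with outcome == 1 (point estimate only)."""
    total = sum(normalized_weights(raw_weights))
    if total == 0:
        raise ValueError("weights sum to zero")
    return weighted_count(outcomes, raw_weights) / total


# Three hypothetical children: vaccinated, not vaccinated, vaccinated.
raw = [2_000_000, 1_000_000, 1_000_000]
print(weighted_count([1, 0, 1], raw))       # 3.0
print(weighted_proportion([1, 0, 1], raw))  # 0.75
```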

Ethical approval and consent to participate
Ethical clearance was not necessary for this study because it was based on publicly available data. Informed consent from study participants was also not applicable.
No attributes in this study uniquely identify individuals or households. As a result, specific enumeration areas (EAs), individuals, and households cannot be uniquely identified (S1 File).

Data pre-processing
Data pre-processing was used to remove missing and incomplete records. Noise, outliers, and inconsistencies are common in such datasets, so these unnecessary data values, including duplicate variables, were removed from the dataset. At this stage, all string and categorical variables were transformed into nominal data types.
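The three pre-processing steps above can be sketched in plain Python. The sketch assumes records are dictionaries keyed by attribute name, with `None` or an empty string marking a missing value. All function names are hypothetical. Duplicate *variables* (identical columns) are dropped, but duplicate *rows* are not. In survey data, different children can legitimately share the same categorical profile.

```python
from typing import Dict, List

Record = Dict[str, object]


def drop_incomplete(records: List[Record], attributes: List[str]) -> List[Record]:
    """Keep only records with a non-missing value for every attribute."""
    return [r for r in records if all(r.get(a) not in (None, "") for a in attributes)]


def drop_duplicate_attributes(records: List[Record], attributes: List[str]) -> List[str]:
    """Drop attributes whose column is identical to an earlier attribute's column."""
    kept, seen_columns = [], set()
    for a in attributes:
        column = tuple(r[a] for r in records)
        if column not in seen_columns:
            seen_columns.add(column)
            kept.append(a)
    return kept


def to_nominal(records: List[Record], attributes: List[str]) -> List[Dict[str, str]]:
    """Represent every value as a string label so it is treated as nominal."""
    return [{a: str(r[a]) for a in attributes} for r in records]


records = [  # hypothetical toy records
    {"anc": 4, "anc_copy": 4, "residence": "urban"},
    {"anc": None, "anc_copy": None, "residence": "rural"},
    {"anc": 2, "anc_copy": 2, "residence": "rural"},
]
attributes = ["anc", "anc_copy", "residence"]
complete = drop_incomplete(records, attributes)
kept = drop_duplicate_attributes(complete, attributes)
print(kept)                        # ['anc', 'residence']
print(to_nominal(complete, kept))  # [{'anc': '4', 'residence': 'urban'}, {'anc': '2', 'residence': 'rural'}]
```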

Feature selection
In this study, variables were selected for the machine learning algorithms in two stages. In the first stage, traditional logistic regression analysis was used to select features (independent variables): variables with a p-value of less than 0.05 in backward stepwise logistic regression were selected as candidates for further attribute selection. In the second stage, the best-performing machine learning algorithm, together with information gain values, was used to identify the attributes that contribute most to predicting childhood vaccination among children aged 12-23 months in Ethiopia. The attribute with the highest information gain value was considered the most important predictor of childhood vaccination.
The remaining attributes were then ranked in descending order of information gain value.
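The ranking described above can be illustrated with a minimal information gain computation for nominal attributes. This Python sketch is not Weka's implementation, which in Weka is typically done with an attribute evaluator such as `InfoGainAttributeEval` combined with the `Ranker` search method. The toy records and attribute names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy of the class minus the weighted entropy within each attribute value."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows: List[Dict[str, Hashable]], attributes: List[str],
                    target: str) -> List[Tuple[str, float]]:
    """Attributes sorted from highest to lowest information gain."""
    labels = [row[target] for row in rows]
    scores = {a: information_gain([row[a] for row in rows], labels) for a in attributes}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


rows = [  # hypothetical toy records
    {"anc": "adequate", "sex": "female", "vaccinated": "yes"},
    {"anc": "adequate", "sex": "male", "vaccinated": "yes"},
    {"anc": "inadequate", "sex": "female", "vaccinated": "no"},
    {"anc": "inadequate", "sex": "male", "vaccinated": "no"},
]
print(rank_attributes(rows, ["sex", "anc"], "vaccinated"))
# [('anc', 1.0), ('sex', 0.0)]
```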

Data split and model selection
In this step, the dataset was split into training and testing sets, which were used for model building and evaluation, respectively.
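A 70/30 split such as the one used in this study can be sketched as a seeded shuffle followed by a cut. The seed value and the function name are assumptions for illustration. The text does not say whether the split was stratified by vaccination status, and this sketch is not stratified.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(records: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle with a fixed seed (for reproducibility), then cut into two sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(1617))
print(len(train), len(test))  # 1132 485
```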
Various machine learning algorithms have been used to predict child mortality and health service utilization [25,33,34]. In this study, Naïve Bayes (NB), PART, logistic regression (LR), multilayer perceptron, J48, LogitBoost, random forest (RF), and AdaBoost were used to predict childhood vaccination among children aged 12-23 months in Ethiopia.

Naïve Bayes (NB)
The Naïve Bayes algorithm is a supervised machine learning algorithm that is based on Bayes' theorem and used for classification and prediction problems. It assumes that attributes are conditionally independent given the target class [25]. Naïve Bayes is computationally efficient: its training and classification time grows linearly with the number of attributes. Naïve Bayes supports incremental learning and produces low-variance predictions, and its performance is measured using confusion matrix elements [42].
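To make the conditional independence assumption concrete, here is a minimal categorical Naïve Bayes in Python. The class score is the log prior plus the sum of per-attribute log likelihoods. Laplace (add-one) smoothing is an assumption of this sketch, not a setting reported in the study, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


class CategoricalNaiveBayes:
    """Naïve Bayes for nominal attributes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, rows: List[Sequence[Hashable]], labels: List[Hashable]):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # value_counts[(attribute index, class)][value] = frequency in training data
        self.value_counts = defaultdict(Counter)
        # distinct values seen for each attribute (used in the smoothing denominator)
        self.distinct = defaultdict(set)
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(j, label)][value] += 1
                self.distinct[j].add(value)
        return self

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        best_label, best_score = None, -math.inf
        for label, class_count in self.class_counts.items():
            # Log space avoids underflow when many small probabilities are multiplied.
            score = math.log(class_count / self.n)
            for j, value in enumerate(row):
                count = self.value_counts[(j, label)][value]
                k = len(self.distinct[j])
                score += math.log((count + self.alpha) / (class_count + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


rows = [("adequate", "urban"), ("adequate", "rural"),
        ("inadequate", "rural"), ("inadequate", "rural")]
labels = ["yes", "yes", "no", "no"]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("adequate", "urban")))    # yes
print(model.predict(("inadequate", "rural")))  # no
```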

PART
PART is a rule-based classification algorithm that combines the separate-and-conquer strategy of rule learning with decision trees [35]. In each iteration, it builds a partial decision tree and converts its best leaf into a rule. It is therefore well suited to generating if/then rules that extract and build knowledge about childhood vaccination [43].
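PART's output is an ordered decision list. Rules are tried in order, the first rule whose conditions all hold assigns the class, and a default rule covers everything else. The Python sketch below shows only how such a list is applied to a child's record. It does not show how PART learns the rules from partial trees. The rules and attribute names are hypothetical and are not the rules reported in this study.

```python
from typing import Dict, List, Tuple

# A rule is (conditions, predicted class); conditions map attribute -> required value.
Rule = Tuple[Dict[str, str], str]


def apply_decision_list(rules: List[Rule], default_class: str,
                        instance: Dict[str, str]) -> str:
    """Return the class of the first rule whose conditions all hold, else the default."""
    for conditions, predicted in rules:
        if all(instance.get(attribute) == value for attribute, value in conditions.items()):
            return predicted
    return default_class


# Hypothetical rules for illustration only.
rules: List[Rule] = [
    ({"anc": "adequate", "residence": "urban"}, "vaccinated"),
    ({"delivery": "home"}, "not_vaccinated"),
]
child = {"anc": "adequate", "residence": "urban", "delivery": "home"}
print(apply_decision_list(rules, "not_vaccinated", child))  # vaccinated (first match wins)
```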

Logistic regression (LR)
Logistic regression is a type of regression model used to model a categorical, typically dichotomous, outcome variable. LR is a statistical model used to classify and predict various health outcomes [44]. Binary logistic regression is used for a binary outcome variable, whereas multinomial logistic regression is used for an outcome with more than two categories. LR relies on several assumptions, including a dichotomous target variable (for the binary model) and independent variables that are not strongly correlated with one another [45].
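A fitted binary logistic model turns a linear combination of attributes into a probability through the logistic function, and each coefficient exponentiates to an odds ratio. The Python sketch below shows that prediction step only, not model fitting. The coefficients in the usage example are illustrative values, not estimates from this study. The 0.5 cut-off is an assumed default.

```python
import math
from typing import Sequence


def predict_probability(intercept: float, coefficients: Sequence[float],
                        features: Sequence[float]) -> float:
    """P(outcome = 1) from a fitted binary logistic model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable logistic function: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def odds_ratio(coefficient: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in the attribute."""
    return math.exp(coefficient)


def classify(probability: float, threshold: float = 0.5) -> int:
    return 1 if probability >= threshold else 0


# Illustrative (hypothetical) coefficients for adequate ANC and institutional delivery.
p = predict_probability(-1.0, [0.8, 1.2], [1, 1])
print(round(p, 3), classify(p))  # 0.731 1
```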

J48 classifier algorithm
The J48 classifier is one of the most widely used machine learning algorithms for examining categorical data, and it follows a top-down, recursive, divide-and-conquer strategy [46]. J48 is Weka's Java implementation of the C4.5 decision tree algorithm for classification. J48 can handle missing values: records with missing values are not discarded, and the value of a missing attribute can be estimated from what is known about the record's other attributes. The process divides the available data into ranges based on the attribute values found in the training data. Classification is then performed, and rules are generated from the attributes [47].
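C4.5, and therefore J48, ranks candidate splits by gain ratio, which is information gain divided by the split information of the attribute. The Python sketch below computes only that criterion for nominal attributes. It omits C4.5's additional heuristic of restricting choices to attributes with at least average gain, as well as pruning and missing-value handling. The toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `values`, divided by the split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # Split information is the entropy of the attribute's own value distribution.
    # It grows with the number of distinct values, penalizing many-valued attributes.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
print(gain_ratio(["a", "b", "c", "d"], labels))  # 0.5
```

Both attributes separate the classes perfectly (information gain 1.0), but the four-valued attribute is penalized by its larger split information.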

Random forest (RF)
Random forest is a supervised machine learning algorithm used to classify and predict health problems and health service utilization [48]. RF is relatively fast to train, and each tree works with random subsets of features. RF can detect complex relationships, including non-linear and high-order interactions, and often yields low prediction errors [49].
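Two ingredients make a random forest "random": each tree is trained on a bootstrap sample, and each split considers only a random subset of attributes. The forest's prediction is then a majority vote. The Python helpers below sketch those ingredients only. Tree growing itself is omitted, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(records: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(records) records with replacement (the training set for one tree)."""
    return [rng.choice(records) for _ in range(len(records))]


def random_feature_subset(attributes: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Attributes one split may consider (sampled without replacement, redrawn per split)."""
    return rng.sample(list(attributes), k)


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Forest prediction = most common class among the trees' predictions.

    Ties go to the class that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
print(len(bootstrap_sample(list(range(10)), rng)))  # 10
print(majority_vote(["yes", "no", "yes"]))           # yes
```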

AdaBoost and LogitBoost
AdaBoost is an ensemble meta-learning method that improves the performance of binary classification trees. AdaBoost uses an iterative approach that learns from the mistakes of weak classifiers and combines them into a strong one [50,51]. AdaBoost is particularly useful for boosting the performance of decision trees on binary classification problems [52]. Another powerful boosting algorithm, LogitBoost, was also used to predict childhood vaccination in this study. LogitBoost was designed to address the limitations of AdaBoost in handling noise and outliers [53]. The overall knowledge flow of model building for data processing, analysis, and visualization is presented in Figure 1.
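The "learning from mistakes" step of AdaBoost can be seen in a single boosting round. The weak learner receives a vote weight α = ½·ln((1 − ε)/ε) from its weighted error ε. Misclassified instances are then up-weighted so that the next learner focuses on them. The Python sketch below implements the textbook discrete AdaBoost update with labels in {−1, +1}. Weka's `AdaBoostM1` is formulated slightly differently, and LogitBoost is not shown.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(weights: Sequence[float], y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, List[float]]:
    """One round of discrete AdaBoost with labels in {-1, +1}.

    Returns the weak learner's vote weight (alpha) and the renormalized
    instance weights for the next round.
    """
    total = sum(weights)
    error = sum(w for w, y, h in zip(weights, y_true, y_pred) if y != h) / total
    # Clamp to avoid division by zero or log(0) for perfect or useless learners.
    error = min(max(error, 1e-10), 1 - 1e-10)
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified instances (y * h = -1) are up-weighted; correct ones are down-weighted.
    updated = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, y_true, y_pred)]
    norm = sum(updated)
    return alpha, [w / norm for w in updated]


alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(round(alpha, 4))                     # 0.5493
print([round(w, 4) for w in new_weights])  # [0.1667, 0.1667, 0.1667, 0.5]
```

After the update, the single misclassified instance carries half of the total weight. This is the mechanism that pushes the next weak learner toward the hard cases.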

Model evaluation
The performance of all the included algorithms was evaluated using the confusion matrix.
The confusion matrix summarizes how the predicted classes compare with the actual classes [54]. The predicted and actual classifications of childhood vaccination status were compared using confusion matrix elements: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The receiver operating characteristic (ROC) curve, which plots sensitivity against 1 − specificity, was also used for model evaluation. Because the ROC curve is built from predicted probabilities, the area under the ROC curve (AUC) represents the degree of separability, that is, how well the model distinguishes between classes. The higher the AUC, the better the model predicts positive classes as positive and negative classes as negative. AUC values are usually considered good if greater than 80%, fair if between 70% and 80%, poor if between 60% and 70%, and failed if less than 60% [55]. The kappa statistic, a measure of inter-rater agreement, was used to evaluate the agreement between predicted and actual classes. Kappa values ≤ 0 indicate agreement no better than chance; 0.01-0.20, slight agreement; 0.21-0.40, fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, substantial agreement; and 0.81-1.00, almost perfect agreement [56].
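Both summary measures above can be computed directly. AUC equals the probability that a randomly chosen positive case is scored above a randomly chosen negative case, with ties counted as one half. Cohen's kappa compares observed agreement with the agreement expected from the class marginals. The Python sketch below uses the quadratic pairwise form of AUC for clarity rather than speed. How exact boundary values are assigned to the kappa bands is an assumption.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(positives) * len(negatives))


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement between predicted and actual classes, corrected for chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the predicted and actual class marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)


def kappa_band(kappa: float) -> str:
    """Interpretation bands listed in the text (boundary handling is an assumption)."""
    if kappa <= 0:
        return "no better than chance"
    for upper, name in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return name
    return "almost perfect"


print(auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))     # 1.0
print(round(cohen_kappa(tp=40, fp=10, fn=10, tn=40), 4))  # 0.6
print(kappa_band(0.65))                              # substantial
```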
The formulas for the confusion matrix elements are presented in Box 1.

True positive (TP):
The model correctly predicts positive class in the response outcome.

False positive (FP):
The model incorrectly predicts positive class in the response outcome.

True negative (TN):
The model correctly predicts negative class in the response outcome.

False-negative (FN):
The model incorrectly predicts negative class in the response outcome.
Sensitivity: Sensitivity is the proportion of actual positive events that the model correctly predicts as positive, that is, how many positives are predicted out of all positive classes.
Specificity: Specificity is the proportion of actual negative cases that are predicted as negative. The remaining actual negative cases are predicted as positive and are termed false positives.
Precision: Precision, or positive predictive value, is the number of correctly predicted positive events divided by the total number of events that the classifier predicts as positive.
F-measure: The F-measure is the harmonic mean of precision and recall (sensitivity). A higher F-measure indicates a better model.
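The verbal definitions above correspond to the following standard formulas. The reconstruction assumes the conventional definitions and is consistent with the descriptions in Box 1. Here \(p_o\) is the observed agreement (the accuracy) and \(p_e\) is the agreement expected by chance from the class marginals.

```latex
\begin{aligned}
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity (recall)} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{Precision} &= \frac{TP}{TP + FP} \\
F\text{-measure} &= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \\
\kappa &= \frac{p_o - p_e}{1 - p_e}
\end{aligned}
```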

Prediction and association rule mining
Once the model was built and its performance assessed, childhood vaccination status was predicted from the independent predictors. Important variables selected by the best-performing model were used to predict childhood vaccination.
Although important variables are used to predict childhood vaccination, the predictive model does not show which nominal variables are jointly associated with it. Therefore, association rule mining (if [antecedent]/then [consequent] statements) was used to discover relationships between seemingly independent attributes.
Association rule mining is well suited to non-numerical, categorical attributes. It identifies frequently occurring patterns and dependencies between attributes using two measures: support, which is how frequently the if/then relationship appears in the observations, and confidence, which is how often the relationship has been found to be true. If/then association rules are useful for selecting important features that jointly determine childhood vaccination, and they are the easiest to interpret [57].
For association rule mining, the Apriori algorithm was used to identify strong and frequently co-occurring attributes. An if/then association rule is a pair of attribute sets X and Y, written as X → Y, where X is the antecedent and Y the consequent: if X occurs, Y is also likely to occur [58]. These rules are important for the prevention and control of health problems and for proactive decision-making by health policymakers.
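The Apriori idea is a level-wise search. Frequent k-itemsets are joined into (k+1)-candidates, and any candidate with an infrequent subset is pruned before its support is counted. The following Python sketch applies that search and then keeps rules whose consequent is a chosen target item. It is an illustration, not Weka's `Apriori` implementation. Items are encoded as hypothetical "attribute=value" strings, and the thresholds are arbitrary.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Itemset = FrozenSet[str]


def support(transactions: Sequence[Set[str]], itemset) -> float:
    """Share of transactions (children) containing every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(transactions, antecedent, consequent) -> float:
    """support(X union Y) / support(X) for the rule X -> Y."""
    sup_x = support(transactions, antecedent)
    return support(transactions, set(antecedent) | set(consequent)) / sup_x if sup_x else 0.0


def frequent_itemsets(transactions, min_support: float, max_size: int = 3) -> Dict[Itemset, float]:
    """Level-wise Apriori search for itemsets with support >= min_support."""
    frequent: Dict[Itemset, float] = {}
    candidates = {frozenset([item]) for t in transactions for item in t}
    size = 1
    while candidates and size <= max_size:
        level = {c: support(transactions, c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets; prune any candidate
        # with an infrequent k-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size + 1 and all(
                frozenset(sub) in level for sub in combinations(union, size)
            ):
                candidates.add(union)
        size += 1
    return frequent


def rules_for_target(transactions, frequent, target: str,
                     min_confidence: float) -> List[Tuple[Itemset, str, float, float]]:
    """Rules X -> target with confidence >= min_confidence, sorted by confidence."""
    rules = []
    for itemset, sup in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            conf = confidence(transactions, antecedent, {target})
            if conf >= min_confidence:
                rules.append((antecedent, target, sup, conf))
    return sorted(rules, key=lambda r: r[3], reverse=True)


transactions = [  # hypothetical toy records
    {"wealth=rich", "anc=adequate", "vaccinated=yes"},
    {"wealth=rich", "anc=adequate", "vaccinated=yes"},
    {"wealth=poor", "anc=inadequate", "vaccinated=no"},
    {"wealth=rich", "anc=inadequate", "vaccinated=no"},
]
freq = frequent_itemsets(transactions, min_support=0.5)
for antecedent, target, sup, conf in rules_for_target(transactions, freq, "vaccinated=yes", 0.9):
    print(sorted(antecedent), "->", target, f"support={sup:.2f}", f"confidence={conf:.2f}")
```

With these toy records, "wealth=rich" alone predicts vaccination with a confidence of only about 0.67. Adding "anc=adequate" raises the confidence to 1.0, which illustrates how joint conditions can be stronger than single attributes.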
Various studies have widely used if/then rules in healthcare research to identify important features, for example in predicting childhood care and child mortality [59], parasite infection [60], the pattern of new cases and stroke [61,62], and maternal healthcare service utilization and service discontinuation [63]. The relationship between X and Y attributes is expressed in the following way [62].
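Assuming the conventional definitions used with the Apriori algorithm, the strength of a rule X → Y over a set of records T can be written as follows. Each record t is treated as the set of attribute values observed for one child.

```latex
\begin{aligned}
\operatorname{support}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\operatorname{support}(X \rightarrow Y) &= \operatorname{support}(X \cup Y) \\
\operatorname{confidence}(X \rightarrow Y) &= \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\end{aligned}
```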

Sociodemographic characteristics of the study participants
A total of 1617 weighted samples of children aged 12-23 months were included in the analysis. The majority (62.52%) of children were born to mothers aged above 35 years.
The mean age of mothers was 3.53. The majority (72.5%) of children were born to mothers who had no formal education. Seven hundred thirty (45.1%) and two hundred eighty-eight (17.8%) children were from the Oromia and Amhara regions, respectively. The majority (91.2%) of the children were born to mothers residing in rural areas. About five hundred fifty-three (34.2%) and four out of ten (40.3%) children were born to mothers whose religion was Orthodox and Muslim, respectively. Seven hundred sixty (47%) of the children's mothers were poor. Slightly more than half (52.9%) of the children were female, and the majority (86.2%) of household heads were male. Six hundred seventy-five (41.7%) children were aged 12-15 months, and their mean age was 1.844 (Table 1).

Children's and mothers' characteristics
Less than half (47.2%) of the children had visited a health facility in the last 12 months, and the majority (70.6%) of the children's mothers were not working at the time of the interview. Only 29.8% of the mothers had media exposure, and 29% had given birth at a health institution. The majority (70.6%) of the mothers did not have adequate ANC visits during pregnancy. The majority (64.5%) of children had a birth order of less than five, and 65.1% had an optimal birth interval (Figure 1).

Important attributes of childhood vaccination in Ethiopia
Information gain coefficients with 10-fold cross-validation were used to select important attributes of childhood vaccination in Ethiopia.
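In 10-fold cross-validation, the records are partitioned into ten folds, and each fold serves once as the held-out set while the other nine are used for training. The Python sketch below generates such folds for the 1617 records. The seed and the function name are assumptions. Unlike many practical setups, this sketch does not stratify the folds by vaccination status.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 1) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each index lands in exactly one test fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


splits = list(k_fold_indices(1617))
print(len(splits), [len(test) for _, test in splits])
# 10 [162, 162, 162, 162, 162, 162, 162, 161, 161, 161]
```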

Discussion
This study used the 2016 EDHS dataset, with a total of 1617 observations. Seventy percent and 30% of the observations were used as the training and testing datasets, respectively. The aim was to evaluate machine learning algorithms using confusion matrix elements and to select important attributes that could accurately predict childhood vaccination among children aged 12-23 months in Ethiopia. Accordingly, eight machine learning algorithms were included to achieve the study objectives.
The eight included algorithms were evaluated and compared by their classification accuracy and AUC values. The accuracy and AUC of the PART algorithm were 95.53% and 91.89%, respectively, with 10-fold cross-validation.
Hence, the PART algorithm was the most accurate model to predict childhood vaccination among children aged 12-23 months. This finding agrees with a study on data classification and association [35]. The J48, multilayer perceptron, and random forest algorithms were the second, third, and fourth best machine learning algorithms to predict childhood vaccination, with 89.24%, 87.20%, and 82.37% accuracy, respectively. This finding is supported by various studies conducted to predict under-five child mortality [29,32,64], contraceptive discontinuation [63], and stunting and malnutrition among children [65-67].
The second objective of the study was to select important attributes that could predict childhood vaccination among children aged 12-23 months in Ethiopia. Of the attributes selected, adequate ANC visits, delivery at a health facility, health facility (HF) visits, higher maternal education, rich wealth status, urban residence, female household heads, maternal age greater than 35 years, birth order of less than five, and current maternal employment were important attributes to predict vaccination of children aged 12-23 months in Ethiopia.
Adequate ANC visits were the top-ranked attribute to predict childhood vaccination among children aged 12-23 months in Ethiopia, with an information gain value of 0.087. This finding agrees with previous similar studies done in Ethiopia [14,68] and Zimbabwe [69]. This might be because women who attend ANC follow-up may receive counseling about child immunization [70], and mothers may receive adequate education about the importance of postnatal visits and activities [71]. Institutional delivery was the second most important attribute to predict childhood vaccination. This finding is supported by similar studies done in Ethiopia [8,14] and Nigeria [72]. This might be because some vaccines, such as BCG and OPV 0, are often given immediately after birth at health facilities [70].
Visiting HF was the third most important attribute to predict childhood vaccination in Ethiopia.
This finding was in line with studies done in Ethiopia [14], and similar resource-limited settings [73,74].This might be due to the fact that mothers who visit a health facility might receive adequate education and counseling about child immunization, and mothers after birth are recommended to visit a health facility for postnatal check-ups and services.
Higher maternal education was the fourth most important attribute to predict childhood vaccination among children aged 12-23 months. This is similar to a study done in Bangladesh, which found that maternal education was an important feature for predicting anemia among children under five years of age [75]. Another study done in India also supports this finding [76].
This might be because educated mothers may know the importance of vaccines for child care, and education may empower mothers to decide freely to visit a health facility for child health services [77]. Being rich and living in an urban area were the fifth and sixth most important attributes to predict childhood vaccination among children aged 12-23 months in Ethiopia. This finding is similar to studies done in Bangladesh [75] and Ethiopia [8]. This might be because mothers from urban areas may have more access to media, which play a vital role in disseminating educational information and creating awareness [78,79].
Therefore, children in urban areas are more likely to uptake vaccines.
Generating rules for childhood vaccination was the third objective of the study. Seven association rules were generated to determine vaccination status among children aged 12-23 months in Ethiopia. According to association rule 1, the probability of childhood vaccination would be 86.73% if the mother's wealth status was rich, the mother had adequate ANC visits, and the child was an urban resident. This might be for three reasons. Wealthier women may be able to afford any costs needed for vaccination. Mothers with adequate ANC visits may have gained awareness and knowledge about child vaccination during their health facility visits in pregnancy. Health facilities in urban areas may also be easily accessible for mothers to vaccinate their children. The effects of these three attributes are critical for childhood vaccination, and their combination might make it particularly likely that children are vaccinated at 12-23 months of age. According to rule 2, the probability of childhood vaccination would be 82.14% if the mother gave birth at a health institution, the mother had higher education, and the household head was female. If/then rules are valuable for discovering hidden relationships between attributes, extracting knowledge from a dataset, and accurately representing knowledge and information about childhood vaccination. The findings of this study are important for policymakers and stakeholders to support public health action, decision-making, and the storage of knowledge regarding child vaccination status.
Competing interests: The authors declared that there are no competing interests in this work.
Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation
Ethics statement: N/A
Data availability: Yes, all data are fully available without restriction. All relevant data are within the manuscript and its Supporting Information files.
Dear PLOS ONE journal, this is to submit a manuscript entitled "Predicting childhood vaccination among children aged 12–23 months in Ethiopia: Using machine learning algorithms" for consideration of publication in the journal.
A 70:30 split was used to divide the dataset into training and testing sets for model building. A total of 1617 instances (observations) were included to predict childhood vaccination: 1132 observations (70% of the total) were used as the training dataset, and the remaining 485 observations (30% of the total) were used as the testing dataset.
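The split sizes follow from rounding 70% of 1617 (1131.9) to 1132 and assigning the remaining 485 observations to the test set. The sketch below reproduces that arithmetic, assuming a simple unstratified random shuffle; the helper name `percentage_split` and the seed are hypothetical, and this is not necessarily how the study's Weka split was configured.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def percentage_split(items: Sequence[T], train_fraction: float = 0.7,
                     seed: int = 1) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of items and split it into training and testing lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)             # reproducible shuffle
    n_train = round(len(shuffled) * train_fraction)   # 0.7 * 1617 = 1131.9 -> 1132
    return shuffled[:n_train], shuffled[n_train:]


observation_ids = list(range(1617))  # stand-ins for the 1617 observations
train, test = percentage_split(observation_ids)
print(len(train), len(test))  # 1132 485
```

Fixing the seed makes the split reproducible, which matters when several algorithms are compared on the same test set.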
If the lift > 1, X and Y are positively associated; if the lift < 1, X and Y are negatively associated; and if the lift = 1, there is no relation between X and Y. The details of data preparation, model building, important variable selection, and the analysis workflow are presented in Figure 2.
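Lift compares how often X and Y occur together with how often they would co-occur if they were independent, using the standard definition lift(X → Y) = P(X ∧ Y) / (P(X) · P(Y)). The sketch below computes it on hypothetical toy records (not EDHS data) and maps the value to the >1 / <1 / =1 reading above; the attribute names and helper functions are illustrative assumptions.

```python
from typing import Dict, List

Record = Dict[str, str]


def matches(record: Record, condition: Record) -> bool:
    """True if the record has every attribute value listed in condition."""
    return all(record.get(k) == v for k, v in condition.items())


def lift(records: List[Record], x: Record, y: Record) -> float:
    """Lift of the rule X -> Y: P(X and Y) / (P(X) * P(Y))."""
    n = len(records)
    p_x = sum(matches(r, x) for r in records) / n
    p_y = sum(matches(r, y) for r in records) / n
    if p_x == 0 or p_y == 0:
        raise ValueError("X and Y must each occur at least once")
    p_xy = sum(matches(r, x) and matches(r, y) for r in records) / n
    return p_xy / (p_x * p_y)


def interpret(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the >1 / <1 / =1 reading used in the text."""
    if value > 1 + tol:
        return "positive association"
    if value < 1 - tol:
        return "negative association"
    return "no association"


# Hypothetical toy records, NOT the EDHS data.
records = [
    {"anc": "adequate", "vaccinated": "yes"},
    {"anc": "adequate", "vaccinated": "yes"},
    {"anc": "adequate", "vaccinated": "no"},
    {"anc": "inadequate", "vaccinated": "no"},
    {"anc": "inadequate", "vaccinated": "no"},
    {"anc": "inadequate", "vaccinated": "yes"},
]

value = lift(records, {"anc": "adequate"}, {"vaccinated": "yes"})
print(round(value, 4), interpret(value))  # 1.3333 positive association
```

In this toy data P(X ∧ Y) = 2/6 while independence would predict 0.5 × 0.5 = 0.25, so the lift is 4/3. The small tolerance avoids misreading floating-point noise around 1 as an association.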

Figure 2 :
Figure 2: Workflow for data pre-processing and childhood vaccination prediction.

Figure 1 :
Figure 1: Children's and mothers' characteristics
Vaccination coverage among children aged 12–23 months in Ethiopia

Figure 2 :
Figure 2: Children aged 12–23 months vaccinated with the recommended vaccine types
Model performance for predicting childhood vaccination in Ethiopia using 2016 EDHS data

Figure 3 :
Figure 3: Comparison of the eight machine learning algorithms using area under the ROC curve (AUC) values.
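The AUC used to compare the algorithms can be read as the probability that a randomly chosen vaccinated child receives a higher predicted score than a randomly chosen unvaccinated child. The sketch below implements that pairwise (Mann-Whitney) definition with hypothetical labels and scores; it is an illustration of the metric, not Weka's implementation.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # e.g. vaccinated
    neg = [s for y, s in zip(labels, scores) if y == 0]  # e.g. not vaccinated
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = vaccinated) and predicted probabilities.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(auc(labels, scores))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation; the quadratic pairwise loop is fine for a few hundred test instances but a rank-based formula scales better.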

Table 3 :
Table 3: Information gain value for each predictor variable
The association rule generation process was based on the important attributes selected by the best-performing machine learning model. A total of seven association rules were generated, and the details of the rules are presented in Box 1.
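Information gain ranks an attribute by how much knowing its value reduces uncertainty (entropy) about vaccination status: IG(A) = H(class) − Σ_v P(A = v) · H(class | A = v). The sketch below applies this standard entropy-based definition to hypothetical toy records; Weka's attribute evaluator may differ in details such as missing-value handling and discretization of numeric attributes, so it is illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def entropy(labels: List[str]) -> float:
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(records: List[Record], attribute: str, target: str) -> float:
    """H(target) minus the weighted entropy of target within each attribute value."""
    n = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in records]) - remainder


# Hypothetical toy records, NOT the EDHS data.
records = [
    {"anc": "adequate", "delivery": "facility", "vaccinated": "yes"},
    {"anc": "adequate", "delivery": "facility", "vaccinated": "yes"},
    {"anc": "adequate", "delivery": "home", "vaccinated": "no"},
    {"anc": "inadequate", "delivery": "home", "vaccinated": "no"},
    {"anc": "inadequate", "delivery": "home", "vaccinated": "no"},
    {"anc": "inadequate", "delivery": "facility", "vaccinated": "yes"},
]

gains = {a: information_gain(records, a, "vaccinated") for a in ("anc", "delivery")}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking)  # ['delivery', 'anc']
print({a: round(g, 4) for a, g in gains.items()})  # {'anc': 0.0817, 'delivery': 1.0}
```

In this toy data, delivery place separates vaccinated from unvaccinated children perfectly (gain of 1 bit), while ANC only partly does, so delivery ranks first.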