Identifying intentional injuries among children and adolescents based on Machine Learning

Background: Compared with other studies, injury monitoring of Chinese children and adolescents has captured a low level of intentional injuries due to self-harm/suicide and violent attacks, so intentional injuries in this population are not apparent from the data. Existing intentional injuries may have been misclassified, yet research on the misclassification of intentional injuries is lacking. This study aimed to assess the feasibility of discriminating the intention of injury with Machine Learning (ML) models and to provide ideas for understanding whether intentional injuries had been misclassified.
Methods: Information entropy was used to determine the correlation between variables and the intention of injury, and Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), Adaboost algorithms and Deep Neural Networks (DNN) were used to create models that discriminate the intention of injury. The models were compared by comprehensively testing their discrimination performance to determine stability and consistency.
Results: For the area under the ROC curve with different intentions of injuries, the NB model was 0.891, 0.880, and 0.897, respectively; the DT model was 0.870, 0.803, and 0.871, respectively; the RF model was 0.850, 0.809, and 0.845, respectively; the Adaboost model was 0.914, 0.846, and 0.914, respectively; the DNN model was 0.927, 0.835, and 0.934, respectively. In a comprehensive comparison of the five models, the DNN and Adaboost models performed better in determining the intention of injury. When cases with unclear intentions of injury were discriminated, unintentional injuries, violent attacks, and self-harm/suicides accounted on average for 86.57%, 6.81%, and 6.62%, respectively.
Conclusion: It was feasible to use ML algorithms to determine the injury intention of children and adolescents. The research suggested that the DNN and Adaboost models performed better in determining the intention of injury. With wider collection of influencing factors, extraction of characteristic variables, reduced complexity and improved model performance in the future, this study could build a foundation for transforming the model into a tool for rapid diagnosis and for uncovering potential intentional injuries of children and adolescents.


Background
Injuries can be classified as unintentional or intentional. Intentional injuries include violent attacks and self-harm/suicide. Children and adolescents are high-risk populations for injuries. Violent attacks on and abuse of children and adolescents often involve covert crimes, and self-harm/suicide is often related to adverse childhood experiences (ACEs) [1,2]. Violent attacks and self-harm/suicide represent a higher disease burden in adolescents than in other age groups [3]. In recent years, intentional injuries have received widespread attention, as Chinese media outlets have reported increasing physical violence, school bullying, self-harm/suicide and other problems among children and adolescents. Distinguishing injury intentions helps us to understand the different injury mechanisms, facilitates the development of interventions and advances prevention work. However, the distinction between intentional and unintentional injuries is not always easy or effective. Research based on data estimates has shown that at least half of children in Asia, Africa and North America experienced violence in the past year [4]. A meta-analysis of global data found that the rate of child sexual abuse was 30 times higher, and that of physical abuse 75 times higher, than in official reports [5,6]. Studies of Chinese children and adolescents showed that the overall prevalence of suicidal ideation and of suicide attempts was 16.10% and 3.60%, respectively, and that 13.20% reported having been threatened or injured by violence in schools [7]. A South Korean study reported that 10.50% of adolescent injuries seen in outpatient/emergency rooms were intentional [8]. Based on the National Electronic Injury Surveillance System, Amanullah S et al. [9] reported that 10.00% of children injured in the school environment were injured intentionally. In many countries, the true severity of intentional injury problems is greatly underestimated.
On the one hand, underreporting or misreporting is possible because data often come from passive reporting by health systems; on the other hand, injury intentions may be misclassified for various other reasons. Compared with other studies, injury monitoring in outpatient/emergency rooms among Chinese children and adolescents captured a low level of intentional injury (4.84%) [10]. Doctors play a key role in identifying intentional injuries and preventing their recurrence. However, because unintentional and intentional injuries share many mechanisms of injury, this distinction can be very challenging for doctors [11]. For example, a road traffic injury may be unintentional, or it may involve a violent attack or self-harm/suicide. In practice, doctors usually make judgements by asking patients or guardians for information to inform the clinical diagnosis, but this information can be subjective. Thus, although the distinction between intentional and unintentional injury is valuable, the shared mechanisms make it difficult, and misclassification of intentional injuries cannot be ruled out.
Studies of the misclassification of intentional injuries are relatively lacking in the literature. Machine Learning (ML) is an interdisciplinary subject that has developed rapidly in recent years and is at the core of artificial intelligence. To date, no study using ML algorithms to judge the intention of injury has been reported.
This study intended to understand whether intentional injuries had been misclassified and to explore the actual level of intentional injury. To this end, it extracted the characteristics of injury cases among children and adolescents from outpatient/emergency rooms, calculated the contribution of each independent variable to the classification of the intention of injury, and used Naive Bayes (NB), Decision Tree (DT), Random Forest (RF), Adaboost algorithms and Deep Neural Networks (DNN) to model the discrimination between intentional and unintentional injury and to screen for the best model. The models were expected to discriminate objectively between intentional and unintentional injury, to reduce the false positive rate of unintentional injury, and to avoid misclassification effectively; the study also aimed to identify potential intentional injuries of children and adolescents through the discrimination models.

Data collection
Injury cases aged 0-17 years were drawn from the Chinese National Injury Surveillance System (NISS) in Zhuhai City, China, from January 1, 2006, through December 31, 2017. The NISS collected the initial visits for all injuries in the emergency rooms and outpatient clinics of 3 sentinel hospitals.
Three types of intent were recorded. Unintentional injury meant an injury due to accidental events. Self-harm/suicide meant an injury resulting, directly or indirectly, from a positive or negative act by the patient, who knew that the act could result in injury or death. Violent attack meant that the patient had been deliberately attacked or violently injured by another person. Both self-harm/suicide and violent attack were classified as intentional injury.

Ethics statement
The study protocol was approved by the Ethics Committee of the Center for Disease Control and Prevention in Zhuhai and followed the tenets of the Declaration of Helsinki. Data were collected while the doctor treated the injured patient, with the nurse and the patient's guardian present. The patient's age, gender, and the time and location of the injury were included in the general medical record of the consultation. Verbal informed consent was obtained from the parents or guardians of patients aged <18 years after the nature of the study had been explained. The research would not cause harm to the patients, and private information such as the patient's name, address and contact information was not used in the research. The use of verbal rather than written informed consent from the children's guardians was approved by the ethics committee.
Data pre-processing and characterization. The educational level and nature of the injury variables contained one missing value, which was filled with the most frequent category. The epidemiological characteristics of the factors were described by intention of injury. Differences between the intentions of injury (by age group, gender, region, time of injury, etc.) were tested using a χ² or Fisher's exact test, and p < 0.05 was considered statistically significant.
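The χ² comparison described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the counts in `table` are hypothetical, and the closed-form p-value is valid only for 2 degrees of freedom (for example, three intentions by weekday/weekend). For other table sizes a statistical library would be needed.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

def p_value_df2(stat):
    """Upper-tail p-value of a chi-square statistic; valid only when df = 2."""
    return math.exp(-stat / 2)

# Hypothetical counts: rows = intention (1, 2, 3), columns = weekday, weekend.
table = [[700, 300], [30, 20], [45, 15]]
stat, df = chi_square(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value_df2(stat):.3f}")
```

As a consistency check on the closed form, `p_value_df2(1.92)` evaluates to about 0.383, which matches the weekday/weekend result reported in this paper for χ² = 1.92.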
Calculating the contribution of the predictors. Information entropy was used to determine the correlation between each variable and the intention of injury (the contribution of the predictor). The purpose was to understand the correlation between discrete features, to filter out variables that were not meaningful for the classification, and to reduce redundant features. The correlation between discrete features was described by the information gain and the information gain ratio used in the classic decision tree algorithm. Predictive variables with a contribution index <0.007 were regarded as non-discriminative and were removed from the model. This improved the efficiency of the modelling program; moreover, when the model was built using all variables, these variables were found not to participate in the final prediction.
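A minimal sketch of the entropy-based screening is given below. The paper does not state whether the "contribution index" is the information gain or the gain ratio, so both are computed here; the function names and the toy data are hypothetical, and only the 0.007 cut-off comes from the text.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0

# Toy data: intention (1 = unintentional, 3 = violent attack) and two features.
labels = [3, 3, 1, 1]
features = {
    "mechanism": ["blunt", "blunt", "fall", "fall"],   # separates the classes
    "weekday": ["yes", "no", "yes", "no"],             # carries no information
}

THRESHOLD = 0.007  # cut-off reported in the text
kept = [name for name, col in features.items()
        if information_gain(col, labels) >= THRESHOLD]
print(kept)
```

In this toy example only `mechanism` survives the screening, because splitting on `weekday` leaves the label entropy unchanged.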
Training/building the model. Data with the intention of injury = 1/2/3 (1 = unintentional injury, 2 = self-harm/suicide, 3 = violent attack) were used for training. Five-fold cross-validation was used to train on the data set. Because the data were imbalanced (intention = 1 accounted for >90.00% of cases), the features of the sparse classes were insufficiently learned and the prediction performance deteriorated; this was addressed by resampling in the training set.
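The training scheme can be illustrated with the sketch below. The text specifies 5 folds and resampling within the training set only; the use of stratified folds and of random oversampling (rather than undersampling) are assumptions made for this illustration, and all names are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Split case indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def oversample(indices, labels, seed=0):
    """Randomly duplicate minority-class cases until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Toy labels with the same kind of imbalance (90% unintentional).
labels = [1] * 90 + [2] * 5 + [3] * 5
folds = stratified_folds(labels, k=5)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    train_balanced = oversample(train_idx, labels)
    # A classifier would be fitted on train_balanced and scored on test_idx,
    # which keeps its original class proportions.
```

Resampling only inside each training fold keeps duplicated cases out of the held-out fold, so the evaluation still reflects the original class distribution.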
NB, DT, RF, Adaboost algorithms and DNN were applied to establish classification models and to provide model evaluation indicators such as the false positive probability, confusion matrix, recall, precision, and accuracy. In addition, because the data set was unbalanced, the F1-score was also examined. The Receiver Operating Characteristic (ROC) curve was drawn and the area under the curve was calculated.
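The evaluation indicators named above can be derived from a multi-class confusion matrix as in the following sketch. The function names and the toy predictions are hypothetical; each class is evaluated one-vs-rest, which is an assumption about how the per-intention values were obtained.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_metrics(matrix, classes):
    """Precision, recall, F1 and false positive rate per class, plus accuracy."""
    total = sum(sum(row) for row in matrix)
    report = {}
    for i, c in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp
        fp = sum(row[i] for row in matrix) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        fpr = fp / (fp + tn) if fp + tn else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}
    accuracy = sum(matrix[i][i] for i in range(len(classes))) / total
    return report, accuracy

# Toy example: 1 = unintentional, 2 = self-harm/suicide, 3 = violent attack.
classes = [1, 2, 3]
y_true = [1, 1, 1, 1, 2, 2, 3, 3]
y_pred = [1, 1, 1, 2, 2, 1, 3, 3]
matrix = confusion_matrix(y_true, y_pred, classes)
report, accuracy = per_class_metrics(matrix, classes)
print(accuracy, report[2])
```

The F1-score is the harmonic mean of precision and recall, so it stays low for a sparse class even when overall accuracy is dominated by the majority class.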
Model test. The data with the intention of injury = 4/5 (4 = unclear, 5 = others) were used to test whether the predictions of the different algorithms were stable and consistent. The results of pairwise comparisons between algorithms were reported.
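One way to quantify the stability and consistency of the predictions is the pairwise agreement between models, sketched below. The definition of the consistency rate as the percentage of cases on which two models give the same class is an assumption, and the predictions shown are hypothetical.

```python
from itertools import combinations

def consistency_rate(pred_a, pred_b):
    """Percentage of cases on which two models predict the same class."""
    agree = sum(a == b for a, b in zip(pred_a, pred_b))
    return 100.0 * agree / len(pred_a)

def average_class_share(predictions, classes):
    """Mean percentage of cases assigned to each class, averaged over models."""
    shares = {}
    for c in classes:
        per_model = [100.0 * preds.count(c) / len(preds)
                     for preds in predictions.values()]
        shares[c] = sum(per_model) / len(per_model)
    return shares

# Hypothetical predictions of three models for four cases of unclear intention.
predictions = {
    "DT": [1, 1, 2, 3],
    "Adaboost": [1, 1, 2, 1],
    "DNN": [1, 3, 2, 3],
}
pairwise = {(a, b): consistency_rate(predictions[a], predictions[b])
            for a, b in combinations(predictions, 2)}
shares = average_class_share(predictions, classes=[1, 2, 3])
print(pairwise)
print(shares)
```

For this toy input the DT and Adaboost predictions agree on 75% of cases, and the average shares of classes 1, 2 and 3 are 50%, 25% and 25%.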
The distribution of unintentional injuries, violent attacks and self-harm/suicide among children and adolescents differed by region, gender, household registration, education level and other characteristics (p<0.001). There was no difference according to whether the injury occurred on weekdays or weekends (χ² = 1.92, p = 0.383). Falls accounted for 50.37% of unintentional injuries, followed by animal bites (13.81%) and road traffic injuries (12.02%). Blunt injuries accounted for 59.43% of violent attacks. Knife/sharp injuries and poisoning accounted for 37.07% and 23.41% of self-harm/suicide, respectively (Table 1).

Modelling to discriminate intentional from unintentional injuries based on Machine Learning
The contributions of the predictors mechanism of injury, educational level, occupation, age group, place where the injury occurred, injured body part, activity at the time of injury, and nature of the injury were 0.0955, 0.0401, 0.0377, 0.0357, 0.0271, 0.0202, 0.0165, and 0.0121, respectively. The other 8 predictive variables, which had a contribution index <0.007, were removed from the model to reduce redundant features during modelling and to improve the running efficiency of the program (Table 2).
For cases with a true value = 1 (unintentional injuries), the probability of correct discrimination by the Adaboost and DNN models was 0.980 and 0.977, respectively. The NB model had the highest accuracy for true value = 2 (self-harm/suicide; 0.502) and for true value = 3 (violent attacks; 0.658), but the lowest accuracy for true value = 1 (unintentional injuries; 0.925) (Table 3).
The RF and Adaboost models had the highest accuracy (0.956), followed by the DNN and DT models (0.955) and the NB model (0.911). Precision (the positive predictive value) was generally high when the intention of injury = 1 and lower when the intention of injury = 2/3, which was related to the imbalanced classification of the intention of injury in the data. For recall (sensitivity), the DNN model was relatively higher. The F1-score of all five models was >0.950 when the intention of injury = 1 but was lower when the intention of injury = 2/3 (Table 4 and Fig 1).
For the area under the ROC curve with intention of injury = 1/2/3, the NB model was 0.891, 0.880, and 0.897, respectively; the DT model was 0.870, 0.803, and 0.871, respectively; the RF model was 0.850, 0.809, and 0.845, respectively; the Adaboost model was 0.914, 0.846, and 0.914, respectively; the DNN model was 0.927, 0.835, and 0.934, respectively. Comparing the five models, the DNN and Adaboost models had higher predictive values for intention of injury determination (Fig 2).
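The per-intention areas under the ROC curve can be computed without drawing the curve, because the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses this rank formulation in a one-vs-rest setting; treating each intention one-vs-rest is an assumption, and the scores are hypothetical.

```python
def auc_one_vs_rest(y_true, scores, positive):
    """AUC for one class against all others, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of self-harm/suicide (class 2).
y_true = [2, 1, 1, 3, 2]
scores_class2 = [0.9, 0.2, 0.6, 0.1, 0.5]
auc = auc_one_vs_rest(y_true, scores_class2, positive=2)
print(round(auc, 3))
```

Here five of the six positive-negative pairs are ranked correctly, so the AUC is 5/6. The pairwise loop is quadratic in the number of cases; a sort-based version would be preferable for a full surveillance data set.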

Model test
The cases with the intention of injury = 4/5 (unclear/others) were judged by the five models. The consistency rate between the DT and Adaboost models was 96.83%, between the DT and DNN models was 94.61%, and between the Adaboost and DNN models was 94.41% (Table 5). The average percentages judged by the five models for the intention of injury = 1/2/3 were 86.57%, 6.81%, and 6.62%, respectively.

The significance of the study on the intention of injury
One study has shown that unintentional injuries are the most common cause of injury-related deaths (57.00%) among children and adolescents, and that the remaining 43.00% were intentional injuries [12]. Studies in Pakistan have shown that intentional injuries account for 8.20% of the outpatient/emergency injuries of children ≤18 years old [13]. South Korea's outpatient/emergency injury study captured 10.50% of intentional injuries [8]. Gallaher JR et al. [14] analysed all injuries in patients <18 years old at a trauma centre in Malawi and showed that intentional injuries accounted for 8.10%. In this study, only 4.88% of injury cases among children and adolescents were intentional injuries as judged by doctors; another 0.83% of injury cases had an intention of injury recorded as others/unclear. Compared with other countries and studies, this study captured low levels of intentional injuries in the outpatient/emergency population of children and adolescents. There are several possible reasons. First, if the patient did not seek medical help after the injury, the case was difficult to capture, especially for self-harm/suicide. Second, children and adolescents seeking medical help may intentionally conceal the true intention of their injury and self-report it as unintentional because of the stigma of intentional injury, resulting in misclassification. Third, some violent abuse may, for sensitive reasons, be reported by guardians as unintentional injury to hide potential criminal behaviour, leading to misclassification bias. Studies have shown that children between the ages of 6 and 16 are often subjected to physical attacks [10], and teenagers may experience school-based violence or bullying [15]. In addition, in cases of child abuse, it is often claimed that the injury was caused by an accident [16]. Child sexual abuse has been a common and serious problem in China, with a combined incidence of childhood sexual abuse of 18.20% [17].
Although some children are suspected of having traumatic wounds caused by violent attacks, research on the intent of these injuries is relatively lacking in the literature. In many countries, the true severity of intentional injury problems has been greatly underestimated. Data often come from passive reporting by health systems, with the possibility of underreporting or misreporting, and the intention of injury may also be misclassified for other reasons. Accurate discrimination of the intention of injury is of great significance. First, it helps to detect potential intentional injuries in children and adolescents, such as abuse, school bullying, and self-harm/suicide, in a timely manner. Second, it supports timely case tracking, detection of illegal activities, psychological counselling for children and adolescents, and peer education. Third, it helps to reduce intentional injury behaviours among children and adolescents, to protect children's physical and mental health, and to protect children's rights and achieve the sustainable development goals. Finally, it is of great significance for reducing the burden of death and disability caused by intentional injury to children and adolescents, and even for promoting physical and mental health in adulthood.

The Machine Learning application of discriminants of the intention of injury
In injury-related research, ML models have been developed to help identify injury cases in narrative texts and to classify the mechanisms that cause injuries in a timelier manner [18]. In addition, studies have used ML algorithms to predict injury outcomes and burn mortality [19,20]. Regarding discrimination of the intent of injury, Paek SH et al. [21] reported a relatively low rate of suspected child abuse and developed a child abuse screening tool called "FIND". The influencing factors in "FIND" included physical examination, contradictory injury mechanisms, delayed visits, inappropriate guardianship, poor child hygiene, and long-term bone injuries in the head or other parts of the child. Kim PT et al. [22] developed a logistic model to assess potential predictors of increased risk of intentional injury. It was found that 1/4 of infants with head injuries in public places and no witnesses were identified as victims of intentional injury. This model was used to routinely screen high-risk populations to avoid missing intentional injuries. Bousema S et al. [23] screened children with burns for suspected child abuse and confirmed that 9.00% were suspected cases of child abuse or neglect. However, to date, no research using ML algorithms to judge the intention of injury and to uncover potential intentional injuries has been found. This study hoped to establish ML models that discriminate the intention of injury, reduce the false positive rate for the intention of injury, and effectively avoid misclassification. Model discrimination can also uncover potential intentional injuries of children and adolescents.

Injury intention discrimination modelling
For the mechanisms of injury, one study has shown that among adolescents, 61% of intentional firearm deaths resulted from homicide and 98% of intentional suffocation deaths resulted from suicide [12]. Studies of intentional injury in adolescents indicated that self-inflicted poisoning was common and was a risk factor for suicide [24,25]. In addition, the most common methods of self-harm/suicide were cuts and poisoning, whereas blunt injuries mainly involved violent attacks. For the nature of injury, some studies found that soft tissue damage was most common in violent attacks, and that cuts and fractures were more common in self-harm [16]. Gallaher JR et al. [14] showed that the average age of intentional injury patients differed from that of patients with unintentional injuries, and that intentional injuries were reported more often in men than in women; among intentional injuries, more occurred at night and more involved soft tissue damage, with head injury the most common cause of death. Studies have also shown that the rate of intentional injuries was higher in rural students than in urban students [7]. Based on feature extraction of the factors influencing the intention of injury, this study found that the intentions of injury differed among children and adolescents by age group, gender, region, household registration, injured body site, etc. The mechanisms of injury also differed with the intention of injury. For example, children and adolescents who self-harmed or attempted suicide often cut the upper limbs with sharp objects, and blunt instruments were often used to hit the victim's head in violent attacks. The model therefore included 16 variables at the outset. To reduce redundant features and to improve the operating efficiency of the program, the model retained the factors with the most influence on the intention of injury, and the 8 predictive variables with a lower contribution were removed.
The retained predictors included the mechanism of injury and the patient's educational level, occupation, age group, and place of injury.
The NB, DT, RF, Adaboost and DNN algorithms were used to establish models that discriminate the intentions of injury. Discriminating the intention of injury was a multi-class classification problem in which various variables served as influencing factors to predict the three kinds of intention. The influencing factors were all categorical variables, so the classical ML algorithms were suitable for discriminative modelling. For example, NB is suitable for modelling categorical variables. DT obtains a classifier by recursively splitting the data on the most informative variable. RF and Adaboost are ensemble algorithms that integrate multiple weak classifiers: RF combines many trees grown on resampled data, whereas Adaboost builds a sequence of learners, each of which learns to compensate for the errors left by the previous one. DNN combines low-level features to form more abstract high-level representations of attribute categories or features. After comparison of the model evaluation indicators, the DNN and Adaboost models showed good ability to discriminate the intention of injury.
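To illustrate why NB fits categorical predictors, the following is a minimal sketch of a categorical Naive Bayes classifier. It is not the implementation used in this study: the class name, the Laplace smoothing parameter `alpha`, and the toy cases (mechanism of injury, injured body part) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

class CategoricalNB:
    """Naive Bayes for categorical features with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # value_counts[(class, feature index)] counts the observed categories.
        self.value_counts = defaultdict(Counter)
        self.levels = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.value_counts[(label, j)][value] += 1
                self.levels[j].add(value)
        return self

    def log_posterior(self, row):
        """Unnormalised log posterior of each class for one case."""
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.n)  # log prior
            for j, value in enumerate(row):
                seen = self.value_counts[(label, j)][value]
                denom = count + self.alpha * len(self.levels[j])
                score += math.log((seen + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)

# Toy cases; 1 = unintentional, 2 = self-harm/suicide, 3 = violent attack.
rows = [("fall", "leg"), ("fall", "arm"), ("fall", "head"),
        ("blunt", "head"), ("blunt", "head"),
        ("cut", "arm"), ("cut", "arm")]
labels = [1, 1, 1, 3, 3, 2, 2]
model = CategoricalNB().fit(rows, labels)
print(model.predict(("blunt", "head")), model.predict(("cut", "arm")))
```

Each feature contributes an independent smoothed frequency, so the model needs only per-class category counts, which is why it trains quickly on surveillance variables that are all categorical.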
The intentions of injury in this study were unbalanced. If unbalanced data were used for modelling directly, the discriminant results would easily be biased towards the categories with more cases. A model that cannot effectively identify intentional injuries is meaningless. Random resampling was therefore applied to the unbalanced data, and the modelling performance improved. When the models were used to determine the injury intentions of the cases for which intentions were unclear, the average proportions of unintentional injury, self-harm/suicide and violent attacks were 86.57%, 6.81%, and 6.62%, respectively. The judged proportion of intentional injury in this study was similar to that reported by Han H et al. [8], Amanullah S et al. [9], and Gallaher JR et al. [14]. Whether the determined intention of injury was in line with the actual situation needs to be explored further in future research.

Limitations
Many factors affect the intention of injury, such as psychological and emotional factors, severe punishment by parents, low parental monitoring, parental migration patterns and parent-child attachment levels, as well as factors that have not yet been recognized. This information was not included in the report, which theoretically reduced the performance of the models in discriminating the intentions of injury. Because the models were built with retrospective data, the predictions for the test data could not be verified; however, the percentage of intentional injuries derived from the test data was similar to that in other studies, and verification of the model is our next research direction.

Conclusion
This study used ML algorithms to determine the intentions of injury in children and adolescents. It suggested that the DNN and Adaboost models performed better in discriminating the intentions of injury. The model is expected to be transformed into a tool for rapid diagnosis, to further discriminate the intention of injury, and to uncover potential intentional injuries of children and adolescents.