Abstract
Physical literacy in children has become a significant research topic in both education and psychology. Recently, machine learning, as a cutting-edge AI technology, has started to play a crucial role in these fields. This study aimed to apply machine learning models to predict physical literacy in 4–6-year-old children and to comprehensively analyze the influence of individual and family factors. We evaluated the physical literacy of 1,734 children aged 4–6 and systematically examined the impact of both individual factors (such as gender, age, body type, sedentary behavior, screen time, moderate-to-vigorous physical activity (MVPA), sleep duration, and sleep quality) and family factors (such as parents’ education level, occupation, exercise frequency, support for children’s physical activity, household annual income, and family exercise environment) using various machine learning models. Results showed that the ensemble learning model achieved the best performance in predicting physical literacy, with an AUC of 86.2%. Among all predictive factors, mother’s exercise frequency, family exercise environment, and time spent on MVPA were identified as the most important. These findings provide new insights into enhancing children’s physical literacy and underscore the critical role of family environment and lifestyle in its development.
Citation: Wang X, Jiang Y (2025) Application of machine learning models in predicting physical literacy in 4–6-year-old children: A comprehensive analysis of individual and family factors. PLoS One 20(9): e0332997. https://doi.org/10.1371/journal.pone.0332997
Editor: Timoteo Salvador Lucas Daca, Faculty of Physical Education and Sports at Pedagogical University of Maputo, MOZAMBIQUE
Received: September 4, 2024; Accepted: September 8, 2025; Published: September 30, 2025
Copyright: © 2025 Wang, Jiang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data files are available from the Kaggle database (https://www.kaggle.com/datasets/wangxiaofen/machine-learning-in-predicting-physical-literacy).
Funding: This work was supported by the Fujian Provincial Philosophy and Social Sciences Planning Project (FJ2025C129 to X.W.) and the 2024 General Research Project of Fujian Provincial Education Science Planning (FJJKBK24-112 to X.W.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
In the field of physical literacy research, British scholar Whitehead, drawing from monism, existentialism, and phenomenology, defined physical literacy as the combination of motivation, confidence, physical competence, knowledge, and understanding needed to maintain lifelong physical activity [1]. The Australian Physical Literacy Framework further breaks down physical literacy into physical, psychological, social, and cognitive domains [2]. Although various definitions exist, they all share a holistic view, emphasizing the psychological, physical, and cognitive attributes necessary for lifelong participation in physical activities [3]. Physical literacy, therefore, can be viewed as a continuous journey, where its components interact to foster lifelong engagement in physical activity, ultimately contributing to population health, well-being, and quality of life.
With the global spread of physical inactivity, particularly among younger children [4], fostering physical literacy has become a key area in promoting children’s health [5]. For children aged 4–6, who are at a critical stage in developing physical literacy, their rapid physical and cognitive growth offers an essential opportunity for early cultivation. Thus, focusing on this early life stage is crucial to ensuring their future active participation in physical activities and maintaining good health [6].
Recent studies have highlighted the importance of physical literacy. Integrating physical literacy into active school recesses has been shown to significantly improve children’s physical fitness and academic achievement [7]. A systematic review and meta-analysis also confirmed that physical literacy is significantly and positively associated with cardiorespiratory fitness in children and adolescents [8]. However, despite widespread recognition of the importance of physical literacy, research has primarily focused on adolescents and adults [9], leaving a noticeable gap in studies on preschool children aged 4–6. Additionally, the influence of family environment and parental behavior on children’s physical literacy has increasingly gained attention. Studies have shown that parents’ exercise habits, the family’s exercise atmosphere, and their support for children’s physical activity significantly impact children’s physical literacy, with objective measurements revealing a strong correlation between parents’ and children’s physical activity levels [10]. Moreover, the family exercise environment, especially aspects related to physical activity, such as parents’ exercise preferences and household setups, plays a critical role in shaping children’s physical behavior [11]. However, existing research has predominantly focused on school curricula [12,13] and teacher-related factors [14], with limited exploration of how the family environment directly influences early physical literacy development. Family factors, including parents’ attitudes, behaviors, and the home environment, have a direct and lasting influence, especially for children aged 4–6, who rely heavily on their families [15–17]. Therefore, expanding the scope of research to include family environment factors, particularly parental exercise habits, is essential to understanding their impact on preschool children’s physical literacy.
With the rapid advancement of data science and AI, machine learning has demonstrated considerable advantages in handling large-scale, complex data, and is gradually being applied in children’s health research [18]. Compared to traditional statistical methods, machine learning can identify more complex and hidden health risk factors, thereby improving data analysis accuracy and prediction reliability [19]. For example, Guerrero et al. used decision tree analysis to study Canadian children’s adherence to 24-hour movement guidelines during the COVID-19 pandemic, finding that parental perceptions significantly influenced children’s compliance with exercise guidelines [20]. Similarly, Bitew et al. employed various machine learning algorithms to predict malnutrition in under-five children in Ethiopia, with the xgbTree algorithm showing superior predictive ability in identifying key factors [21]. Xu and Sun used machine learning methods to explore the relationship between physical fitness and academic performance in primary school students, successfully predicting the correlation between these variables [22].
Despite the promising potential of machine learning in children’s health research, current studies remain in the early stages and are largely confined to specific applications. Particularly in preschool children’s physical literacy, the application of machine learning is still relatively limited, with related research mainly focusing on school-aged or older samples [23,24]. Moreover, the existing literature on physical literacy largely relies on traditional approaches [2]. Most existing machine learning studies, including those cited above, rely on single models and so miss out on the advantages of ensemble learning. By integrating results from multiple algorithms, ensemble learning enhances model stability and prediction accuracy [25], and recent evidence has demonstrated the advantages of ensemble learning approaches in children’s health [26]. However, the potential of ensemble learning models in handling complex, multidimensional data related to children’s health has yet to be fully explored [27].
This study aimed to systematically analyze and compare the predictive effects of individual and family factors on physical literacy in preschool children aged 4–6 using ensemble machine learning models.
Research methods
Research subjects
This study employed a cross-sectional observational design with a stratified cluster sampling method to select participants from 18 kindergartens across nine cities in Fujian Province, including Fuzhou, Xiamen, Putian, Sanming, Quanzhou, Zhangzhou, Nanping, Longyan, and Ningde. In each city, two kindergartens were selected to represent both main urban districts and urban–rural fringe areas based on administrative divisions [28], with “main urban districts” referring to the core urban areas and “urban–rural fringe areas” referring to the transitional zones between urban and rural areas. One class each from the large, medium, and small age groups was randomly selected from each kindergarten. A total of 1,885 children were initially recruited. After excluding 151 questionnaires due to missing data or patterned responses, 1,734 valid cases were retained for analysis. The required sample size was evaluated with reference to two principles: (1) scale development best practices suggest a minimum of 5–10 participants per item [29], and given that this study employed the 30-item PL-C Quest and the 23-item Family Environment Scale, the minimum required sample size would be 265–530; (2) the events per variable (EPV) principle recommends at least 10 outcome events per predictor variable [30], and with 16 predictors, about 160 outcome events would be required. The actual sample size of 1,734 preschool children exceeded these requirements, supporting sufficient statistical power and robustness. Parents were invited to participate in the study, and informed consent was obtained before they completed the survey. The inclusion criteria were children aged 4–6 years, excluding those with physical developmental disabilities, intellectual disabilities, genetic disorders, or severe organic diseases.
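The two sample-size rules described above reduce to simple arithmetic, which the following Python sketch reproduces. The function names are illustrative only. The expected number of outcome events is an added assumption: it takes the 20% “high physical literacy” cut-off used later in this study as the event rate, which is not a calculation reported by the authors.

```python
def items_rule_range(n_items, per_item_low=5, per_item_high=10):
    """Minimum sample-size range under the 5-10 participants-per-item rule."""
    return n_items * per_item_low, n_items * per_item_high


def epv_min_events(n_predictors, events_per_variable=10):
    """Minimum number of outcome events under the events-per-variable rule."""
    return n_predictors * events_per_variable


# 30 PL-C Quest items + 23 Family Environment Scale items (from the text)
low, high = items_rule_range(30 + 23)
min_events = epv_min_events(16)

# Assumption for illustration: about 20% of the 1,734 children belong to the
# "high physical literacy" group, and that group supplies the outcome events.
expected_events = round(1734 * 0.20)

print(low, high)         # 265 530
print(min_events)        # 160
print(expected_events)   # 347
```

Under that assumption, the expected number of events (about 347) would still exceed the 160 events implied by the EPV rule.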
The data collection was conducted between November 2023 and May 2024. This study was ethically reviewed and approved by the Scientific Research Promotion Department at Chengyi College, Jimei University.
Research tools
Basic information questionnaire.
We designed a “Basic Information Questionnaire” tailored to the study’s objectives. This questionnaire gathered data on children’s personal information, including gender, age, kindergarten location, body type, sedentary behavior, screen time, moderate-to-vigorous physical activity (MVPA) time, sleep duration, and sleep quality. It also collected family-related information, such as parents’ education levels, occupations, exercise frequency, support for children’s physical activity, and household annual income. Parents completed the questionnaire on behalf of their children. Sedentary behavior and MVPA were assessed using the International Physical Activity Questionnaire—Short Form (IPAQ-SF) [31], a widely recognized and validated instrument. Previous studies have demonstrated its reliability and validity in the Chinese population [32]. The questionnaire consists of seven items, covering physical activity and sedentary time. It measures the weekly frequency (days per week) and daily duration (minutes per day) of low-intensity, moderate-intensity, and vigorous-intensity physical activities, as well as daily sedentary time (minutes per day) over the past seven days. The average daily MVPA time was calculated as “(MPA frequency × duration + VPA frequency × duration)/7.” Furthermore, drawing on both international evidence [33,34] and domestic preschool research [35], this study categorized children into sedentary (>6 hours/day) and non-sedentary (≤6 hours/day) groups based on their daily sedentary time.
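As a minimal sketch of the two derived variables described above, the following Python functions compute average daily MVPA from the quoted formula and apply the 6-hour sedentary cut-off. The function names and the example child are hypothetical and serve only to illustrate the arithmetic.

```python
def daily_mvpa_minutes(mpa_days, mpa_minutes, vpa_days, vpa_minutes):
    """Average daily MVPA: (MPA days x minutes + VPA days x minutes) / 7."""
    return (mpa_days * mpa_minutes + vpa_days * vpa_minutes) / 7


def sedentary_group(sedentary_minutes_per_day, cutoff_hours=6):
    """Label a child 'sedentary' only if daily sitting time exceeds the cut-off."""
    if sedentary_minutes_per_day > cutoff_hours * 60:
        return "sedentary"
    return "non-sedentary"


# Hypothetical child: moderate activity on 5 days x 30 min,
# vigorous activity on 2 days x 20 min.
mvpa = daily_mvpa_minutes(5, 30, 2, 20)
print(round(mvpa, 1))          # 27.1
print(sedentary_group(400))    # sedentary (400 min > 360 min)
print(sedentary_group(360))    # non-sedentary (exactly 6 h is not above the cut-off)
```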
For parental exercise frequency, parents reported the number of times per week they engaged in physical activity, categorized as “Never,” “1–2 times/week,” and “≥3 times/week.” The level of parental support for children’s physical activity was assessed separately for fathers and mothers, with reference to previous research [36]. In this study, parental support was categorized into three levels: High Support (e.g., actively participating in the child’s physical activity), Medium Support (e.g., providing encouragement and logistical support), and Low Support (e.g., seldom engaging in or promoting physical activity participation).
Physical literacy in children questionnaire (PL-C Quest).
The study employed the PL-C Quest, developed by Sport Australia, to assess the physical literacy of children aged 4–6 years [37,38]. This is the world’s first pictorial self-assessment tool designed for young children to evaluate their physical literacy. The questionnaire covers four domains—physical, psychological, social, and cognitive—with 30 items in total. Children were assessed through face-to-face interviews using images, and their responses were scored on a four-point scale. Higher total scores indicated better self-perceived physical literacy. The PL-C Quest has been validated for reliability and feasibility in children aged 4–12 in China [39], showing strong test-retest reliability (total scale: r = 0.90) and good to excellent internal consistency (total scale: α = 0.94). In this study, the Cronbach’s alpha for the PL-C Quest was 0.949, with domain-specific alphas ranging from 0.859 to 0.896. In physical literacy research, percentile-based methods are commonly used to distinguish groups at different relative levels [40,41]. In line with the “Healthy China 2030” Planning Outline, which set a national goal of achieving a 20% health literacy rate among residents by 2020 [42], this study classified children based on their PL-C Quest scores, with the top 20% classified as the “high physical literacy” group and the remaining 80% as the “needs attention” group. Although health literacy and physical literacy are not entirely identical constructs, international curriculum policies frequently locate them within the same Health and Physical Education (HPE) framework and pursue them under shared holistic health goals [43]. It should be noted that being in the “needs attention” group does not indicate poor physical literacy but rather reflects that there is room for improvement.
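The percentile-based grouping described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors’ procedure: the text does not say how ties at the cut-off were handled, so the sketch keeps all tied scores in the “high” group, and the function name and example scores are hypothetical.

```python
import math


def label_top_percent(scores, top_percent=20):
    """Label the highest-scoring share 'high' and everyone else 'needs attention'.

    Assumption: scores tied with the cut-off all count as 'high', so the
    group can be slightly larger than the nominal percentage.
    """
    n_top = max(1, math.ceil(len(scores) * top_percent / 100))
    cutoff = sorted(scores, reverse=True)[n_top - 1]
    labels = ["high" if s >= cutoff else "needs attention" for s in scores]
    return labels, cutoff


# Hypothetical PL-C Quest total scores for ten children
scores = [72, 95, 88, 101, 84, 79, 110, 90, 66, 85]
labels, cutoff = label_top_percent(scores)
print(cutoff)                  # 101
print(labels.count("high"))    # 2
```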
Family environment scale on motor development for pre-school urban children.
We used the Family Environment Scale on Motor Development for Pre-school Urban Children, developed by Hua Jing et al. [44], to assess the family exercise environment. This parent-reported scale evaluates four dimensions: outdoor space, indoor space, toys (hardware environment), and parenting style (software environment), with 23 items in total. Higher scores reflect a better family exercise environment [45]. The scale has demonstrated good reliability (Cronbach’s α = 0.875) and validity, with acceptable fit indices (χ²/df = 4.810, GFI = 0.949, RMSEA = 0.046) [44]. In our study, the Cronbach’s alpha for this scale was 0.933.
Quality control
Before the survey, researchers involved in the assessment of children’s physical literacy received standardized training. All parents signed an informed consent form before completing the questionnaire and allowing their children to be assessed. The form clearly outlined the purpose of the study, the evaluation process, and confidentiality measures, so that parents participated voluntarily with full knowledge of the study details, and it explained the requirements and guidelines for completing the questionnaire. Researchers reviewed the questionnaires and excluded those that were logically inconsistent, contained missing answers, or exhibited patterned responses. To ensure accuracy, the data were entered into an EpiData database independently by two individuals (double entry).
Statistical methods
Data analysis was conducted using SPSS 21.0 and Python 3.6. Categorical data were summarized as frequencies and percentages, with group comparisons performed using chi-square tests. Normally distributed continuous data were presented as mean ± standard deviation and compared using t-tests. Non-normally distributed data were expressed as median (P25, P75) and compared using the Mann-Whitney U test. Statistical significance was set at P < 0.05. Sixteen significant factors from univariate analysis were selected for further modeling using eXtreme Gradient Boosting (XGBoost), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM) algorithms. To account for potential clustering effects due to the stratified cluster sampling design, five-fold cross-validation was employed during model training and validation [46]. Hyperparameter tuning was also conducted to optimize performance. The validation set comprised 20% of the data. XGBoost was implemented using Python’s xgboost 2.1.0, while LR, RF, and SVM were implemented using sklearn 1.5.0. Model performance was evaluated using receiver operating characteristic (ROC) curves, area under the curve (AUC), sensitivity, specificity, accuracy, and F1 score.
Single models can sometimes overfit the training data, especially with small or noisy datasets. Ensemble learning, which combines multiple models, can mitigate this risk by improving overall accuracy and reducing overfitting. This study employed a stacking ensemble method, combining the outputs of the three top-performing models based on AUC to create a final ensemble model, which was then tested on the prediction set.
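To make the stacking idea concrete, the following Python sketch combines three base classifiers with scikit-learn’s `StackingClassifier`. It is an illustration under several assumptions and not the authors’ pipeline: the data are synthetic, the choice of base learners and their settings is illustrative, XGBoost is left out only to keep the example within scikit-learn, and the logistic-regression meta-learner is assumed because the text does not name the final estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data: 16 predictors, roughly 20% positive cases.
X, y = make_classification(n_samples=600, n_features=16, n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Base learners (illustrative settings, not tuned).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)),
    ("svm", make_pipeline(StandardScaler(),
                          SVC(C=1.0, kernel="rbf", probability=True, random_state=0))),
    ("lr", make_pipeline(StandardScaler(),
                         LogisticRegression(C=1.0, max_iter=500))),
]

# The meta-learner is trained on out-of-fold predicted probabilities (cv=5).
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=500),
                           cv=5, stack_method="predict_proba")
stack.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
print(f"test AUC = {test_auc:.3f}")
```

Because the meta-learner only sees out-of-fold predictions from the base models, it learns how to weight them without reusing the data each base model was fitted on, which is the mechanism by which stacking can reduce overfitting.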
ROC curves were used to assess the discriminatory ability of the models, with AUC values closer to 1 indicating better performance. The F1 score, the harmonic mean of precision and recall, was used to summarize how well each model identified the positive class, with a score of 1 indicating optimal performance and 0 indicating the worst performance.
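The evaluation metrics named in this section can be computed directly from a confusion matrix and a set of predicted scores. The sketch below uses plain Python and hypothetical labels and scores; the function names are illustrative, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case).

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and F1 from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)          # recall for the positive class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc_from_scores(y_true, scores):
    """AUC as the share of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 3 positive and 7 negative cases
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.6, 0.1, 0.35, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # 0.5 threshold is an assumption

m = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, scores)
print({k: round(v, 3) for k, v in m.items()})
print(round(auc, 3))   # 0.857
```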
Feature importance for each model was calculated using SHAP (SHapley Additive exPlanations) values [47]. This method is based on Shapley value theory from cooperative game theory and quantifies the contribution of each feature to the model prediction by computing the weighted average of its marginal contributions across all possible feature subsets.
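The weighted-average definition above can be illustrated with a brute-force calculation on a toy example. This sketch enumerates every feature subset, which is only feasible for a handful of features; it is not how the SHAP library computes its values in practice, and the value function and its numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values: the weighted average of each feature's marginal
    contribution over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight for subsets of size k: k! (n - k - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(subset):
    """Hypothetical model output when only the features in `subset` are known."""
    v = 0.0
    if "mother_exercise" in subset:
        v += 0.30
    if "family_env" in subset:
        v += 0.20
    if "mvpa" in subset:
        v += 0.10
    if "mother_exercise" in subset and "family_env" in subset:
        v += 0.10   # interaction, shared equally between the two features
    return v


features = ["mother_exercise", "family_env", "mvpa"]
phi = shapley_values(features, toy_value)
print({k: round(v, 2) for k, v in phi.items()})
# {'mother_exercise': 0.35, 'family_env': 0.25, 'mvpa': 0.1}
```

The three values sum to 0.70, the difference between the toy output with all features and with none, which is the additivity property that gives SHAP its name.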
Results
General information
This study included 1,734 children aged 4–6 years from 18 kindergartens across nine cities in Fujian Province. Among the participants, 950 were boys (54.8%) and 784 were girls (45.2%). Six-year-olds formed the largest age group (42.3%). Regarding kindergarten location, 1,031 children (59.5%) were from main urban districts and 703 (40.5%) were from urban–rural fringe areas. Body type was classified based on the Chinese national standard Growth Standards for Children Under 7 Years (WS/T 423–2022) [48], in which thinness and severe thinness were grouped as “Thin,” overweight, obesity, and severe obesity were grouped as “Overweight,” and the remainder were classified as “Normal.” Based on this classification, most children (74.0%) fell into the “Normal” category. Notably, 80.3% of the children were reported to engage in low levels of sedentary behavior, and 65.3% were reported to have good sleep quality. Regarding the parents, 36.9% of fathers had a bachelor’s degree or higher, and the most common paternal occupation was self-employment (31.8%). The largest share of fathers exercised 1–2 times per week (35.8%), and most fathers were supportive of their children’s physical activities (73.4%). For mothers, 36.7% had a bachelor’s degree or higher, and the most common occupation was work in public institutions or mid-level management roles (29.9%). Interestingly, 38.1% of mothers did not exercise, although 75.3% were supportive of their children’s physical activities. The highest household annual income bracket (200,000 RMB or more) accounted for 27.0% of the families.
Descriptive statistics of children’s physical literacy scores
The average physical literacy score for children aged 4–6 was 84.80 ± 14.26, with an average score per item of 2.83 ± 0.48. The mean scores across the domains were as follows: physical (2.62 ± 0.55), psychological (2.85 ± 0.53), social (3.17 ± 0.54), and cognitive (2.97 ± 0.55). The domain scores ranked from highest to lowest were: social, cognitive, psychological, and physical. Detailed statistics are provided in Table 1.
Feature extraction
Univariate analysis revealed significant differences between the “high physical literacy” group and the “needs attention” group across various factors, including age, body type, sedentary behavior, screen time, MVPA time, sleep duration, sleep quality, father’s occupation, father’s exercise frequency, father’s support for children’s physical activity, mother’s education level, mother’s occupation, mother’s exercise frequency, mother’s support for children’s physical activity, household annual income, and family exercise environment (P < 0.05). The detailed comparisons are provided in Table 2.
Model prediction results
Among the 19 variables initially considered, 16 significant variables were identified through univariate analysis and were subsequently analyzed using LR, SVM, XGBoost, RF, and ensemble learning models to determine feature importance. The results showed that maternal exercise frequency, family exercise environment, and children’s MVPA time consistently emerged as key predictive factors across all models (Figs 1–5). The ensemble learning model, which integrated the strengths of multiple models, significantly improved prediction accuracy and robustness, reaffirming the critical impact of these factors on children’s physical literacy. Additionally, the ensemble learning model highlighted the importance of age, sedentary behavior, sleep duration, and parental support for children’s exercise. In contrast, factors such as parents’ occupation, household annual income, and children’s body type had relatively lower predictive power across the models (Fig 5).
Model performance and comparison
In this study, the 16 significant variables were analyzed using four machine learning models, with 5-fold cross-validation for training and validation, and 20% of the data used as the test set. The models were trained with specific parameters: XGBoost was set with a binary classification objective, a learning rate of 0.05, max tree depth of 3, min child weight of 2, L2 regularization of 3, a feature sampling ratio of 0.2, and 100 estimators. The RF model used the Gini criterion, max tree depth of 4, and 100 estimators. SVM had a regularization factor (C) of 1, an rbf kernel, and a tolerance of 0.01. LR had a regularization factor (C) of 1, with max iterations of 500, and used the lbfgs solver.
The performance of each model is shown in Tables 3 and 4. The ensemble learning model consistently outperformed the individual models, particularly in AUC, specificity, accuracy, and F1 score. For the training set, the RF model had the highest AUC of 0.879 and an F1 score of 0.82, followed closely by XGBoost with an AUC of 0.870 and an F1 score of 0.80. The SVM and LR models showed weaker performance with AUCs of 0.847 and 0.831, and F1 scores of 0.79 and 0.76, respectively. In the test set, the ensemble learning model achieved the best results, with an AUC of 0.862, specificity of 0.829, accuracy of 0.816, and an F1 score of 0.83, demonstrating its superior ability to handle complex data and enhance model robustness. XGBoost and RF models performed similarly in the test set, both achieving AUCs of 0.858 and F1 scores of 0.80 and 0.81, respectively. The SVM and LR models lagged behind, with AUCs of 0.848 and 0.834, and F1 scores of 0.79 and 0.78, respectively. Overall, the ensemble learning model showed the best performance across all metrics, particularly in the test set, indicating its higher applicability and stability in predicting children’s physical literacy. While XGBoost and RF models also demonstrated relatively good predictive performance, they were still outperformed by the ensemble model. The SVM and LR models faced certain limitations in processing this type of data, resulting in slightly lower predictive performance.
The ROC curves for each model are illustrated in Figs 6 and 7.
Discussion
This study assessed the physical literacy of 1,734 children aged 4–6 years in Fujian Province and revealed three key findings. First, children’s overall physical literacy level was relatively low, with notable variations across domains; the social domain achieved the highest scores, whereas the physical domain scored the lowest. Second, ensemble learning models outperformed single algorithms, underscoring their advantages in prediction accuracy and robustness. Third, mother’s exercise frequency, the family exercise environment, and children’s MVPA time consistently emerged as the most influential predictors of children’s physical literacy.
Overall low levels of children’s physical literacy, with variations across domains
The findings of this study indicate that the overall physical literacy score for children in Fujian Province was 84.80, with an average item score of 2.83, slightly above the midpoint of 2.5. When compared with recent validation studies of the PL-C Quest in Chinese children, this score is relatively low. For example, one study reported a mean total score of 96.76 among 642 children aged 6–12 years [49], while another study reported a mean overall score of 98.8 in a larger sample of 1,870 children aged 4–12 years [39]. The substantially lower score observed in our sample suggests that children’s physical literacy levels in Fujian Province are generally low, highlighting a considerable gap compared to national reference data and indicating significant room for improvement. Among the four assessed domains, the social domain had the highest score, while the physical domain had the lowest. The high score in the social domain reflects that children display good ethical behavior, cooperation skills, and respect for different cultures and values during physical activities, indicating a solid understanding of socially expected physical literacy. Research shows that social relationships significantly influence behavior and beliefs, especially in children, where positive sports experiences heavily rely on social support [50]. Encouragement, respect, and understanding in physical activities can enhance children’s participation and confidence, thereby fostering continuous involvement in sports. Such social support not only enhances physical literacy but also contributes positively to overall development. However, this study revealed that children aged 4–6 scored the lowest in the physical domain, with an average score of 2.62, indicating deficiencies in motor skills and physical fitness. The physical domain is fundamental to children’s daily life and sports participation [50], and its importance for their future physical development cannot be overlooked. 
Previous studies have highlighted that children’s free time is often spent sitting, either studying or engaging in screen-based activities, with insufficient time dedicated to physical activities [51]. This trend may lead to a decline in overall physical literacy, particularly in physical ability, daily behavior, and psychological motivation and confidence [52]. Therefore, this study recommends a focus on improving the physical domain by encouraging children to increase their physical activity, thereby enhancing their fitness and motor skills and ultimately improving their overall physical literacy.
Superior predictive performance of ensemble learning models for children’s physical literacy
By combining multiple models, including LR, SVM, XGBoost, and RF, this study represents, to our knowledge, the first application of ensemble learning methods in the field of children’s physical literacy, with the goal of improving prediction accuracy and model robustness. Five machine learning models (LR, SVM, XGBoost, RF, and Ensemble Learning) were trained and validated on 16 significant factors to construct an efficient predictive model for physical literacy in children aged 4–6. The results showed that the Ensemble Learning model outperformed the others across key metrics, including AUC, specificity, accuracy, and F1 score, indicating its clear advantage in prediction accuracy and robustness. Compared to existing research, this study further validates the application of machine learning in predicting children’s physical literacy. Previous studies often relied on single algorithms. For instance, the XGBoost algorithm has been employed to develop an early health prediction framework [53], and it has also been used for assisting in orthopedic disease classification and prediction [54]. The RF algorithm has been utilized to predict adverse health events, demonstrating its potential in medical risk prediction [55]. In this study, the Ensemble Learning model not only exceeded single algorithms in predictive accuracy but also showed greater robustness. This finding is consistent with results demonstrating the effectiveness and adaptability of Ensemble Learning in processing complex data in dynamic multi-objective optimization [56]. Moreover, a review further supported the superiority of Ensemble Learning in disease prediction across various datasets [57].
Key predictors of children’s physical literacy
This study evaluated the physical literacy of 1,734 children aged 4–6 and analyzed multiple individual and family factors using various machine learning models. The results showed that while the importance of different features varied across models, mother’s exercise frequency, the family exercise environment, and children’s MVPA time were consistently identified as the most critical predictors, underscoring their significant roles in the development of children’s physical literacy.
The Ensemble Learning model demonstrated the highest predictive capability with an AUC of 86.2%. This result indicates that leveraging multiple machine learning models enables a more comprehensive assessment and prediction of the factors influencing children’s physical literacy. Specifically, mother’s exercise frequency emerged as the most important predictor, with higher mother’s exercise frequency correlating with a higher likelihood of children being in the high physical literacy group (P < 0.05). This finding aligns with previous research [58], which reported that preschool children’s physical activity is closely linked to their mothers’ activity levels, with each additional minute of maternal exercise boosting preschool children’s MVPA participation by 10%. Although existing research has not deeply explored the relationship between maternal exercise and children’s physical literacy development, this study highlights the unique influence of maternal exercise frequency. According to the family influence model [59,60], a mother’s exercise behavior can directly impact her child’s physical activity and literacy through role modeling, emphasizing the crucial role of mothers in nurturing physical literacy. Given that previous studies have shown that mothers generally have low activity levels [61], future interventions should focus on increasing maternal physical activity to enhance children’s physical literacy.
Additionally, the family exercise environment was validated across multiple models as a key factor, with children in better exercise environments at home being more likely to achieve superior physical literacy (P < 0.05). A favorable family exercise environment not only provides sufficient space and resources for physical activities but also lays a solid foundation for the development of physical literacy. This finding aligns with existing research emphasizing the influence of family environments on children’s health behaviors, reinforcing the importance of creating a supportive exercise atmosphere for children. According to ecological systems theory, the family is the primary environment for child development, influencing physical literacy through individual, parental, and environmental factors. Although many studies focus on the roles of kindergartens and communities [62], research on the relationship between the family exercise environment and children’s physical literacy remains limited. Qualitative studies have pointed out that a lack of family exercise space hinders children’s participation in physical activities, especially self-initiated exercises, while encouraging more screen time [63]. Another review highlighted the critical role of the home environment in shaping children’s activity levels [64]. Our findings are consistent with these results. Additionally, a cross-sectional study on Chinese preschool children found that the family environment significantly impacts the scientific fitness literacy of both preschool and school-aged children [65]. Thus, the family exercise environment is an essential factor in cultivating children’s physical literacy.
Moreover, children’s MVPA time was consistently identified as an important predictor across all models, with children in the “high physical literacy” group having significantly more MVPA time than those in the “needs attention” group (P < 0.001). This finding suggests that children’s physical literacy is closely linked to their level of physical activity. The study also identified sedentary behavior as a negative predictor, which aligns with related research. Studies have shown that physical literacy is associated with both physical activity and sedentary behavior [66]. For example, a study on Canadian children found that those who met the daily guideline of 60 minutes of moderate-to-vigorous physical activity scored higher in physical literacy, particularly in physical competence, motivation, confidence, and knowledge/skills [67]. This relationship may be due to the impact of exercise on cardiorespiratory fitness. Lang et al. found a strong correlation between children’s cardiorespiratory fitness and their physical literacy and its components [5]. Lima et al. also highlighted a positive relationship between physical activity and motor competence, with cardiovascular endurance potentially acting as a mediator [68]. As previous studies have emphasized, physical literacy is not only a prerequisite for physical activity but is also developed through it [69]. Therefore, we believe that insufficient physical activity and prolonged sedentary behavior are likely associated with lower physical literacy levels. Reducing sedentary time and replacing it with physical activity may be an effective strategy for improving children’s physical literacy [67]. Additionally, earlier studies have shown that prolonged sedentary behavior is associated with elevated cardiometabolic risk factors, such as childhood obesity, hypertension, abnormal cholesterol levels, and elevated insulin. Notably, these risks may persist from childhood through adolescence and into adulthood [70]. Reducing sedentary behavior and increasing physical activity during childhood is thus essential for promoting healthy growth.
Finally, this study also found that age, sleep duration, and parental support for physical activity play significant roles in predicting children’s physical literacy. The importance of age in the models suggests that physical literacy levels can change markedly as children grow older, highlighting the need to consider age when planning early interventions. Moreover, sleep duration was identified as an important predictor, with children in the “high physical literacy” group getting slightly more sleep than those in the “needs attention” group (P = 0.008). This finding suggests that adequate sleep contributes to better physical literacy, consistent with Lemes et al., who found that well-rested children are more willing to engage in physical activities [52], thereby enhancing their physical literacy. However, the extent and mechanisms of how sleep affects physical literacy still require further exploration. The connection between sleep and physical literacy likely involves several factors. On one hand, sleep is crucial for physiological and cognitive functions, with studies confirming that a lack of sleep impairs motor performance by disrupting the autonomic nervous system and reducing coordination [71]. On the other hand, psychological factors such as stress, anxiety, and depression are closely linked to sleep problems [72], potentially affecting children’s mental state and motivation during physical activities, which could lower their overall physical literacy. Additionally, parental support was a key factor in predicting physical literacy. Hinkley et al. pointed out that sociocultural factors play a significant role in shaping preschool children’s physical activity levels and patterns [73]. Further research showed that highly supportive parenting helps children build positive beliefs and values related to physical activity [74]. A meta-analysis by Yao and Rhodes also confirmed that parental support and role modeling are strongly linked to children’s physical activity [75]. These findings align with our study’s results, further emphasizing the crucial role of parental support in determining children’s physical literacy and highlighting the family as a key environment for fostering healthy behaviors.
It is important to note that although the proportion of children classified into the “needs attention” group was slightly higher in urban–rural fringe areas than in main urban districts, the difference was not statistically significant. This may be due to the strong support of the national Rural Revitalization Strategy in recent years [76], under which local governments have continuously improved facilities and programs for children’s physical activities in urban–rural fringe areas, thereby narrowing the gap between main urban districts and urban–rural fringe areas in the development of children’s physical literacy.
Limitations and future directions
This study has several limitations that should be acknowledged. First, the data were collected from specific regions within Fujian Province, which may limit generalizability. Future studies should expand the geographic scope to validate the results. Second, physical literacy was measured using self-reported data, which are susceptible to recall bias and social desirability effects. Although self-reporting is cost-effective and convenient for large-scale surveys, future research should combine subjective and objective measures to improve reliability. Finally, while 16 influencing factors were included in this study, they may not capture all potential predictors. Future research should broaden the range of predictors to support the development of more comprehensive predictive models of children’s physical literacy.
Conclusions
Our study revealed that the overall physical literacy levels of children aged 4–6 in Fujian Province are relatively low. Among the four domains, the social domain scored the highest, showing that these children have relatively mature social behaviors and cooperative skills in physical activities. However, the physical domain lagged behind, indicating that there is substantial room for improvement in their physical fitness and motor skills. By applying multiple machine learning models, including LR, SVM, XGBoost, RF, and Ensemble Learning, this study systematically analyzed the key factors influencing children’s physical literacy. The findings revealed that mother’s exercise frequency, family exercise environment, and children’s MVPA time were the most important predictive factors of physical literacy levels. Additionally, the Ensemble Learning model outperformed the single models, particularly in AUC, specificity, accuracy, and F1 score. This supports the effectiveness of Ensemble Learning in handling complex datasets.
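As an illustration of the four metrics named above, the following minimal sketch computes accuracy, specificity, F1 score, and AUC for a binary "high physical literacy" (1) versus "needs attention" (0) classification. It is not the authors' evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here for illustration.

```python
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, specificity and F1 from binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Specificity: share of true negatives among all actual negatives.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study).
y_true: List[int] = [1, 1, 1, 0, 0, 0, 1, 0]
scores: List[float] = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred: List[int] = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

AUC is computed from the continuous scores, whereas accuracy, specificity, and F1 depend on the chosen threshold, which is one reason a model can rank first on AUC without leading on every thresholded metric.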
This study makes several innovative contributions. First, it fills a gap in research on physical literacy among younger children aged 4–6 by applying multiple machine learning models to analyze the impact of family environment and maternal exercise frequency, an area that has not been fully explored in the existing literature. Second, to our knowledge, this study is the first to apply an Ensemble Learning method to predict physical literacy in children aged 4–6. By combining SVM, XGBoost, and RF models, it enhances prediction accuracy and robustness, providing a more precise tool for understanding the complex relationships in children’s physical literacy. Third, the study took a multidimensional analysis approach, systematically examining individual factors such as gender, age, and body type alongside family factors such as parental education, exercise frequency, and support attitudes. This comprehensive framework offers new research pathways and theoretical support for understanding and improving physical literacy in preschool children aged 4–6.
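This section does not state how the SVM, XGBoost, and RF outputs were combined. One common option is soft voting, in which each base model's predicted probability is averaged (optionally with weights) before thresholding. The sketch below assumes soft voting purely for illustration; the function names, weights, threshold, and probabilities are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Sequence


def soft_vote(
    prob_by_model: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-model probabilities for the positive class."""
    names = list(prob_by_model)
    if not names:
        raise ValueError("at least one base model is required")
    n = len(prob_by_model[names[0]])
    if any(len(prob_by_model[m]) != n for m in names):
        raise ValueError("all models must score the same number of cases")
    if weights is None:
        weights = {m: 1.0 for m in names}  # equal weights by default
    total = sum(weights[m] for m in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(weights[m] * prob_by_model[m][i] for m in names) / total
        for i in range(n)
    ]


def to_labels(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """1 = predicted 'high physical literacy', 0 = predicted 'needs attention'."""
    return [1 if p >= threshold else 0 for p in probs]


# Hypothetical predicted probabilities for four children from three base models.
base_probs = {
    "svm": [0.80, 0.40, 0.30, 0.60],
    "xgboost": [0.90, 0.60, 0.20, 0.45],
    "rf": [0.70, 0.65, 0.10, 0.30],
}

ensemble_probs = soft_vote(base_probs)
ensemble_labels = to_labels(ensemble_probs)
print(ensemble_probs, ensemble_labels)
```

In this toy example the hypothetical SVM alone would label the second child 0 and the fourth child 1, while the averaged probabilities reverse both calls. That is the mechanism by which an ensemble can smooth out the errors of a single model.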
References
- 1. Whitehead M. Definition of physical literacy and clarification of related issues. ICSSPE Bulletin. 2013;65:28–42.
- 2. Carl J, Barratt J, Töpfer C, Cairney J, Pfeifer K. How are physical literacy interventions conceptualized? – A systematic review on intervention design and content. Psychology of Sport and Exercise. 2022;58:102091.
- 3. Shearer C, Goss HR, Boddy LM, Knowles ZR, Durden-Myers EJ, Foweather L. Assessments Related to the Physical, Affective and Cognitive Domains of Physical Literacy Amongst Children Aged 7-11.9 Years: A Systematic Review. Sports Med Open. 2021;7(1):37. pmid:34046703
- 4. Zhou J, Luo Y, Luo D. Study on the relationship between kindergarten outdoor environments and children’s physical activity using behavior mapping. China Sport Science and Technology. 2018;54:91-97,104.
- 5. Lang JJ, Chaput JP, Longmuir PE, Barnes JD, Belanger K, Tomkinson GR, et al. Cardiorespiratory fitness is associated with physical literacy in a large sample of Canadian children aged 8 to 12 years. BMC Public Health. 2018;18:1–13.
- 6. Cairney J, Clark HJ, James ME, Mitchell D, Dudley DA, Kriellaars D. The Preschool Physical Literacy Assessment Tool: Testing a New Physical Literacy Tool for the Early Years. Front Pediatr. 2018;6:138. pmid:29930933
- 7. Zhang D, Shi L, Zhu X, Chen S, Liu Y. Effects of intervention integrating physical literacy into active school recesses on physical fitness and academic achievement in Chinese children. J Exerc Sci Fit. 2023;21(4):376–84. pmid:37927355
- 8. Jiang T, Zhao G, Fu J, Sun S, Chen R, Chen D, et al. Relationship Between Physical Literacy and Cardiorespiratory Fitness in Children and Adolescents: A Systematic Review and Meta-analysis. Sports Med. 2025;55(2):473–85. pmid:39579330
- 9. Liu Y, Chen S. Physical literacy in children and adolescents: Definitions, assessments, and interventions. European Physical Education Review. 2021;27(1):96–112.
- 10. Petersen TL, Møller LB, Brønd JC, Jepsen R, Grøntved A. Association between parent and child physical activity: a systematic review. Int J Behav Nutr Phys Act. 2020;17(1):67. pmid:32423407
- 11. Sheldrick MP, Maitland C, Mackintosh KA, Rosenberg M, Griffiths LJ, Fry R, et al. Clusters of Activity-Related Social and Physical Home Environmental Factors and Their Association With Children’s Home-Based Physical Activity and Sitting. Pediatr Exerc Sci. 2022;35(1):23–34. pmid:35940584
- 12. Dong P, Yu S. Interdisciplinary theme learning of physical education and health courses based on core literacy: Connotation determination, design process and promotion strategy. Journal of Tianjin University of Sport. 2024;39:56–63.
- 13. Coyne P, Vandenborn E, Santarossa S, Milne MM, Milne KJ, Woodruff SJ. Physical literacy improves with the Run Jump Throw Wheel program among students in grades 4-6 in southwestern Ontario. Appl Physiol Nutr Metab. 2019;44(6):645–9. pmid:31032623
- 14. Law B, Bruner B, Scharoun Benson SM, Anderson K, Gregg M, Hall N, et al. Associations between teacher training and measures of physical literacy among Canadian 8- to 12-year-old students. BMC Public Health. 2018;18(Suppl 2):1039. pmid:30285690
- 15. Long B, Chen S, Long Y, Liu Y, Li Y, Wang Y, et al. The predictive relationship between parents’ perceptions of physical activity and children’s physical literacy. Sci Rep. 2025;15(1):24207. pmid:40624220
- 16. Huang W, Luo J, Chen Y. Effects of Kindergarten, Family Environment, and Physical Activity on Children’s Physical Fitness. Front Public Health. 2022;10:904903. pmid:35757641
- 17. Venetsanou F, Kambas A. Environmental Factors Affecting Preschoolers’ Motor Development. Early Childhood Educ J. 2010;37(4):319–27.
- 18. Siddiqui H, Rattani A, Woods NK, Cure L, Lewis RK, Twomey J, et al. A Survey on Machine and Deep Learning Models for Childhood and Adolescent Obesity. IEEE Access. 2021;9:157337–60.
- 19. Madakkatel I, Zhou A, McDonnell MD, Hyppönen E. Combining machine learning and conventional statistical approaches for risk factor discovery in a large cohort study. Sci Rep. 2021;11(1):22997. pmid:34837000
- 20. Guerrero MD, Vanderloo LM, Rhodes RE, Faulkner G, Moore SA, Tremblay MS. Canadian children’s and youth’s adherence to the 24-h movement guidelines during the COVID-19 pandemic: A decision tree analysis. J Sport Health Sci. 2020;9(4):313–21. pmid:32525098
- 21. Bitew FH, Sparks CS, Nyarko SH. Machine learning algorithms for predicting undernutrition among under-five children in Ethiopia. Public Health Nutr. 2022;25(2):269–80. pmid:34620263
- 22. Xu K, Sun Z. Predicting academic performance associated with physical fitness of primary school students using machine learning methods. Complement Ther Clin Pract. 2023;51:101736. pmid:36821949
- 23. Britton Ú, Onibonoje O, Belton S, Behan S, Peers C, Issartel J, et al. Moving well-being well: Using machine learning to explore the relationship between physical literacy and well-being in children. Appl Psychol Health Well Being. 2023;15(3):1110–29. pmid:36628524
- 24. Dong X, Yan Z, Deng J, Johanson Desiral H, Xu M, Huang F. Using physical literacy to predict physical activity among university students: A machine learning logistic regression model. International Journal of Sport Psychology. 2024;55(3):280–96.
- 25. Ganaie MA, Hu M, Malik AK, Tanveer M, Suganthan PN. Ensemble deep learning: A review. Engineering Applications of Artificial Intelligence. 2022;115:105151.
- 26. Pourakbari B, Mamishi S, Valian SK, Mahmoudi S, Sadeghi RH, Abdolsalehi MR, et al. Predicting COVID-19 severity in pediatric patients using machine learning: a comparative analysis of algorithms and ensemble methods. Sci Rep. 2025;15(1):29118. pmid:40781476
- 27. Dick K, Kaczmarek E, Ducharme R, Bowie AC, Dingwall-Harvey ALJ, Howley H, et al. Transformer-based deep learning ensemble framework predicts autism spectrum disorder using health administrative and birth registry data. Sci Rep. 2025;15(1):11816. pmid:40195371
- 28. National Bureau of Statistics of China. In statistics, how are urban and rural areas classified? Statistical Knowledge. 2024. http://snzd.stats.gov.cn/zsjd/2024/45630.shtml
- 29. Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR, Young SL. Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer. Front Public Health. 2018;6:149. pmid:29942800
- 30. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49(12):1373–9. pmid:8970487
- 31. Craig CL, Marshall AL, Sjöström M, Bauman AE, Booth ML, Ainsworth BE, et al. International physical activity questionnaire: 12-country reliability and validity. Med Sci Sports Exerc. 2003;35(8):1381–95. pmid:12900694
- 32. Qu N, Li K. Study on the reliability and validity of International Physical Activity Questionnaire (Chinese Version IPAQ). Chinese Journal of Epidemiology. 2004;25(3):265–8.
- 33. Patel AV, Hildebrand JS, Campbell PT, Teras LR, Craft LL, McCullough ML, et al. Leisure-Time Spent Sitting and Site-Specific Cancer Incidence in a Large U.S. Cohort. Cancer Epidemiol Biomarkers Prev. 2015;24(9):1350–9. pmid:26126627
- 34. Agbaje AO. Lean Mass Longitudinally Confounds Sedentary Time and Physical Activity With Blood Pressure Progression in 2513 Children. J Cachexia Sarcopenia Muscle. 2024;15(6):2826–41. pmid:39535381
- 35. Zhao X, Liu Y, Li C, Zhao X, Yi G, Li S. Correlation between physical activity, sedentary behavior and sleep problems in preschool children. Modern Preventive Medicine. 2022;49(19):3517–23.
- 36. Gao W, Wang H, Li C, Sun H, Zhang Y. Development of physical activity family support environment questionnaire for children aged 3–6 years old. Chinese Journal of Health Education. 2023;39(4):329–34.
- 37. Barnett LM, Mazzoli E, Bowe SJ, Lander N, Salmon J. Reliability and validity of the PL-C Quest, a scale designed to assess children’s self-reported physical literacy. Psychology of Sport and Exercise. 2022;60:102164.
- 38. Barnett L, Lander N, Mazzoli E, Salmon J. Development and reliability of the Physical Literacy in Children Questionnaire (PL-C Quest): a self-report scale to assess children’s perceived physical literacy. Journal of Science and Medicine in Sport. 2021;24:S8.
- 39. Diao Y, Wang L, Chen S, Barnett LM, Mazzoli E, Essiet IA, et al. The validity of the Physical Literacy in Children Questionnaire in children aged 4 to 12. BMC Public Health. 2024;24(1):869. pmid:38515090
- 40. Tremblay MS, Longmuir PE, Barnes JD, Belanger K, Anderson KD, Bruner B, et al. Physical literacy levels of Canadian children aged 8–12 years: Descriptive and normative results from the RBC Learn to Play–CAPL project. BMC Public Health. 2018;18(S2):1036.
- 41. Hadier SG, Yinghai L, Long L, Hamdani SD, Hamdani SMZH. Assessing physical literacy and establishing normative reference curves for 8-12-year-old children from South Punjab, Pakistan: The PAK-IPPL cross-sectional study. PLoS One. 2025;20(2):e0312916. pmid:39932941
- 42. Central Committee of the Communist Party of China, State Council. Healthy China 2030 Planning Outline. Xinhua News Agency. 2016. https://www.gov.cn/zhengce/202203/content_3635233.htm
- 43. Lynch T, Soukup GJ. “Physical education”, “health and physical education”, “physical literacy” and “health literacy”: Global nomenclature confusion. Cogent Education. 2016;3(1):1217820.
- 44. Hua J, Zhang L, Gu G, Qin Z, Meng W, Wu Z. Preliminary compilation of Family Environment Scale on Motor Development for Preschool Urban Children. Chinese Journal of School Health. 2011;32:161–3.
- 45. Yang H, Wang H. Associations of developmental coordination disorders and sensory integration disorders with family environment for motor development. Chin J Sch Health. 2020;41:86–9.
- 46. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 1995. 1137–45. https://www.researchgate.net/profile/Ron-Kohavi/publication/2352264_A_Study_of_Cross-Validation_and_Bootstrap_for_Accuracy_Estimation_and_Model_Selection/links/02e7e51bcc14c5e91c000000/A-Study-of-Cross-Validation-and-Bootstrap-for-Accuracy-Estimation-and-Model-Selection.pdf
- 47. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, 2017. https://proceedings.neurips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf
- 48. National Health Commission of the People’s Republic of China. Growth Standards for Children Under 7 Years (WS/T 423–2022). Beijing: National Health Commission, 2022. https://www.nhc.gov.cn/fzs/c100048/202211/5001d7cf57774770a1d49c1df46a291f.shtml
- 49. Wu Y, Wang X, Wang H, Wang L, Tian Y, Ji Z, et al. Validation of the PL-C Quest in China: understanding the pictorial physical literacy self-report scale. Front Psychol. 2024;15:1328549. pmid:38515980
- 50. Tian H, Miao X, Sun M, Yin Z. Pictorial self-assessment: Interpretation and enlightenment of PL-C quest of children’s physical literacy in Australia. Journal of Sports Research. 2022;36:32–43.
- 51. O’Dwyer MV, Fairclough SJ, Knowles Z, Stratton G. Effect of a family focused active play intervention on sedentary time and physical activity in preschool children. Int J Behav Nutr Phys Act. 2012;9:117. pmid:23025568
- 52. Lemes VB, Sehn AP, Reuter CP, Burns RD, Gaya AR, Gaya ACA, et al. Associations of sleep time, quality of life, and obesity indicators on physical literacy components: a structural equation model. BMC Pediatr. 2024;24(1):159. pmid:38454408
- 53. Kumar D, Sood SK, Rawat KS. Early health prediction framework using XGBoost ensemble algorithm in intelligent environment. Artif Intell Rev. 2023;56(S1):1591–615.
- 54. Li S, Zhang X. Research on orthopedic auxiliary classification and prediction model based on XGBoost algorithm. Neural Comput & Applic. 2020;32(7):1971–9.
- 55. Cafri G, Li L, Paxton EW, Fan J. Predicting risk for adverse health events using random forest. Journal of Applied Statistics. 2017;45(12):2279–94.
- 56. Wang F, Li Y, Liao F, Yan H. An ensemble learning based prediction strategy for dynamic multi-objective optimization. Applied Soft Computing. 2020;96:106592.
- 57. Mahajan P, Uddin S, Hajati F, Moni MA. Ensemble Learning for Disease Prediction: A Review. Healthcare (Basel). 2023;11(12):1808. pmid:37372925
- 58. Hesketh KR, Goodfellow L, Ekelund U, McMinn AM, Godfrey KM, Inskip HM, et al. Activity levels in mothers and their preschool children. Pediatrics. 2014;133(4):e973–80. pmid:24664097
- 59. Trost SG, Sallis JF, Pate RR, Freedson PS, Taylor WC, Dowda M. Evaluating a model of parental influence on youth physical activity. Am J Prev Med. 2003;25(4):277–82. pmid:14580627
- 60. Bois JE, Sarrazin PG, Brustad RJ, Trouilloud DO, Cury F. Elementary schoolchildren’s perceived competence and physical activity involvement: the influence of parents’ role modelling behaviours and perceptions of their child’s competence. Psychology of Sport and Exercise. 2005;6(4):381–97.
- 61. Uijtdewilligen L, Peeters GMEE, van Uffelen JGZ, Twisk JWR, Singh AS, Brown WJ. Determinants of physical activity in a cohort of young adult women. Who is at risk of inactive behaviour? J Sci Med Sport. 2015;18(1):49–55. pmid:24636128
- 62. Giles-Corti B, Timperio A, Bull F, Pikora T. Understanding physical activity environmental correlates: increased specificity for ecological models. Exerc Sport Sci Rev. 2005;33(4):175–81. pmid:16239834
- 63. Maitland C, Stratton G, Foster S, Braham R, Rosenberg M. The Dynamic Family Home: a qualitative exploration of physical environmental influences on children’s sedentary behaviour and physical activity within the home space. Int J Behav Nutr Phys Act. 2014;11:157. pmid:25540114
- 64. Maitland C, Stratton G, Foster S, Braham R, Rosenberg M. A place for play? The influence of the home physical environment on children’s physical activity and sedentary behaviour. Int J Behav Nutr Phys Act. 2013;10:99. pmid:23958282
- 65. Pan X, Wang H, Wu D, Liu X, Deng P, Zhang Y. Influence of Family Environment on the Scientific Fitness Literacy of Preschool and School Children in China: A National Cross-Sectional Study. Int J Environ Res Public Health. 2022;19(14):8319. pmid:35886162
- 66. Melby PS, Nielsen G, Brønd JC, Tremblay MS, Bentsen P, Elsborg P. Associations between children’s physical literacy and well-being: is physical activity a mediator? BMC Public Health. 2022;22(1):1267.
- 67. Belanger K, Barnes JD, Longmuir PE, Anderson KD, Bruner B, Copeland JL, et al. The relationship between physical literacy scores and adherence to Canadian physical activity and sedentary behaviour guidelines. BMC Public Health. 2018;18(Suppl 2):1042. pmid:30285783
- 68. Lima RA, Pfeiffer K, Larsen LR, Bugge A, Moller NC, Anderson LB, et al. Physical Activity and Motor Competence Present a Positive Reciprocal Longitudinal Relationship Across Childhood and Early Adolescence. J Phys Act Health. 2017;14(6):440–7. pmid:28169569
- 69. Edwards LC, Bryant AS, Keegan RJ, Morgan K, Jones AM. Definitions, Foundations and Associations of Physical Literacy: A Systematic Review. Sports Med. 2017;47(1):113–26. pmid:27365029
- 70. Li MH, Sit CHP, Wong SHS, Wing YK, Ng CK, Sum RKW. Promoting physical activity and health in Hong Kong primary school children through a blended physical literacy intervention: protocol and baseline characteristics of the “Stand+Move” randomized controlled trial. Trials. 2021;22(1):944. pmid:34930404
- 71. Fullagar HHK, Skorski S, Duffield R, Hammes D, Coutts AJ, Meyer T. Sleep and athletic performance: the effects of sleep loss on exercise performance, and physiological and cognitive responses to exercise. Sports Med. 2015;45(2):161–86. pmid:25315456
- 72. Alfano CA, Zakem AH, Costa NM, Taylor LK, Weems CF. Sleep problems and their relation to cognitive factors, anxiety, and depressive symptoms in children and adolescents. Depress Anxiety. 2009;26(6):503–12. pmid:19067319
- 73. Hinkley T, Salmon J, Okely AD, Hesketh K, Crawford D. Correlates of preschool children’s physical activity. Am J Prev Med. 2012;43(2):159–67. pmid:22813680
- 74. Ha AS, Jia J, Ng FFY, Ng JYY. Parent’s physical literacy enhances children’s values towards physical activity: A serial mediation model. Psychology of Sport and Exercise. 2022;63:102297.
- 75. Yao CA, Rhodes RE. Parental correlates in child and adolescent physical activity: a meta-analysis. Int J Behav Nutr Phys Act. 2015;12:10. pmid:25890040
- 76. Xu Z. Theoretical logic, practical dilemma and practical approach of integrating talents with sports and education to help rural revitalization. Journal of Capital University of Physical Education and Sports. 2025;37:63–70.