
PERMA-guided multi-topology graph neural networks for cross-cultural student well-being prediction

  • Lingqi Mo,

    Roles Conceptualization, Formal analysis

    Affiliation Faculty of Education and Liberal Arts, INTI International University, Nilai, Negeri Sembilan, Malaysia

  • Jie Zhang,

    Roles Data curation, Formal analysis

    i24026180@student.newinti.edu.my

    Affiliations Faculty of Business and Communications (FBC), INTI International University, Nilai, Malaysia, College of Logistics and E-Commerce, Henan Logistics Vocational College, Zhengzhou, China

  • Zixiao Jiang,

    Roles Methodology

    Affiliations Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia, Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China

  • Shuanglei Wang,

    Roles Investigation

    Affiliation Faculty of Liberal Arts, Shinawatra University, Pathum Thani, Thailand

  • ShiouYih Lee

    Roles Validation

    Affiliation Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia

Abstract

Student well-being prediction is of great significance for promoting personalized education and preventing mental health problems, but existing methods suffer from limitations including lack of psychological theory guidance, neglect of student relationship modeling, and insufficient cross-cultural adaptability. This study proposes the PERMA-GNN-Transformer model, which innovatively integrates Seligman’s PERMA positive psychology theory with graph neural networks and Transformer architecture. The model achieves the transformation from raw educational data to psychologically meaningful representations through a theory-driven feature embedding mechanism, designs four types of student relationship graphs based on cosine similarity, Euclidean distance, learning styles, and PERMA weighting, and employs a five-head attention mechanism corresponding to the five PERMA dimensions. Experiments on the Western cultural background Lifestyle and Wellbeing Data (n = 12,757) and the East Asian cultural background International Student Mental Health Dataset (n = 268) demonstrate that compared to the optimal baseline methods, our proposed model achieves an 18.9% performance improvement on large-scale datasets and a 27.8% improvement on small-scale datasets, with the PERMA comprehensive evaluation metric reaching 0.792 and passing statistical significance tests at p < 0.01. This research provides a theory-driven, relationship-aware, and culturally adaptive technical solution for student well-being prediction in cross-cultural educational environments.

1. Introduction

Student well-being prediction has become a central issue in modern educational systems, playing a crucial role in promoting personalized education, preventing student mental health problems, and enhancing educational quality. With the acceleration of globalization, the number of international students is projected to reach 8 million by 2025 [1], presenting unprecedented challenges for monitoring student mental health in cross-cultural educational environments. Research indicates that international students face higher mental health risks compared to domestic students, with anxiety symptoms occurring in 2.4%−43% of cases, depression symptoms in 3.6%−38.3%, and psychological stress in 31.6%−54% [2]. Meanwhile, the application of artificial intelligence in education is experiencing explosive growth, with machine learning technologies demonstrating significant advantages in student performance prediction, achieving prediction accuracies of 70%−96% [3,4]. King et al.’s large-scale cross-cultural study based on 520,000 students found that school belongingness and sense of meaning are key factors in predicting student well-being [5]. However, existing research primarily focuses on academic performance prediction, lacking deep modeling of student psychological well-being, particularly in cross-cultural educational environments where adaptive prediction remains a significant technical gap.

Despite significant progress in educational data mining and machine learning for student performance prediction, existing methods still face three key limitations in student well-being prediction. First, the lack of psychological theory guidance is a fundamental flaw in current technologies. Traditional methods mostly adopt purely data-driven strategies, which may achieve certain results in prediction accuracy but often lack theoretical foundation and interpretability [6,7]. Chen et al.’s comprehensive review points out that there exists a significant gap between theory and technical application in existing AI educational applications [8]. Second, neglecting student relationship modeling limits the effectiveness of prediction models. The prerequisite-enhanced category-aware graph neural network proposed by Sun et al., although considering dependencies between courses, still employs a single graph topology structure and cannot comprehensively capture the complex social dynamics of student groups [9]. Li et al.’s Study-GNN model attempted multi-topology modeling but lacked theory-driven graph construction strategies [10]. Third, insufficient cross-cultural adaptability has become a key challenge in the context of globalized education. Bethel et al.’s research demonstrates that cross-cultural transitions significantly impact international students’ psychological adaptation [11], while existing prediction models are mostly developed based on single cultural backgrounds and face performance degradation issues in cross-cultural applications.

Addressing the above limitations, this study proposes the PERMA-GNN-Transformer model, achieving innovative integration of positive psychology theory with deep learning technologies. The main contributions of this paper include three aspects: (1) Theory-driven feature representation learning: This is the first study to structurally integrate Seligman’s PERMA positive psychology theory into deep learning architecture, achieving effective transformation from raw educational data to psychologically meaningful representations through a theory-driven feature embedding mechanism, providing solid theoretical foundation and interpretive framework for model predictions; (2) Multi-topology graph neural network architecture: We designed four types of student relationship graphs based on cosine similarity, Euclidean distance, learning styles, and PERMA weighting, achieving personalized relationship modeling through graph-level attention fusion mechanisms, overcoming the limitations of single topology structures; (3) Cross-cultural adaptability validation: Comprehensive experimental validation was conducted on the Western cultural background Lifestyle and Wellbeing Data (n = 12,757) and the East Asian cultural background International Student Mental Health Dataset (n = 268), demonstrating the cross-cultural universality of PERMA theory and the cultural adaptation capability of the model. Experimental results show that compared to the optimal baseline, our proposed method achieved an 18.9% performance improvement on large-scale datasets and a 27.8% improvement on small-scale datasets, with the PERMA comprehensive evaluation metric reaching 0.792, providing an effective technical solution for student well-being prediction in cross-cultural educational environments.

2. Related work

2.1. Student well-being prediction and educational data mining

Student well-being prediction, as an important branch of educational data mining, has received widespread attention in recent years. Early research primarily employed traditional machine learning methods for student performance prediction. Yağcı et al., based on academic performance data from 1,854 students, compared the prediction performance of algorithms including random forest, support vector machine, and naive Bayes, finding that random forest achieved 70%−75% classification accuracy in final grade prediction tasks [12]. Angeioplastis et al. utilized data from 450 students in the Moodle learning management system and identified course correlations through five algorithms including k-nearest neighbors, random forest, and logistic regression, discovering that strongly correlated courses (+0.3 or above) significantly improved prediction accuracy [4]. However, these methods primarily focused on academic performance and lacked deep modeling of student psychological well-being.

The rise of deep learning technologies has brought new opportunities for student mental health prediction. Rahman et al., based on large-scale cross-regional survey data from 17 universities in Southeast Asia, employed multiple machine learning algorithms including neural networks and random forests to predict student mental health status, finding that deep learning methods have significant advantages in capturing complex psychological constructs [6]. Oparina et al. utilized data from over 1 million respondents from Germany, the UK, and the US, using tree-based machine learning algorithms to predict subjective well-being, establishing statistical upper bounds for well-being prediction and proving the effectiveness of machine learning in well-being modeling [13]. These studies provide important theoretical foundation and empirical support for the deep learning methods proposed in this paper. Recent research continues to advance this field, with Li [14] demonstrating sophisticated intelligent analysis algorithms for student learning behavior through multi-source sensor data fusion, and Mohd Amin et al. [15] exploring AI’s revolutionary impact on open and distance learning while addressing implementation challenges. These contemporary studies underscore the evolving landscape of AI-driven educational analytics and its potential for comprehensive student support systems.

Research on student mental health in cross-cultural contexts reveals the important impact of cultural factors on well-being prediction. Wilson et al.’s systematic review of international student mental health in Australia found that international students face multiple challenges including loneliness/isolation (60%-65%), work/financial difficulties (15.4%-95%), and discrimination/security concerns (9%-50%) [16]. Bethel et al.’s cross-cultural transition research showed that host national connectedness plays a key mediating role between personal resources and psychological adaptation [11]. Govorova et al.’s cross-cultural analysis based on PISA 2015 data from 35 OECD countries validated the significant impact of school effects on student well-being, providing important reference for the cross-cultural validation in this study [17]. These findings emphasize the urgent need to develop culturally adaptive prediction models in the globalized educational environment.

2.2. Progress of graph neural networks in educational applications

Graph neural networks, as powerful tools for processing relational data, have shown great potential in the educational domain. Wu et al.’s comprehensive survey pointed out that GNNs effectively capture dependencies in graph structures through message passing mechanisms, achieving breakthrough progress in tasks such as node classification and graph classification [18]. Kumar and Alemran’s graph neural network text classification model proved the effectiveness of GNNs in educational text mining [19]. These foundational works laid the theoretical foundation for GNN applications in complex educational scenarios.

Graph neural network applications in education mainly focus on student modeling and course recommendation. Sun et al. proposed a prerequisite-enhanced category-aware graph neural network that achieved precise course recommendation by constructing course dependency relationship graphs, validating the unique value of graph structure modeling in educational applications [9]. Li et al. developed the Study-GNN model using multi-topology graph neural networks to predict student performance, achieving significant performance improvements compared to traditional methods by simultaneously modeling multiple relationships such as student similarity and course associations [10]. Karimi et al. applied relational graph convolutional neural networks to online academic course performance prediction, proving the important contribution of student relationship modeling to prediction accuracy [20].

Multi-topology graph structure modeling represents the latest development trend of GNNs in education. Zhou et al.’s review of multilayer graph neural networks emphasized the complementary role of different graph topologies in capturing data complexity [21]. In educational applications, single similarity measures often cannot comprehensively characterize the complex relationships among students, requiring graph structure construction from multiple perspectives. Scarselli et al.’s graph neural network theoretical framework provided mathematical foundation for multi-topology modeling [22]. However, existing research mostly adopts heuristic graph construction strategies, lacking systematic guidance from psychological theories, which is precisely where this study makes its innovative breakthrough.

2.3. Psychology theory-driven artificial intelligence methods

The integration of psychological theories with artificial intelligence technologies has opened new research directions for educational applications. Seligman’s PERMA theory, as the core framework of positive psychology, decomposes well-being into five dimensions: Positive emotions, Engagement, Relationships, Meaning, and Achievement, providing scientific basis for quantifying psychological constructs [23]. Ahmad et al. validated the effectiveness of the PERMA model in enhancing student learning engagement in educational technology research [24]. This theoretical framework provides solid psychological foundation for the feature representation learning and model architecture design in this study.

Theory-driven machine learning methods have shown unique advantages in multiple domains. Chen et al.’s review of AI educational applications pointed out that deep integration of theory and technology is a key trend for future development, and purely data-driven methods have fundamental limitations in interpretability and generalization ability [7]. In bioinformatics, Yang et al. integrated domain knowledge into Transformer architecture, significantly improving the accuracy and interpretability of protein structure prediction [25]. These cross-domain success cases prove the universal value of theory-driven methods, providing important reference for this study’s embedding of PERMA theory into deep learning architecture.

The application of explainable artificial intelligence in educational psychology has received increasing attention. Woodward et al.’s review emphasized the importance of model interpretability in mental health applications, noting that traditional “black box” models are difficult to gain trust from educators [26]. Dale et al.’s systematic review confirmed the positive impact of healthy lifestyle interventions on mental health, but emphasized the need for interpretable prediction models to guide personalized intervention strategies [27]. This study provides a clear psychological interpretation framework for deep learning models through PERMA-aligned attention mechanisms and multi-topology graph visualization analysis, meeting the urgent need for explainable AI in educational practice.

3. Methodology

3.1. Problem statement

3.1.1. Data definition.

This study defines student well-being prediction as a multi-task deep learning problem that integrates psychological theory guidance, graph-structured relationship modeling, and cross-cultural adaptation. Given a cross-cultural dataset of n students,

D = {(x_i, y_i, y_i^P, c_i)}, i = 1, …, n,

where x_i ∈ R^d is the student feature vector, y_i is the overall well-being label, y_i^P = (y_i^(1), …, y_i^(5)) represents the five-dimensional PERMA labels, and c_i is the cultural background identifier.

3.1.2. Learning objective.

The research objective is to learn a mapping function that can simultaneously predict overall well-being and scores for the five PERMA dimensions based on student features, multi-topology graph structures, and cultural background. This function needs to maintain stable prediction performance across different cultural backgrounds and ensure that prediction results conform to the inherent logic of PERMA theory.

f: (x_i, G, c_i) → (ŷ_i, ŷ_i^P),

where G = {G_cos, G_euc, G_style, G_perma} represents the set of four graph topologies, c_i ∈ C with C the cultural context space, and the function outputs the overall well-being prediction ŷ_i and the PERMA dimension predictions ŷ_i^P = (ŷ_i^(1), …, ŷ_i^(5)).

3.1.3. Core challenges.

The core challenge of this problem lies in how to effectively integrate the relationship modeling capability of graph neural networks, the sequence representation learning capability of Transformers, and the psychological guidance of PERMA theory to achieve accurate and interpretable student well-being prediction in cross-cultural educational environments. The model must satisfy theoretical consistency constraints, ensuring internal consistency between overall well-being predictions and PERMA dimension predictions:

|ŷ_i − (1/5) Σ_{p=1}^{5} ŷ_i^(p)| ≤ ε, for all i,

where ε is the consistency tolerance threshold and ŷ_i^(p) represents the prediction value for the p-th PERMA dimension; this constraint ensures that model outputs conform to positive psychology theoretical expectations.

3.2. Model architecture

Addressing the problem that existing student well-being monitoring methods lack psychological theoretical support and real-time dynamic assessment capabilities, this study proposes the PERMA-GNN-Transformer model, which innovatively combines the PERMA theory of positive psychology with graph neural networks and Transformer architecture. This model not only achieves theory-driven feature representation learning but also models the complex social relationships among students through multi-topology graph structures, providing technical support for personalized well-being prediction in cross-cultural educational environments. The overall architecture adopts a four-stage processing pipeline, as shown in Fig 1.

3.2.1. Multi-source feature input and student relationship graph construction.

Considering the multidimensionality and individual differences of student well-being, the model first constructs multi-dimensional feature representations encompassing learning behavior, social interaction, physical health, and mental state. For student i, the feature vector is defined as:

x_i = [x_i^learn; x_i^social; x_i^health; x_i^mental] ∈ R^d.

To overcome the limitation of traditional methods that ignore potential relationships among students, this study innovatively introduces a multi-topology graph neural network architecture. Based on different similarity measures and learning style features, four types of student relationship graphs are constructed:

G = {G_cos, G_euc, G_style, G_perma},

where the cosine similarity graph captures angular relationships in feature space, the Euclidean distance graph reflects overall feature differences, the learning style graph models personalized learning patterns, and the PERMA-weighted graph is constructed based on psychological theory:

A_ij^perma = Σ_{p=1}^{5} w_p · sim(x_i^(p), x_j^(p)),

where w_p is the theoretical weight for the p-th PERMA dimension, and x_i^(p) represents the feature representation of student i in the p-th PERMA dimension.
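As an illustration of this graph-construction step, the sketch below builds a thresholded cosine-similarity graph, a Euclidean-distance graph, and a PERMA-weighted graph. The Gaussian bandwidth, the 0.3 edge-weight threshold (borrowed from the parameter settings in Section 4.1.3), and the per-dimension weights are example assumptions, not the paper's exact formulation.

```python
import numpy as np

def cosine_adjacency(X, threshold=0.3):
    """Cosine-similarity graph: edge weight = cos(x_i, x_j), pruned below threshold."""
    U = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = U @ U.T
    np.fill_diagonal(S, 0.0)              # no self-loops at construction time
    return np.where(S >= threshold, S, 0.0)

def euclidean_adjacency(X, threshold=0.3):
    """Euclidean-distance graph: Gaussian kernel on pairwise squared distances.
    Bandwidth = mean squared distance (an assumption for this sketch)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / d2.mean())
    np.fill_diagonal(S, 0.0)
    return np.where(S >= threshold, S, 0.0)

def perma_weighted_adjacency(X_perma, w):
    """A_ij = sum_p w_p * cos(x_i^(p), x_j^(p)) over the five PERMA feature blocks."""
    n = X_perma[0].shape[0]
    A = np.zeros((n, n))
    for w_p, Xp in zip(w, X_perma):
        A += w_p * cosine_adjacency(Xp, threshold=0.0)
    return A
```

The learning-style graph would follow the same pattern with a similarity measure defined on style features.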

3.2.2. PERMA theory-driven feature embedding.

To achieve representation learning from raw educational data to psychologically meaningful representations, this module designs a PERMA theory-based feature embedding mechanism. PERMA theory decomposes well-being into five core dimensions: Positive emotions, Engagement, Relationships, Meaning, and Achievement, providing a theoretical interpretation framework for the model.

The mapping from features to PERMA dimensions is achieved through learnable linear transformations:

E_i = W_P x_i,

where W_P is the feature-PERMA mapping matrix, initialized using a weight allocation strategy based on psychological prior knowledge. For example, social time features are assigned higher initial weights for the “Relationships” dimension, while learning engagement is assigned higher weights for the “Engagement” dimension.
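The prior-initialized mapping can be sketched as follows. Which feature index corresponds to "social time" or "learning engagement", and the size of the prior boost, are hypothetical placeholders for illustration; only the mechanism (small random init plus theory-guided boosts) is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 5                      # d raw features; k = 5 PERMA dimensions (P, E, R, M, A)
W = rng.normal(0.0, 0.01, size=(k, d))   # small random initialization

# Hypothetical psychological priors (indices chosen for the example only):
W[2, 2] += 1.0                   # feature 2 "social time" -> Relationships (row 2)
W[1, 5] += 1.0                   # feature 5 "learning engagement" -> Engagement (row 1)

def perma_embed(x, W=W):
    """E = W x : map a raw feature vector to five PERMA dimension scores."""
    return W @ x
```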

To enhance inter-PERMA dimension relationship modeling and support cross-cultural adaptation, a cross-modal attention mechanism is introduced:

E_i' = Attention(Q = E_i W^Q, K = H_i W^K, V = H_i W^V),

where H_i represents the student relationship representations extracted by the graph neural network, achieving deep fusion of psychological theoretical knowledge with social network information.

3.2.3. Multi-topology graph neural network processing.

Addressing the complexity and diversity of student relationships, this module employs a multi-topology graph neural network architecture that collaboratively processes different types of student relationship information through Graph Convolutional Networks (GCN), Graph Attention Networks (GAT), and attention fusion layers.

For the k-th graph topology, the GCN layer captures similarity patterns among students through neighborhood aggregation:

H_k^(l+1) = σ(D̃_k^{−1/2} Ã_k D̃_k^{−1/2} H_k^(l) W^(l)),

where Ã_k = A_k + I is the adjacency matrix with self-loops and D̃_k is its degree matrix.
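A minimal NumPy rendering of the symmetrically normalized GCN propagation rule; the tanh activation is an assumption, since the paper does not specify the nonlinearity here.

```python
import numpy as np

def gcn_layer(A, H, W, act=np.tanh):
    """One GCN layer: H' = act(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return act(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)
```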

The GAT layer further introduces adaptive attention mechanisms to achieve differentiated modeling of different neighbor nodes:

α_ij = softmax_j(LeakyReLU(a^T [W h_i ‖ W h_j])), h_i' = σ(Σ_{j∈N(i)} α_ij W h_j).

The most critical innovation lies in the graph-level attention fusion mechanism, which can dynamically adjust the importance of different topologies based on learning styles and stress levels:

β = softmax(W_f [e^style; e^stress]), Z_i = Σ_{k=1}^{4} β_k H_{k,i},

where e^style and e^stress represent the embedding representations of learning style and stress level respectively, achieving personalized graph information fusion.
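The graph-level fusion step might look like the following sketch, under the assumption that the topology weights come from a single learnable linear map W_f applied to the concatenated style and stress embeddings.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_topologies(H_list, e_style, e_stress, W_f):
    """Graph-level attention fusion: beta = softmax(W_f [e_style; e_stress]),
    then Z = sum_k beta_k * H_k over the topology-specific node embeddings."""
    ctx = np.concatenate([e_style, e_stress])   # personalization context
    beta = softmax(W_f @ ctx)                   # one weight per topology
    Z = sum(b * H for b, H in zip(beta, H_list))
    return Z, beta
```

With W_f initialized to zeros, all four topologies receive the uniform weight 0.25, matching the initialization described in Section 4.1.3.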

3.2.4. PERMA-aligned transformer encoding and prediction.

To fully utilize the structured knowledge of PERMA theory, this module designs an enhanced Transformer architecture aligned with PERMA dimensions. The architecture employs a five-head attention mechanism, with each attention head specifically responsible for information processing of one PERMA dimension:

head_p = Attention(Z W_p^Q, Z W_p^K, Z W_p^V), p ∈ {P, E, R, M, A}, MultiHead(Z) = Concat(head_P, …, head_A) W^O.

This design ensures balanced attention to the five PERMA dimensions by the model, avoiding the problem of any single dimension dominating the prediction results.
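A toy rendering of the five-head design, assuming standard scaled dot-product attention with one independent projection triple per PERMA dimension; dimensions are arbitrary for the example.

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def perma_multihead(X, Wq, Wk, Wv, Wo):
    """Five attention heads, one per PERMA dimension (P, E, R, M, A);
    each head attends in its own projected subspace, then outputs are
    concatenated and mixed by Wo."""
    heads = []
    for p in range(5):
        Q, K, V = X @ Wq[p], X @ Wk[p], X @ Wv[p]
        attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # scaled dot-product
        heads.append(attn @ V)
    return np.concatenate(heads, axis=-1) @ Wo
```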

The model output includes overall well-being prediction and five-dimensional PERMA decomposition prediction, providing multi-level decision support information for educators:

ŷ_i = f_global(z_i), ŷ_i^(p) = f_p(z_i), p = 1, …, 5.

To ensure prediction consistency and enhance the theoretical interpretability of the model, a multi-task learning framework is adopted:

L = λ1 · L_global + λ2 · L_perma + λ3 · L_cons,

where the consistency loss is defined as:

L_cons = (1/n) Σ_{i=1}^{n} (ŷ_i − (1/5) Σ_{p=1}^{5} ŷ_i^(p))².

This design ensures theoretical consistency between overall well-being prediction and PERMA dimension predictions, improving the credibility and practical application value of the model.
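Under the assumption that the overall and per-dimension task losses are mean-squared errors (the paper does not name them explicitly), the multi-task objective with the λ values from Section 4.1.3 can be sketched as:

```python
import numpy as np

def perma_loss(y, y_hat, Yp, Yp_hat, lambdas=(1.0, 0.8, 0.5)):
    """L = l1*MSE(overall) + l2*MSE(PERMA dims) + l3*consistency penalty.
    y, y_hat: (n,) overall labels/predictions; Yp, Yp_hat: (n, 5) PERMA labels/predictions."""
    l1, l2, l3 = lambdas
    L_global = np.mean((y - y_hat) ** 2)
    L_perma = np.mean((Yp - Yp_hat) ** 2)
    L_cons = np.mean((y_hat - Yp_hat.mean(axis=1)) ** 2)  # overall vs. dimension average
    return l1 * L_global + l2 * L_perma + l3 * L_cons
```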

4. Experiments and analysis

4.1. Experimental setup

4.1.1. Dataset description.

To validate the prediction performance and cross-cultural adaptability of the PERMA-GNN-Transformer model across different cultural backgrounds, this study employs two representative datasets from the Kaggle platform for experimentation. Dataset selection follows three core criteria: sufficient sample size to support deep learning model training, feature dimensions with good mapping relationships to the PERMA theoretical framework, and significant cultural background differences to support cross-cultural validation experiments.

The first dataset is the “Lifestyle and Wellbeing Data,” which contains 12,757 valid samples and 23 feature dimensions, sourced from work-life balance surveys on www.Authentic-Happiness.com. This dataset evaluates subjects’ comprehensive performance in professional and personal life, with features covering five major dimensions: healthy body, healthy mind, professional skill development, social connection strength, and life meaning perception. The subjects are primarily from Western cultural backgrounds, with 62% female and 38% male participants, providing rich data foundation for studying student well-being prediction in Western educational cultural environments.

The second dataset is the “International Student Mental Health Dataset,” which collects mental health and well-being data from 268 students at an international university in Japan, with 50% international students and 50% domestic students. Data collection employed standardized psychological measurement tools, including the Patient Health Questionnaire (PHQ-9) for depression assessment, the Acculturative Stress Scale for International Students (ASSIS) for cultural adaptation stress measurement, social connection scales for social relationship assessment, and indicators related to suicidal ideation and help-seeking behavior. This dataset provides valuable empirical data for student mental health research in East Asian cultural contexts.

The feature dimensions of both datasets can effectively map to the five core constructs of PERMA theory, providing data support for constructing theory-driven feature representations. Meanwhile, the significant differences between Western and East Asian cultural backgrounds provide ideal experimental conditions for evaluating the model’s cross-cultural generalization ability.

4.1.2. Experimental environment configuration.

This study conducts experimental validation based on a high-performance computing platform. The computing environment is configured with NVIDIA GeForce RTX 4090 GPU (24GB VRAM), Intel Core i9-13900K processor, and 128GB memory to support large-scale training and complex graph structure computations of the PERMA-GNN-Transformer model. The software architecture employs PyTorch 2.1.0 deep learning framework and PyTorch Geometric 2.4.0 graph neural network library, combined with Python 3.10 scientific computing ecosystem, including core libraries such as NumPy, Pandas, and Scikit-learn, providing technical support for model implementation and data analysis. This configuration effectively handles the parallel computing requirements of multi-topology graph neural networks while ensuring efficient operation of Transformer attention mechanisms.

Dataset splitting follows standard practices in deep learning research, adopting a 7:2:1 ratio for training, validation, and test sets. Considering the multi-dimensional nature of student well-being prediction and the complexity of PERMA theory, the larger training set proportion (70%) benefits the model in fully learning the nonlinear mapping relationships between student behavioral features and well-being dimensions, while a sufficient validation set (20%) provides reliable basis for hyperparameter optimization and model selection. The data splitting process employs stratified sampling methods to ensure distribution consistency across subsets in key variables such as stress level and learning style, thereby improving the validity of model evaluation. To ensure statistical reliability of research results, all experiments adopt multiple independent runs with averaged results, optimizing model convergence through early stopping mechanisms and learning rate scheduling strategies to maximize model generalization performance.
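The stratified 7:2:1 split can be sketched as below. Stratifying on a single discrete label array is a simplification of the multi-variable stratification on stress level and learning style described above.

```python
import numpy as np

def stratified_split(labels, ratios=(0.7, 0.2, 0.1), seed=42):
    """Split sample indices into train/val/test, preserving label proportions
    within each stratum. `labels` is a 1-D array of discrete stratum labels."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        n = len(idx)
        n_tr, n_va = int(ratios[0] * n), int(ratios[1] * n)
        train += idx[:n_tr].tolist()
        val += idx[n_tr:n_tr + n_va].tolist()
        test += idx[n_tr + n_va:].tolist()
    return train, val, test
```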

4.1.3. Model parameter settings.

The parameter configuration of the PERMA-GNN-Transformer model is determined based on theoretical analysis and preliminary experimental results to balance model expressiveness and computational efficiency. The Transformer encoder adopts a 6-layer stacked structure with hidden layer dimension set to 256. The multi-head attention mechanism is configured with 5 attention heads corresponding to the five core dimensions of PERMA theory, with each attention head dimension set to 64. The output dimension of the PERMA feature embedding layer is set to 128, ensuring sufficient representation space while avoiding overfitting risks due to excessive dimensions. The hidden layer dimension of the feedforward network is set to 1024, with GELU activation function to provide smoother gradient characteristics. Dropout rates are set to 0.1 in both attention layers and feedforward networks to enhance model generalization capability.

Graph neural network components are configured separately for four topology structures. GCN layer number is set to 3 layers with hidden dimension of 128 per layer, balancing the propagation depth of graph structure information with computational complexity. GAT components employ 8 attention heads with head dimension of 32, supporting fine-grained modeling of different student relationships. The weights of the graph-level attention fusion mechanism are dynamically adjusted through learnable parameters, with initial weights uniformly distributed as 0.25. Edge weight threshold is set to 0.3 to filter weak connections for noise reduction and improved computational efficiency. Normalization in graph convolution operations adopts symmetric normalization to ensure numerical stability and convergence.

Model training employs the AdamW optimizer with initial learning rate set to 2 × 10^−4 and weight decay coefficient of 1 × 10^−5 to prevent overfitting. Learning rate scheduling adopts cosine annealing strategy with minimum learning rate of 1 × 10^−6 and training period of 100 epochs. Batch size is set to 32, achieving balance between memory constraints and gradient stability. Multi-task weights λ1, λ2, λ3 in the loss function are set to 1.0, 0.8, 0.5 respectively, emphasizing the importance of overall well-being prediction while ensuring effectiveness of PERMA dimension prediction and consistency constraints. Gradient clipping threshold is set to 1.0 to prevent gradient explosion, and early stopping strategy patience is set to 15 epochs, achieving optimal balance between model convergence and training efficiency.
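The warmup-plus-cosine-annealing schedule described here (2 × 10^−4 peak, 1 × 10^−6 floor, 500 warmup steps per S1 Appendix) can be written as a small function; the linear shape of the warmup is an assumption of this sketch.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=2e-4, lr_min=1e-6, warmup=500):
    """Linear warmup from 0 to lr_max over `warmup` steps, then cosine decay
    from lr_max down to lr_min by `total_steps`."""
    if step < warmup:
        return lr_max * step / warmup
    t = (step - warmup) / max(1, total_steps - warmup)   # progress in [0, 1]
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```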

Complete hyperparameter configurations are provided in S1 Appendix, including all optimizer parameters (β1 = 0.9, β2 = 0.999, ε = 1 × 10^−8, weight decay = 1 × 10^−5), weight initialization strategies (Xavier Uniform for Transformer and PERMA embedding with psychological priors, Kaiming Normal for GNN layers), learning rate scheduler details (Cosine Annealing with 500 warmup steps, minimum LR = 1 × 10^−6), regularization settings (gradient clipping threshold = 1.0, dropout rate = 0.1 across all layers), and graph construction parameters (threshold settings for similarity-based graphs, normalization schemes for adjacency matrices). Specifically, Table A1 in S1 Appendix presents the complete training configuration and optimization details, while Table A2 in S1 Appendix documents the architectural parameters for all model components.

4.2. Evaluation metrics

To comprehensively evaluate the prediction performance of the PERMA-GNN-Transformer model and validate its effectiveness in student well-being prediction tasks, this study constructs a multi-level evaluation metric system. This system not only includes standard evaluation metrics for traditional regression tasks but also innovatively proposes PERMA theory-based comprehensive evaluation metrics to fully reflect the model’s prediction capability and theoretical consistency under psychological theory guidance.

4.2.1. Traditional regression evaluation metrics.

This study employs Mean Absolute Error (MAE) as the baseline evaluation metric to measure the average absolute deviation between predicted and true values. The calculation formula is MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|, where n is the number of samples, y_i is the true well-being label of the i-th sample, and ŷ_i is the corresponding predicted value. MAE has good intuitiveness and robustness, without excessive penalty due to outliers, making it particularly suitable for evaluating prediction accuracy of subjective evaluation indicators like student well-being. Its physical meaning is clear and directly reflects the average deviation between model predictions and students’ actual well-being levels.

Root Mean Square Error (RMSE) serves as a supplementary evaluation metric, evaluating model performance through the root mean square of prediction errors. Its calculation formula is $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$. Compared to MAE, RMSE assigns higher weight to larger errors and can therefore expose model failures in extreme situations. In student well-being prediction, this sensitivity matters because prediction failures for high-risk students may have serious consequences. A smaller RMSE indicates that the model stably avoids large prediction deviations, supporting timely and accurate educational interventions.
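Both metrics are straightforward to compute; a minimal sketch:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average absolute deviation between labels and predictions."""
    n = len(y_true)
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

def rmse(y_true, y_pred):
    """Root mean square error: weights large deviations more heavily than MAE."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)
```

By construction RMSE is never smaller than MAE on the same predictions, which is why the two are reported together: a widening gap between them signals occasional large errors.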

4.2.2. PERMA theory comprehensive evaluation metrics.

Based on PERMA positive psychology theory, this study proposes an innovative comprehensive evaluation metric system, which is one of the core theoretical contributions of this paper. Traditional evaluation metrics only focus on prediction accuracy while ignoring the inherent logic of psychological theory and inter-dimensional relationships, making it difficult to effectively evaluate model prediction quality under theoretical guidance. PERMA theory emphasizes that well-being consists of five dimensions: positive emotions, engagement, relationships, meaning, and achievement, with complex interactions among dimensions. Therefore, an effective student well-being prediction model not only needs to accurately predict overall well-being levels but also needs to provide consistent and meaningful prediction results across the five PERMA dimensions, ensuring that model outputs conform to psychological theoretical expectations and possess educational application value.

The PERMA theory comprehensive evaluation metrics constructed in this study include three core components that evaluate model theoretical consistency and prediction quality from different perspectives. The PERMA Dimension Accuracy (PDA) metric evaluates the model’s individual prediction accuracy for the five dimensions of positive emotions, engagement, relationships, meaning, and achievement. The calculation formula is $\mathrm{PDA} = 1 - \frac{1}{5n}\sum_{i=1}^{n}\sum_{d=1}^{5}\lvert y_{i,d} - \hat{y}_{i,d}\rvert$, where $y_{i,d}$ and $\hat{y}_{i,d}$ represent the true label and predicted value of the $i$-th student on the $d$-th PERMA dimension respectively, with scores normalized to [0, 1]. The PERMA Consistency Index (PCI) evaluates the theoretical consistency between overall well-being prediction and PERMA dimension predictions. The calculation formula is $\mathrm{PCI} = 1 - \frac{1}{n}\sum_{i=1}^{n}\bigl\lvert \hat{y}_i - \frac{1}{5}\sum_{d=1}^{5}\hat{y}_{i,d}\bigr\rvert$, ensuring that model prediction results conform to the basic assumption in PERMA theory that overall well-being should be consistent with the average level of each dimension. The PERMA Comprehensive Evaluation (PCE) metric comprehensively reflects model performance at multiple levels through weighted fusion. The calculation formula is $\mathrm{PCE} = \alpha \cdot \mathrm{PDA} + \beta \cdot \mathrm{PCI} + \gamma \cdot (1 - \mathrm{RMSE}_{\mathrm{norm}})$, where $\alpha$, $\beta$, $\gamma$ are weight parameters, and $\mathrm{RMSE}_{\mathrm{norm}}$ is the normalized RMSE value.
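A sketch of how the three metrics could be computed, under the assumptions that all scores are normalized to [0, 1] and that the PCE weights α, β, γ sum to 1 (the specific weight values below are illustrative, not taken from the paper):

```python
def pda(dim_true, dim_pred):
    """PERMA Dimension Accuracy: 1 minus the mean absolute error over all
    five dimensions, assuming scores normalized to [0, 1]."""
    n = len(dim_true)
    err = sum(abs(t - p)
              for row_t, row_p in zip(dim_true, dim_pred)
              for t, p in zip(row_t, row_p))
    return 1.0 - err / (5 * n)

def pci(overall_pred, dim_pred):
    """PERMA Consistency Index: 1 minus the mean gap between the overall
    well-being prediction and the average of the five dimension predictions."""
    n = len(overall_pred)
    gap = sum(abs(o - sum(row) / 5) for o, row in zip(overall_pred, dim_pred))
    return 1.0 - gap / n

def pce(pda_val, pci_val, rmse_norm, alpha=0.4, beta=0.3, gamma=0.3):
    """PERMA Comprehensive Evaluation: weighted fusion of accuracy,
    consistency, and inverted normalized RMSE. Weights are illustrative."""
    return alpha * pda_val + beta * pci_val + gamma * (1.0 - rmse_norm)
```

Each student contributes one row of five dimension scores to `dim_true` / `dim_pred`, so all three metrics lie in [0, 1] with higher values indicating better theory-consistent predictions.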

The innovation of this metric system lies in transforming qualitative descriptions of psychological theory into quantitative evaluation standards, providing a scientific evaluation framework for theory-driven artificial intelligence models. Through the PDA metric, researchers can identify the model’s prediction capabilities on specific PERMA dimensions, providing precise guidance for personalized educational interventions. Through the PCI metric, it can be verified whether the model follows the inherent logic of psychological theory, avoiding theoretical contradictions in prediction results. Through the PCE metric, a comprehensive evaluation of the model’s accuracy, consistency, and theoretical compliance can be obtained, providing unified standards for comparisons between different models. This evaluation system is not only applicable to student well-being prediction tasks but also provides a generalizable evaluation methodology for other psychology theory-guided artificial intelligence applications, promoting theoretical development and practical innovation in interdisciplinary research fields.

4.2.3. Statistical significance testing.

To verify the statistical reliability of PERMA-GNN-Transformer model performance improvements, this study employs paired t-tests for statistical significance analysis and evaluates the statistical significance of performance differences between models through p-values. Paired t-tests are particularly suitable for the experimental design of this study because prediction results of different models on the same dataset have natural pairing properties, effectively controlling the influence of individual differences and providing more accurate statistical inference. Let the prediction errors of two models on the $i$-th sample be $e_i^{(1)}$ and $e_i^{(2)}$ respectively, with error difference $d_i = e_i^{(1)} - e_i^{(2)}$. The test statistic for the paired t-test is $t = \frac{\bar{d}}{s_d / \sqrt{n}}$, where $\bar{d}$ is the sample mean of the error differences, $s_d$ is the sample standard deviation, and $n$ is the sample size.

The p-values calculated from the t-statistic quantify the statistical significance of observed performance differences. In the research context of student well-being prediction, these significance levels have practical value: when p < 0.001, model performance differences have extremely high statistical significance, and such differences are likely to have substantial impact in actual educational applications; when p < 0.01, performance differences have high statistical significance, sufficient to support actual deployment of the model in educational technology systems; when p ≥ 0.05, there is insufficient statistical evidence to support the significance of performance differences. The experimental results of this study show that the PERMA-GNN-Transformer model achieves significance levels of p < 0.001 compared to traditional machine learning methods and p < 0.01 compared to state-of-the-art deep learning methods on both datasets, fully validating the statistical superiority of combining PERMA theory with graph neural networks and Transformer architecture and providing reliable statistical evidence for theory-driven student well-being prediction models.
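The statistic above is simple to compute directly; in practice `scipy.stats.ttest_rel` returns the same statistic together with its p-value. A minimal sketch with hypothetical per-sample error vectors:

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic on per-sample prediction errors of two models.

    d_i = e_i^(A) - e_i^(B); t = mean(d) / (s_d / sqrt(n)),
    with the (n - 1)-denominator sample standard deviation.
    """
    d = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample variance with Bessel's correction (n - 1 in the denominator).
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / (math.sqrt(var_d) / math.sqrt(n))

# Hypothetical per-sample absolute errors of two models on four students.
t = paired_t_statistic([0.3, 0.5, 0.4, 0.6], [0.2, 0.1, 0.3, 0.2])
```

A positive t here indicates model A's errors are systematically larger than model B's on the paired samples; the p-value then follows from the t distribution with n − 1 degrees of freedom.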

4.3. Comparative experiments

As shown in Table 1, to systematically validate the performance advantages of the PERMA-GNN-Transformer model, this study constructs a comprehensive evaluation framework encompassing 22 representative methods. These methods span the technological evolution path from traditional statistical learning to state-of-the-art deep learning, with comprehensive performance comparisons conducted on two datasets with different characteristics. The Lifestyle and Wellbeing Data (n = 12,757) provides a large-scale comprehensive evaluation environment, while the International Student Mental Health Dataset (n = 268) represents small-scale specialized application scenarios, ensuring the universality and reliability of experimental results.

4.3.1. Performance limitations of traditional machine learning methods.

Traditional machine learning methods demonstrate significant performance bottlenecks in student well-being prediction tasks, revealing fundamental limitations of linear and shallow nonlinear models in handling complex psychological constructs. On large-scale datasets, linear regression methods perform worst (MAE = 0.356), indicating that there exist highly nonlinear mapping relationships between student behavioral features and well-being, and simple linear assumptions cannot effectively model this complexity. Ensemble learning methods achieve relatively optimal performance through multi-model fusion strategies, with Gradient Boosting obtaining MAE = 0.276, a 22.5% improvement compared to linear methods, proving the effectiveness of nonlinear ensemble strategies in capturing feature interactions.

However, the fundamental deficiency of traditional methods lies in their lack of structured modeling capability for psychological theories. These methods cannot provide decomposed predictions for PERMA dimensions and are unable to give effective results on theory-driven indicators such as PDA, PCI, and PCE, severely limiting their value in educational psychology applications. Furthermore, traditional methods show more pronounced performance degradation on small-scale datasets, reflecting their strong dependence on large sample data and insufficient generalization capability, which forms a sharp contradiction with the challenges of data scarcity in real educational environments.

4.3.2. Breakthroughs and progress of deep learning methods.

Deep learning methods mark an important breakthrough in student well-being prediction technology, achieving direct modeling and prediction of PERMA theoretical dimensions for the first time. The Transformer architecture, with its powerful self-attention mechanism, achieves performance on large-scale datasets that traditional methods cannot reach (MAE = 0.221, PCE = 0.649), representing a 19.9% accuracy improvement compared to the optimal traditional method. This leap forward stems from deep learning’s fundamental advantages in feature representation learning and nonlinear mapping, enabling models to automatically discover latent patterns and complex associations in student behavioral data.

Sequential modeling methods (LSTM, GRU, BiLSTM) demonstrate unique advantages in PERMA dimension prediction, with LSTM achieving a PDA score of 0.648, significantly outperforming feedforward network architectures. This result confirms the temporal dependency characteristics of student psychological states and the specialized capability of recurrent neural networks in capturing such temporal dynamics. Convolutional neural networks achieve hierarchical feature fusion from local to global through hierarchical feature extraction, with ResNet’s residual connection mechanism further improving training stability of deep networks, achieving excellent performance of 0.667 on the PDA indicator.

Another key contribution of deep learning methods lies in significant improvement of theoretical consistency. PCI indicators generally exceed 0.590, indicating that deep model predictions maintain good internal consistency with PERMA theoretical assumptions, providing educators with interpretable and trustworthy decision support tools, bridging the theoretical gap between technical prediction and educational practice.

4.3.3. Relationship modeling breakthrough of graph neural networks.

Graph neural network methods achieve further performance leaps by introducing student relationship modeling, validating the core role of social network structures in student well-being formation mechanisms. GraphSAGE effectively handles computational challenges of large-scale student networks through sampling aggregation strategies, obtaining optimal graph network performance while maintaining efficiency (MAE = 0.218, PCE = 0.655), achieving 1.4% accuracy improvement and 0.9% comprehensive indicator improvement compared to the best deep learning method.

GAT’s attention mechanism brings new perspectives to student relationship modeling by dynamically allocating importance weights of different neighboring students, achieving personalized social influence modeling. Its excellent performance in PERMA dimension prediction (PDA = 0.678, PCI = 0.645) indicates that attention mechanisms can effectively identify social relationships that have key impacts on individual well-being, providing important insights for understanding student group dynamics. Although basic GCN methods have slightly lower performance, they still significantly surpass traditional deep learning methods, confirming the fundamental value of graph convolution operations in aggregating neighborhood information.

The success of graph neural networks reveals the collaborative importance of individual features and group relationships in student well-being prediction, providing solid empirical foundation for the multi-topology graph design in this study. The effectiveness of this technical approach indicates that future student mental health monitoring systems must fully consider the complex influences of social networks.

4.3.4. Advanced baselines of state-of-the-art methods.

Two state-of-the-art methods represent the current technological frontier in student well-being prediction, providing challenging performance benchmarks for this study. King et al. (2024), building on an ecological theoretical framework and large-scale cross-cultural data from 520,000 students, demonstrate the scale advantages of data-driven methods, achieving robust prediction performance on both test datasets (large-scale dataset MAE = 0.201, small-scale dataset MAE = 0.205) [28]. The consistent performance of their method shows that sufficient data scale can effectively alleviate model generalization challenges.

Shahzad et al. (2024) achieved significant progress in PERMA dimension modeling through multimodal data fusion and advanced deep learning technologies, with PDA reaching 0.702 and 0.681 on the two datasets respectively. This method validates the effectiveness of theory-driven modeling strategies and provides a technical example for comprehensive utilization of multimodal student data. Both state-of-the-art methods reach the p < 0.01 significance level in statistical tests (versus p < 0.001 for traditional methods), indicating smaller performance gaps and reflecting the diminishing marginal returns of technological development [29].

The adaptability differences shown by state-of-the-art methods on different datasets reveal the limitations of current technologies. King et al.’s advantages on large-scale data versus Shahzad et al.’s stability on small-scale data reflect the trade-off relationship between data scale and theoretical guidance, providing important reference for the theory-driven method design in this study.

4.3.5. Comprehensive advantages of PERMA-GNN-Transformer.

The PERMA-GNN-Transformer model achieves optimal performance across all evaluation dimensions, fully validating the effectiveness of the innovative architecture that deeply integrates positive psychology theory, graph neural networks, and self-attention mechanisms. Compared to the best-performing state-of-the-art method King et al., this study’s method achieves significant improvement from MAE 0.201 to 0.163 on large-scale datasets (18.9% improvement), and PCE improvement from 0.681 to 0.792 (16.3% improvement). On small-scale datasets, the performance advantages are even more prominent, with MAE improvement reaching 27.8% and PCE improvement of 17.4%, demonstrating the method’s strong generalization capability.

PERMA theory-driven feature representation learning produces breakthrough effects, with PDA indicators reaching 0.841 and 0.823 on both datasets respectively, representing over 17% improvement compared to optimal baseline methods. This achievement indicates that structured knowledge from psychological theory can significantly enhance the model’s ability to identify and predict fine-grained dimensions of well-being. The excellent performance of PCI indicators (0.798 and 0.785) further confirms the high consistency between model predictions and theoretical assumptions, ensuring the psychological significance and educational application value of prediction results.

The multi-topology graph neural network architecture achieves comprehensive capture of student social network complexity by simultaneously modeling four types of student relationships: cosine similarity, Euclidean distance, learning styles, and PERMA weighting. The graph-level attention fusion mechanism dynamically adjusts the importance of different topologies based on individual features, providing personalized relationship weight allocation for each student. This design significantly improves the model’s adaptability and precision.
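As an illustration of one of the four topologies, a cosine-similarity graph might be built as follows; the threshold of 0.8 and the row-normalization scheme are assumptions for this sketch (the actual threshold settings and adjacency normalization schemes are documented in S1 Appendix):

```python
import math

def cosine_similarity_graph(features, threshold=0.8):
    """Connect students whose feature vectors exceed a cosine-similarity
    threshold, then row-normalize the adjacency matrix.

    The threshold and normalization are illustrative assumptions."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and cos(features[i], features[j]) >= threshold:
                adj[i][j] = 1.0
    # Row-normalize so each student's neighborhood weights sum to 1.
    for i in range(n):
        s = sum(adj[i])
        if s > 0:
            adj[i] = [w / s for w in adj[i]]
    return adj
```

The Euclidean-distance, learning-style, and PERMA-weighted graphs would follow the same pattern with different pairwise affinity functions, after which the graph-level attention fusion combines the four adjacency matrices per student.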

4.3.6. Statistical significance and generalization capability analysis.

Statistical significance test results provide rigorous statistical evidence for the performance advantages of the PERMA-GNN-Transformer model. Compared to traditional machine learning and basic deep learning methods, all comparisons reach p < 0.001 extremely high significance levels, indicating that performance improvements have extremely strong statistical reliability. The p < 0.01 significance level compared to graph neural networks and state-of-the-art methods, although with reduced gaps, still holds important statistical significance considering the advanced nature of these methods.

Cross-dataset performance analysis reveals the unique advantage pattern of this study’s method. As dataset size decreases, the relative advantages of PERMA-GNN-Transformer significantly increase, a phenomenon stemming from the key guiding role of PERMA theoretical prior knowledge in data-scarce environments. When training samples are limited, purely data-driven methods face serious overfitting risks, while the five-dimensional structure of PERMA theory provides a stable learning framework for the model, enabling the feature embedding process to fully utilize psychological domain knowledge and significantly reduce dependence on large-scale training data.

The robustness of multi-topology graph structures in small-sample scenarios further enhances the model’s generalization capability. By encoding multiple relationship patterns among students, even in data-scarce situations, the model can still effectively capture group dynamic features and social influence patterns. This finding establishes the unique value of theory-driven methods in practical educational applications, particularly having irreplaceable practical significance in educational environments where data collection costs are high or privacy protection requirements are strict.

4.4. Ablation study

To deeply understand the specific contributions of each component in the PERMA-GNN-Transformer model, this study designs systematic ablation experiments by gradually adding key modules to verify the effectiveness of each innovation point. As shown in Fig 2, experiments are conducted on both datasets to evaluate the performance differences and synergistic effects of different components under different data scales.

4.4.1. Performance analysis of stepwise model component addition.

The ablation experiment results clearly demonstrate a step-wise performance improvement process. On the large-scale dataset, starting from the traditional machine learning baseline (MAE = 0.285), the introduction of deep learning architecture brings significant initial improvement, with MAE decreasing to 0.243, representing a 14.7% relative improvement. This result validates the fundamental advantages of deep neural networks in capturing complex nonlinear relationships in student behavioral features, laying the technical foundation for effective integration of subsequent modules.

The addition of the PERMA feature embedding mechanism achieves further performance leaps, with MAE decreasing from 0.243 to 0.215, bringing an additional 11.5% improvement. This improvement magnitude indicates that transforming structured knowledge from positive psychology theory into learnable feature representations can effectively guide the model learning process, enabling raw educational data to map to psychologically meaningful representation spaces. The PERMA embedding mechanism not only improves prediction accuracy but, more importantly, provides a solid foundation for the model’s theoretical interpretability.

The integration of multi-topology graph neural networks brings significant performance improvements, with MAE further decreasing to 0.198, representing a 7.9% improvement relative to the previous stage. This result fully proves the important value of student relationship modeling in well-being prediction. Multi-topology graph structures comprehensively capture complex social dynamics and mutual influence mechanisms in student groups by simultaneously considering multiple relationship patterns including cosine similarity, Euclidean distance, learning styles, and PERMA weighting.

4.4.2. Fine-grained modeling effects of attention mechanisms.

The introduction of attention mechanisms shows different improvement patterns on both datasets, reflecting their adaptive characteristics in different data environments. On the large-scale dataset, attention mechanisms reduce MAE from 0.198 to 0.189, achieving a 4.5% performance improvement; while on the small-scale dataset, the same mechanism brings a significant 10.6% improvement (from 0.218 to 0.195). This differential performance indicates that attention mechanisms can more effectively focus on key features in data-scarce environments, maximizing the utilization efficiency of limited information through dynamic weight allocation mechanisms.

The final performance of the complete model validates the effective synergistic effects among components. On the large-scale dataset, the complete model achieves excellent results with MAE = 0.163, representing a 42.8% overall improvement compared to the baseline method; on the small-scale dataset, the performance of MAE = 0.148 achieves an even more significant 52.6% improvement. This more pronounced performance advantage on small datasets further confirms the unique value of theory-driven methods in data-scarce scenarios.

4.4.3. Quantitative analysis of component contributions.

Component contribution analysis reveals the specific action mechanisms of different modules in overall performance improvement. The PERMA feature embedding mechanism is the single component with the largest contribution, bringing MAE improvements of 0.028 and 0.033 on large-scale and small-scale datasets respectively, accounting for 23.0% and 20.1% of total improvement. This result confirms the core position of psychological theory guidance in student well-being prediction, where theory-driven feature representations can provide stable learning frameworks and semantic constraints for models.

Multi-topology graph neural networks, as the second largest contributing component, achieve MAE improvements of 0.017 and 0.020 on both datasets respectively, accounting for 13.9% and 12.2% of total improvement. This contribution degree proves the importance of student relationship modeling, where the synergistic effects of multiple graph topologies can capture complex association patterns among students from different perspectives, providing rich group context information for individual well-being prediction.

Attention mechanisms show different contribution patterns on both datasets: contributing 0.009 improvement (7.4%) on the large-scale dataset, while contributing 0.023 improvement (14.0%) on the small-scale dataset. This differential contribution further validates the special value of attention mechanisms in data-scarce environments, where their dynamic feature selection capability plays a more important role under limited data conditions.
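The reported contribution shares follow directly from the per-component MAE deltas and the overall improvements (0.122 and 0.164); a quick arithmetic check:

```python
# MAE improvement attributed to each component (large-scale, small-scale),
# as reported in the ablation study.
contributions = {
    "PERMA embedding": (0.028, 0.033),
    "multi-topology GNN": (0.017, 0.020),
    "attention": (0.009, 0.023),
}
totals = (0.122, 0.164)  # overall MAE improvement on each dataset

shares = {
    name: tuple(round(100 * c / t, 1) for c, t in zip(deltas, totals))
    for name, deltas in contributions.items()
}
# shares reproduces the percentages in the text:
# PERMA embedding (23.0, 20.1), GNN (13.9, 12.2), attention (7.4, 14.0)
```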

4.4.4. Cross-dataset consistency and generalization capability assessment.

Ablation experiment results demonstrate remarkable cross-dataset consistency, with all components showing positive contributions on both datasets with different characteristics, proving the robustness and generalization capability of the model design. Particularly noteworthy is that as dataset size decreases, the model’s relative performance advantages show an increasing trend, with overall improvement magnitude increasing from 42.8% to 52.6%. The fundamental reason for this phenomenon is that the psychological prior knowledge provided by PERMA theory plays a more critical guiding role in data-scarce situations, effectively alleviating overfitting risks in small-sample learning.

Analysis of synergistic effects among components shows that overall performance improvement (0.122 and 0.164) exceeds the simple linear superposition of individual component contributions, confirming the existence of positive nonlinear interactions among PERMA theory guidance, graph neural networks, and attention mechanisms. This synergistic effect embodies the deep integration of theoretical guidance and technological innovation, providing important methodological insights for constructing more effective student well-being prediction models. The systematic results of ablation experiments not only validate the scientific nature of model design but also provide clear improvement directions for optimization and extension of components in subsequent research.
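On the large-scale dataset, the synergy claim can be checked arithmetically from the stage-wise MAE values reported in Sections 4.4.1–4.4.2:

```python
# Stage-wise MAE on the large-scale dataset:
# baseline -> +deep architecture -> +PERMA embedding -> +multi-topology GNN -> +attention
stages = [0.285, 0.243, 0.215, 0.198, 0.189]
final_mae = 0.163  # complete model

stepwise = [round(a - b, 3) for a, b in zip(stages, stages[1:])]
total = round(stages[0] - final_mae, 3)
synergy = round(total - sum(stepwise), 3)
# The stepwise gains [0.042, 0.028, 0.017, 0.009] sum to 0.096, while the
# total gain is 0.122: the 0.026 residual is the nonlinear synergistic gain.
```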

4.5. Hyperparameter experiments

To fully unleash the performance potential of the PERMA-GNN-Transformer model and validate the robustness of model design, this study conducts systematic sensitivity analysis on key hyperparameters that affect model performance. Hyperparameter optimization not only directly impacts model prediction accuracy but also relates to the effectiveness of PERMA theory-driven feature learning and the stability of cross-cultural adaptation capability. Fig 3 shows the impact patterns of four core hyperparameters on MAE performance metrics and PERMA comprehensive evaluation indicators.

4.5.1. Learning rate sensitivity analysis.

Learning rate, as a key hyperparameter in deep learning model training, directly affects model convergence speed and final performance. Experimental results show that the learning rate achieves optimal performance at 1 × 10−4, with MAE dropping to 0.163 and the three PERMA indicators PCE, PDA, and PCI reaching 0.792, 0.841, and 0.798 respectively. When the learning rate is too small (1 × 10−5, 5 × 10−5), the model converges slowly, with MAE at 0.185 and 0.171 respectively, indicating that the PERMA feature embedding and graph neural network components fail to fully learn effective representations. Conversely, when the learning rate is too large (5 × 10−4, 1 × 10−3), MAE sharply increases to 0.179 and 0.201 and the PERMA indicators decline markedly, likely because excessively large gradient update steps destabilize training, with the multi-topology graph attention fusion mechanism particularly prone to oscillation.

It is noteworthy that the change trends of PERMA comprehensive evaluation indicators are highly consistent with MAE, proving the internal consistency between the theory-driven evaluation framework and traditional accuracy indicators. The PCE indicator reaches its peak under optimal learning rate, indicating that the model can effectively balance the consistency between overall well-being prediction and PERMA five-dimensional decomposition prediction, validating the important role of psychological theory guidance in hyperparameter optimization.

4.5.2. Impact of batch size on model performance.

Batch size experiments reveal the performance characteristics of the PERMA-GNN-Transformer model under different data sampling strategies. Experimental results indicate that the model achieves optimal performance when batch size is 32, with MAE at 0.163 and all PERMA indicators reaching highest levels. Smaller batch sizes (8, 16), although providing more frequent gradient updates, show MAE at 0.178 and 0.169 respectively with slightly decreased performance, possibly due to insufficient sample diversity affecting the graph neural network’s full learning of student relationship patterns.

When batch size increases to 64 and 128, MAE rises to 0.171 and 0.183, with corresponding decreases in PERMA indicators. This phenomenon can be explained from two perspectives: first, larger batch sizes reduce gradient update frequency, potentially causing the model to converge to suboptimal solutions on complex multi-topology graph structures; second, large-batch training may weaken PERMA theory-driven personalized feature learning capability, as heterogeneity among students within batches may blur fine-grained psychological feature patterns.

Particularly noteworthy is that the PCE indicator reaches a peak of 0.792 when batch size is 32, then steadily decreases as batch size increases, indicating that moderate batch size helps maintain theoretical consistency between overall well-being and PERMA dimension predictions, while excessively large batches may disrupt this delicate balance relationship.

4.5.3. Representation capability analysis of hidden layer dimensions.

Hidden layer dimensions determine the model’s representation capability and parameter complexity, directly affecting the effectiveness of PERMA feature embedding and Transformer encoding. Experimental results show that the model performs optimally when the hidden layer dimension is 256, with MAE reaching 0.163 and the PCE, PDA, and PCI indicators at 0.792, 0.841, and 0.785 respectively. Smaller hidden layer dimensions (64, 128) limit the model’s representation capacity, with MAE at 0.189 and 0.175 respectively, making it difficult to fully encode the complex mapping relationships between student behavioral features and PERMA psychological constructs.

As hidden layer dimensions increase to 512 and 1024, although the model’s theoretical representation capability is enhanced, MAE actually rises to 0.167 and 0.174, with PERMA indicators also showing declining trends. This phenomenon reflects the classic overfitting problem in deep learning: excessively large parameter spaces make models prone to learning noise patterns in training data rather than generalizable psychological patterns. More importantly, overly large hidden layer dimensions may disrupt the simplicity of PERMA theoretical structure, as psychological theories inherently have relatively stable dimensional divisions, and overly complex representations may deviate from theoretical essence.

The optimal configuration of dimension 256 precisely balances representation capability with model complexity, enabling sufficient learning of fine-grained features across PERMA five dimensions while avoiding overfitting risks, embodying the elegance of theory-driven model design.

4.5.4. Multi-dimensional modeling effects of attention head numbers.

The choice of attention head numbers directly affects the multi-dimensional information processing capability of PERMA-aligned Transformers. Experimental results indicate that the model achieves optimal performance with 5 attention heads, with MAE at 0.163, corresponding exactly to the five-dimensional structure of PERMA theory. With a single attention head, MAE reaches 0.198 and the PERMA indicators are markedly lower, indicating that single-head attention cannot effectively capture the complex interaction patterns among the five dimensions of positive emotions, engagement, relationships, meaning, and achievement.

When attention head number increases to 3, model performance significantly improves (MAE = 0.178) but still fails to reach optimal levels, suggesting that insufficient attention heads cannot fully model the independence and inter-relationships of PERMA five dimensions. The optimal 5-head configuration achieves perfect integration of theory and technology: each attention head specifically handles information processing for one PERMA dimension, ensuring balanced attention and deep understanding of each dimension.

When attention head numbers further increase to 8 and 12, MAE rises to 0.169 and 0.181, showing performance decline. This phenomenon indicates that excessive attention heads may cause information redundancy and attention dispersion, disrupting the inherent logic of PERMA theoretical structure. Particularly, configurations exceeding 5 heads may introduce attention patterns unrelated to PERMA theory, deviating from the original intention of theory-driven design.

4.5.5. Hyperparameter interaction effects and optimal configuration determination.

Through comprehensive analysis of the independent effects and interactions of the four key hyperparameters, this study determines the optimal configuration for the PERMA-GNN-Transformer model: learning rate 1 × 10−4, batch size 32, hidden layer dimension 256, and 5 attention heads. This configuration achieves optimal performance across all evaluation indicators, with MAE dropping to 0.163, a 32.9% improvement over the baseline configuration, and the three PERMA indicators PCE, PDA, and PCI reaching 0.792, 0.841, and 0.798 respectively, improvements of 27.3%, 29.8%, and 30.4% over the baseline configuration.

Hyperparameter sensitivity analysis reveals the inherent logic of the model design: the optimal configuration coordinates the deep learning components at the technical level and embodies the structured guidance of PERMA theory at the theoretical level. A moderate learning rate ensures stable convergence of theory-driven feature learning; the batch size balances individual-difference modeling against group-pattern learning; the hidden dimension strikes a balance between representational capacity and theoretical parsimony; and the one-to-one correspondence between attention heads and PERMA dimensions embodies the core advantage of theory-driven architectural design.

4.6. Case study analysis

4.6.1. In-depth analysis cases of typical individual students.

To validate the individual-level prediction effectiveness of the PERMA-GNN-Transformer model, this study selects three representative student cases for in-depth analysis. As shown in Fig 4, Student A (high well-being) exhibits balanced development across the five PERMA dimensions, with the highest score in positive emotions (0.85), strong performance in relationships (0.82) and achievement (0.80), and a relatively lower score only in meaning (0.75). Student B (moderate well-being) shows clear developmental imbalance, performing best in engagement (0.60) but with a marked shortfall in relationships (0.45), the key factor limiting overall well-being. Student C (low well-being) scores low across all dimensions, with the lowest score in meaning (0.28) and positive emotions at only 0.25, reflecting a mental health state requiring focused attention.

Fig 4. In-depth analysis cases of typical individual students.

https://doi.org/10.1371/journal.pone.0338693.g004

The case analysis validates the theoretical consistency and prediction accuracy of the model. The PERMA dimension predictions of the three students correspond closely to their overall well-being levels, conforming to the basic assumption in positive psychology of collaborative effects among dimensions. The radar charts clearly display each student's distinct psychological profile: Student A's near-circular balanced distribution, the pronounced dip in Student B's relationships dimension, and Student C's overall contracted form. These differentiated predictions reflect the model's effective modeling of the complexity of individual psychological constructs, providing a precise quantitative basis for theory-driven personalized educational interventions.

Based on the PERMA dimension analysis, this study proposes differentiated intervention strategies: Student A should focus on strengthening the meaning dimension while serving as a peer mentor, Student B should prioritize remedying the relationships shortfall while maintaining the engagement advantage, and Student C requires comprehensive intervention emphasizing positive-emotion cultivation and meaning reconstruction. These cases show that the PERMA-GNN-Transformer model not only predicts student well-being levels accurately but also provides educators with personalized, psychologically grounded intervention plans, demonstrating the value of theory-driven artificial intelligence models in educational psychology applications.

4.6.2. Multi-topology graph neural network visualization cases.

To validate the effectiveness of the multi-topology graph neural network architecture in the PERMA-GNN-Transformer model, this study uses visualization analysis to demonstrate the differentiated modeling capabilities of the four graph topologies and the adaptive behavior of the graph-level attention fusion mechanism. Fig 5 shows the differing connection patterns of the cosine similarity, Euclidean distance, learning style, and PERMA-weighted graphs on the same student population, while Fig 6 analyzes how different types of students allocate attention weight across the graph topologies.

Fig 5 demonstrates the complementary strengths of the four graph topologies in capturing student relationships. The cosine similarity graph (104 connections) presents a high-density pattern, establishing connections based on the angular relationships of academic and well-being features and effectively identifying similarity patterns in feature space. The Euclidean distance graph (57 connections) is relatively sparse, establishing relationships through overall feature-difference measurements and focusing more on global similarity among students. The learning style graph (30 connections) is the sparsest, capturing group clustering based on learning preferences and reflecting the natural grouping property of learning styles as discrete features. The PERMA-weighted graph (86 connections) displays a medium-density network, achieving theory-driven modeling of students' psychological relationships by integrating dimension weights from psychological theory. The well-being distribution encoded by node colors confirms that the topologies model population heterogeneity differently, with high well-being students (dark blue nodes) and low well-being students (red nodes) exhibiting distinct connection patterns and neighborhood structures across the graphs.
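
The topologies above can be sketched as thresholded adjacency matrices over a student feature matrix. The NumPy sketch below illustrates the cosine, Euclidean, and PERMA-weighted constructions on toy data; the thresholds, feature count, and per-feature PERMA weights are illustrative assumptions, and the learning-style graph (built from discrete preference categories) is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 10))          # toy data: 30 students, 10 features

def cosine_graph(X, threshold=0.3):
    """Connect students whose feature vectors exceed a cosine-similarity threshold."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    A = (Xn @ Xn.T > threshold).astype(float)
    np.fill_diagonal(A, 0.0)               # no self-loops
    return A

def euclidean_graph(X, threshold=4.0):
    """Connect students whose Euclidean distance falls below a threshold."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    A = (D < threshold).astype(float)
    np.fill_diagonal(A, 0.0)
    return A

def perma_weighted_graph(X, dim_weights, threshold=0.3):
    """Cosine graph on PERMA-reweighted features. The feature weights are an
    assumption here; in the paper they come from the theory-driven embedding."""
    return cosine_graph(X * dim_weights, threshold)

A_cos = cosine_graph(X)
A_euc = euclidean_graph(X)
A_perma = perma_weighted_graph(X, rng.uniform(0.5, 1.5, size=10))

def edge_count(A):
    return int(A.sum() // 2)               # undirected edge count
```

Varying the thresholds reproduces the density differences reported in Fig 5: looser thresholds give denser graphs, tighter ones sparser graphs.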

The attention weight analysis in Fig 6 reveals how the model dynamically adjusts the importance of each graph topology according to individual student characteristics. Academic-oriented students show the highest attention weight (0.40) for the cosine similarity graph, indicating that such students rely more on academic-feature similarity for well-being prediction; they also maintain a high weight (0.30) for the Euclidean distance graph, reflecting the strong association between academic performance and overall feature similarity. Social-oriented students allocate the highest weight (0.40) to the PERMA-weighted graph, confirming the central role of psychology-theory-driven relationship modeling for this group. Stress-type students' weights are comparatively balanced, with the highest weight (0.35) on the Euclidean distance graph, suggesting that students under stress depend more on support networks grounded in overall similarity. Balanced-type students allocate equal weight (0.25 each) across all four graphs, reflecting an even dependence on multiple relationship patterns. The heatmap further confirms the rationality and interpretability of the weight allocation, with the differences among student types demonstrating the precision of personalized relationship modeling.

The visualization analysis validates the theoretical rationale and technical advantages of this architecture. The range of connection densities across the four topologies (30 to 104) reflects the complementary roles of different similarity measures in student relationship modeling, avoiding the limitations of any single topology. The graph-level attention mechanism personalizes graph-information fusion by adapting to learning styles and stress levels, ensuring that each student receives the relationship-modeling weights best suited to their psychological characteristics. In particular, the high weight that social-oriented students place on the PERMA-weighted graph directly validates the necessity of positive psychology theory in student relationship modeling. This case demonstrates the technical innovation of the multi-topology design and, more importantly, shows that theory-driven graph structures can provide precise, personalized relationship modeling for different types of students, supporting both the accuracy and the interpretability of well-being prediction.
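
The graph-level fusion described above can be sketched as a per-student softmax over four topology scores. In this minimal NumPy illustration the scoring network is abstracted away (raw scores are passed in directly, an assumption); feeding equal scores reproduces the balanced-type case of 0.25 weight per graph.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_graph_embeddings(embeddings, scores):
    """Fuse per-topology node embeddings with per-student attention weights.

    embeddings: list of four (n_students, d) arrays, one per graph topology.
    scores: (n_students, 4) raw scores; in the model these would come from a
    small scoring network over learning-style and stress features (assumed).
    """
    weights = np.apply_along_axis(softmax, 1, scores)   # rows sum to 1
    stacked = np.stack(embeddings, axis=1)              # (n, 4, d)
    fused = (weights[:, :, None] * stacked).sum(axis=1)
    return fused, weights

rng = np.random.default_rng(2)
n, d = 5, 8
embs = [rng.standard_normal((n, d)) for _ in range(4)]
# Equal scores reproduce the balanced-type case: 0.25 weight per topology.
fused, w = fuse_graph_embeddings(embs, np.zeros((n, 4)))
```

With unequal scores, the softmax produces profiles like the academic-oriented (0.40/0.30/…) pattern reported in Fig 6.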

4.6.3. PERMA-aligned attention mechanism interpretability cases.

To validate the theoretical alignment and interpretability of the five-head attention mechanism in the PERMA-GNN-Transformer model, this study analyzes, through attention weight visualization and consistency verification, how the model integrates psychological theory with deep learning technology. Fig 7 shows the weight allocation patterns of the five attention heads across 25 features, while Fig 8 validates the internal consistency and prediction reliability of the PERMA theory-driven design.

Fig 8. PERMA prediction consistency and theoretical validation.

https://doi.org/10.1371/journal.pone.0338693.g008

The attention weight heatmap in Fig 7 demonstrates the one-to-one correspondence between the five attention heads and the PERMA dimensions. Each head specializes in one dimension: Head 1 (P) shows very high attention weights (0.77–0.89) for positive-emotion features (such as emotion scores, happiness levels, and optimism levels), while its weights for other dimensions' features are markedly lower (0.10–0.30). Head 2 (E) focuses on engagement features, allocating most of its attention (0.62–0.75) to learning time, classroom participation, and learning motivation. Head 3 (R), Head 4 (M), and Head 5 (A) show similarly specialized patterns for relationships, meaning, and achievement. The five dimensional regions clearly separated by red boundary lines confirm the structured design of the attention mechanism, with each head's deep blue region precisely covering its assigned PERMA dimension, validating the effectiveness and interpretability of the theory-driven architecture.

The comprehensive analysis in Fig 8 confirms the consistency between the model's predictions and PERMA theoretical assumptions. The upper-left scatter plot shows that, for high-consistency models, average PERMA predictions exhibit a nearly perfect linear relationship with overall well-being predictions, with points clustered tightly around the ideal diagonal, demonstrating that the model keeps overall well-being and the five dimensional predictions theoretically consistent. In contrast, points from medium- and low-consistency models deviate substantially from the ideal line, showing that models lacking theoretical guidance tend to produce internally contradictory predictions. The upper-right box plot quantifies these differences: high-consistency models have an error median near 0.05, while low-consistency models exceed 0.15 with markedly wider error ranges. The lower-left PERMA correlation matrix reveals the inherent logic of the psychological theory, with moderate positive correlations (0.20–0.22) among dimensions, reflecting both the collaborative effects posited by PERMA theory and the relative independence of each dimension, in line with positive psychology expectations.

The attention-PERMA alignment matrix in the lower-right panel of Fig 8 provides quantitative evidence of theoretical consistency. High diagonal values (0.85–0.94) confirm strong alignment between each attention head and its corresponding PERMA dimension, while low off-diagonal values (0.12–0.20) demonstrate the specialization and selectivity of the attention mechanism. In particular, Head 4's alignment with the meaning dimension reaches 0.94, and Head 3's alignment with the relationships dimension is 0.88, reflecting the model's precision in modeling complex psychological constructs. This alignment pattern not only validates the rationale of the five-head design but also guarantees interpretability: when the model assigns a student a high meaning-dimension score, the result can be traced to Head 4's high-weight attention over the relevant features, giving educators a clear decision-making basis. This case shows that the PERMA-aligned attention mechanism integrates psychological theory with deep learning while providing reliable interpretability, ensuring the psychological meaningfulness and practical value of the well-being predictions.
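
The alignment matrix itself can be computed by averaging each head's attention weight over the feature block belonging to each PERMA dimension. The sketch below uses a toy 5 × 25 weight matrix with an assumed layout of 5 features per dimension; the real feature-to-dimension mapping is the paper's and is not reproduced here.

```python
import numpy as np

def alignment_matrix(head_feature_weights, dim_slices):
    """Average each head's attention weight over each dimension's feature block.

    head_feature_weights: (5, n_features) mean attention per head and feature.
    dim_slices: five slices mapping feature columns to P, E, R, M, A.
    """
    M = np.zeros((5, 5))
    for h in range(5):
        for d, sl in enumerate(dim_slices):
            M[h, d] = head_feature_weights[h, sl].mean()
    return M

# Toy layout: 25 features, 5 per dimension, strong weights on the "own" block.
slices = [slice(5 * d, 5 * (d + 1)) for d in range(5)]
W = np.full((5, 25), 0.15)
for h in range(5):
    W[h, slices[h]] = 0.90
M = alignment_matrix(W, slices)

# Diagonal dominance is the property the paper reports (0.85-0.94 vs 0.12-0.20).
off = M.copy()
np.fill_diagonal(off, -np.inf)
diag_dominant = bool((M.diagonal() > off.max(axis=1)).all())
```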

5. Discussion

5.1. Model performance and innovation analysis

The PERMA-GNN-Transformer model demonstrates significant performance advantages in student well-being prediction tasks, which stems from the synergistic effects of three key innovations. First, the theory-driven feature representation learning mechanism achieves effective transformation from raw educational data to psychologically meaningful representations. Although traditional deep learning methods possess powerful nonlinear fitting capabilities, they lack explicit modeling of psychological theoretical structures, resulting in learned feature representations lacking psychological interpretability. This study, through PERMA theory-guided feature embedding, integrates the five psychological dimensions of positive emotions, engagement, relationships, meaning, and achievement into the model architecture, enabling the feature learning process to possess clear psychological significance and significantly enhancing the model’s understanding capability of complex constructs in student well-being.

The innovation of the multi-topology graph neural network architecture lies in its comprehensive modeling of the complexity of student relationships. Previous research mostly adopted a single similarity measure to construct student networks, ignoring the multidimensional and heterogeneous character of inter-student relationships. This study constructs four graph topologies (cosine similarity, Euclidean distance, learning styles, and PERMA weighting) that capture associations among students from different perspectives, and achieves personalized relationship weighting through a graph-level attention fusion mechanism. Experimental results show that the multi-topology design yields a 7.9% performance improvement over single-graph structures, validating the effectiveness of multidimensional relationship modeling. In particular, the PERMA-weighted graph receives 40% of the attention weight among social-oriented students, underscoring the value of psychology-theory-driven relationship modeling for specific student populations.

The PERMA-aligned five-head attention mechanism tightly integrates psychological theory with deep learning technology. Traditional multi-head attention can capture different aspects of an input, but the heads lack clear semantic correspondences, making it difficult to guarantee theoretically consistent predictions. The five-head mechanism designed in this study ensures specialized processing of the positive emotions, engagement, relationships, meaning, and achievement dimensions through direct correspondence with the five PERMA dimensions. Attention weight visualization shows that each head achieves an alignment of 0.85–0.94 with its corresponding dimension. This strong alignment not only improves prediction accuracy but, more importantly, provides a clear psychological interpretation framework for the model's outputs, enhancing educators' trust in and ability to act on them.

5.2. Theoretical contributions and practical value

The core theoretical contribution of this study lies in constructing a psychology theory-driven artificial intelligence model design paradigm, providing new methodological frameworks for interdisciplinary research. Traditional educational data mining research mostly adopts purely data-driven methods, which may achieve certain results in prediction accuracy but often lack theoretical foundation and interpretability, making it difficult to provide meaningful guidance for educational practice. This study achieves deep integration of theoretical knowledge with data-driven methods by embedding structured knowledge from PERMA positive psychology theory into deep learning models. This integration not only improves model prediction performance but more importantly ensures consistency between model outputs and psychological theory, providing scientific theoretical support for AI-based educational decision-making.

Cross-cultural adaptability is another important theoretical contribution of this study. Existing student well-being prediction models are mostly developed based on single cultural backgrounds and often face performance degradation challenges in cross-cultural applications. This study validates the cross-cultural universality of PERMA theory and the model’s cultural adaptation capability through experimental validation on Western cultural background Lifestyle and Wellbeing Data and East Asian cultural background International Student Mental Health Dataset. Particularly noteworthy is that the model shows stronger performance advantages on small-scale datasets (relative improvement 52.6% vs 42.8%), indicating that theory-driven methods have unique application value in data-scarce cultural environments, providing effective technical solutions for student mental health monitoring in globalized educational backgrounds.

From a practical standpoint, this study provides educators with systematic student well-being assessment and intervention tools. The PERMA five-dimensional decomposition enables educators to identify precisely each student's standing in positive emotions, engagement, relationships, meaning, and achievement, providing a scientific basis for personalized interventions. Case analysis shows that the PERMA profiles of different student types differ markedly: high well-being students show balanced development, moderate well-being students present specific dimensional shortfalls, and low well-being students require comprehensive intervention. These fine-grained assessments can help educators develop targeted support strategies, improving the precision and effectiveness of interventions.

The model’s real-time prediction capability and high interpretability further enhance its practical application value. Through student relationship information captured by multi-topology graph neural networks and dimensional weights provided by PERMA-aligned attention mechanisms, educators can not only obtain accurate well-being prediction results but also understand the psychological basis and key influencing factors of predictions. This transparency is crucial for building educators’ trust in artificial intelligence systems and lays the foundation for widespread AI applications in education. Additionally, the model’s cross-cultural stability enables it to be deployed in international educational environments, providing unified technical standards for global educational institutions’ student mental health management.

5.3. Limitations and future research directions

Although the PERMA-GNN-Transformer model achieves significant results in student well-being prediction, several limitations warrant acknowledgment and future improvement. Data-level limitations are manifested in constraints of sample size and feature dimensions. While this study validates the model on two datasets from different cultural backgrounds, the data volume remains insufficient relative to the parameter scale of deep learning models, particularly the small-scale dataset (n = 268) which may affect model generalization capability. Additionally, existing features rely primarily on survey questionnaires and self-reported data, lacking integration of objective indicators such as behavioral logs from learning management systems, physiological data from wearable devices, or emotional expressions from social media platforms. Methodologically, although PERMA theory provides psychological interpretation frameworks, the complex interactions of multi-topology graph neural networks and multi-head attention mechanisms retain some “black box” characteristics, particularly the dynamic weight allocation mechanism of graph-level attention fusion which lacks sufficient theoretical explanation. The model’s balance between complexity and interpretability requires further optimization to enhance trust and operability for educational practitioners.

A critical limitation of the current research is the static prediction paradigm that captures student well-being at single time points without modeling temporal dynamics of psychological states. Student mental health naturally evolves over academic semesters, influenced by exam cycles, social network changes, course workloads, and life events. The PERMA framework itself recognizes that well-being dimensions fluctuate over time—positive emotions may vary daily, engagement shifts with academic tasks, relationships develop gradually, while meaning and achievement typically change more slowly over months. However, our architecture lacks explicit modeling of these temporal patterns and dimension-specific temporal characteristics. This limitation restricts the model’s application value in early warning and longitudinal intervention scenarios. Educators need not only to assess students’ current well-being levels but also to predict future trajectories for implementing proactive support measures. The inability to forecast declining trends or identify critical transition periods before crises occur represents a significant gap between current capabilities and practical educational needs. Furthermore, static predictions cannot capture the cumulative effects of sustained stress, the recovery patterns following interventions, or the seasonal variations in student mental health that educational institutions commonly observe.

Addressing the temporal modeling limitation represents the most promising direction for future research. We propose five specific technical extensions to incorporate time-aware mechanisms into the PERMA-GNN-Transformer framework. First, Temporal Graph Neural Networks should be developed to capture the evolution of student relationship networks, modeling how social connections form, strengthen, weaken, or dissolve across semesters through temporal graph convolution operations or dynamic graph attention mechanisms. Second, PERMA dimension-specific temporal attention mechanisms should be designed to account for different temporal characteristics, with each of the five attention heads learning appropriate time windows—shorter horizons for positive emotions that fluctuate rapidly, longer horizons for meaning and achievement that change gradually. Third, longitudinal trajectory prediction capabilities should be integrated by combining recurrent architectures (LSTM, GRU) or temporal Transformers to forecast future well-being trends based on historical patterns, enabling early warning systems to identify high-risk students weeks before crises materialize. Fourth, adaptive temporal graph structure learning should allow relationship network topology to adjust automatically according to semester phases, academic calendars, and campus events, dynamically reallocating importance across the four graph topologies based on temporal context. Fifth, PERMA temporal smoothness constraints grounded in psychological theory should be incorporated as regularization terms in the loss function, ensuring predicted time series exhibit realistic rates of change—constraining rapid fluctuations in typically stable dimensions while allowing appropriate variability in emotion-related measures. These temporal extensions would transform the system from diagnostic assessment to predictive intervention support.
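
The fifth proposal, a PERMA temporal smoothness constraint, can be sketched as a weighted penalty on the first differences of the predicted score series. The per-dimension weights below are illustrative assumptions encoding the stated intuition (emotions fluctuate quickly, so they are penalized lightly; meaning and achievement change slowly, so they are penalized heavily); they are not values from the paper.

```python
import numpy as np

# Dimension-specific smoothness weights for P, E, R, M, A: a larger weight
# penalizes change more, encoding "this dimension should evolve slowly".
# Values are illustrative assumptions, not learned or reported parameters.
SMOOTHNESS = np.array([0.1, 0.3, 0.5, 1.0, 1.0])

def temporal_smoothness_loss(pred_series, weights=SMOOTHNESS):
    """Weighted mean squared first difference of predicted PERMA scores.

    pred_series: (T, 5) predicted PERMA scores over T time steps.
    """
    diffs = np.diff(pred_series, axis=0)        # (T-1, 5) step-to-step change
    return float((weights * diffs ** 2).mean())

# A jump in the meaning dimension is penalized more heavily than the same
# jump in positive emotions, matching the dimension-specific intuition.
T = 6
base = np.full((T, 5), 0.5)
jump_p, jump_m = base.copy(), base.copy()
jump_p[3:, 0] += 0.3                            # jump in P (weight 0.1)
jump_m[3:, 3] += 0.3                            # same jump in M (weight 1.0)
```

Added to the training objective as a regularization term, this penalty would constrain rapid fluctuations in typically stable dimensions while allowing appropriate variability in emotion-related measures, as the proposal describes.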

Beyond temporal modeling, future research should pursue several complementary directions. Multimodal data integration represents a key opportunity, combining self-reported surveys with objective behavioral patterns, physiological signals, and natural language expressions to construct holistic student state representations that reduce reliance on subjective assessments. Lightweight model architectures warrant exploration to reduce computational complexity while maintaining prediction accuracy, facilitating deployment on resource-constrained educational platforms and enabling real-time monitoring at scale. Causal inference mechanisms should be incorporated to identify not merely correlations but causal pathways linking specific factors to well-being outcomes, providing actionable insights for intervention design—for instance, determining whether increased social activities causally improve relationships dimension or whether high achievement causally enhances meaning perception. Cross-cultural research requires deepening through validation on datasets spanning diverse geographical regions, educational systems, and cultural contexts beyond the current Western-East Asian comparison, investigating how cultural factors moderate the relationships among PERMA dimensions and identifying culture-specific intervention strategies. Application scope should expand to broader educational psychology scenarios including learning motivation prediction, academic burnout identification, teacher-student relationship assessment, and career development counseling. The model’s effectiveness across different educational stages (primary, secondary, tertiary) and in specialized contexts (special education, distance learning, vocational training) merits systematic investigation. 
Finally, integrating the model into comprehensive student support ecosystems—linking predictions with counseling services, academic advising, peer support programs, and family communication channels—would maximize practical impact, transforming technical capabilities into tangible improvements in student welfare and educational outcomes.

6. Conclusion

This study addresses the lack of psychological theory guidance and relationship-modeling capability in existing student well-being prediction methods by constructing the PERMA-GNN-Transformer model and validating its effectiveness. Experimental results show that, compared to the strongest baseline (King et al., 2024), the proposed model improves MAE from 0.201 to 0.163 on the large-scale dataset (an 18.9% improvement) and from 0.205 to 0.148 on the small-scale dataset (a 27.8% improvement). The PCE comprehensive indicator reaches 0.792 and 0.798, respectively, PDA dimensional prediction accuracy reaches 0.841 and 0.823, and all results pass statistical significance tests at p < 0.01. Ablation experiments confirm that PERMA feature embedding contributes a 23.0% performance improvement, the multi-topology graph neural network 13.9%, and the attention mechanism 14.0% on the small dataset, validating the effectiveness and necessity of each component.

From theoretical perspectives, this study achieves the first structured integration of positive psychology PERMA theory with deep learning architecture, constructing a theory-driven AI model design paradigm. The alignment degree between five-head attention mechanisms and PERMA dimensions reaches 0.85–0.94, ensuring psychological consistency of model predictions, with PCI consistency indicators reaching 0.798. Multi-topology graph neural networks effectively capture complex social dynamics of student groups through four relationship modeling strategies, with personalized allocation of graph-level attention weights validating the technical implementation path of individualized teaching. Cross-cultural validation results prove the universality of PERMA theory, with models showing stronger generalization advantages in data-scarce environments, providing theoretical and technical foundations for intelligent student services in globalized educational backgrounds.

Comprehensive evaluation shows that this study achieves breakthrough progress in three dimensions: technological innovation, theoretical contribution, and practical application, but still has limitations such as single data sources and insufficient dynamic modeling. Future research should focus on multimodal data fusion, temporal dynamic modeling, and lightweight deployment directions to further improve model practicality and applicability. The interdisciplinary fusion paradigm established by this study provides important methodological references for the field of educational artificial intelligence, and the proposed PERMA comprehensive evaluation framework establishes new evaluation standards for theory-driven educational AI models. Research achievements will provide important support for constructing more precise, interpretable, and humanized intelligent educational systems.

Supporting information

S1 Appendix. Complete hyperparameter configuration.

https://doi.org/10.1371/journal.pone.0338693.s001

(DOCX)

References

  1. 1. Junaidi S. Artificial intelligence in EFL context: rising students’ speaking performance with Lyra virtual assistance. Int J Adv Sci Technol Rehabil. 2020;29:6735–41.
  2. 2. Wilson LM, et al. Mental health and wellbeing of international students in Australia: a systematic review. J Ment Health. 2024:1–14.
  3. 3. Ahmed S. Student performance prediction using machine learning algorithms. Appl Comput Intell Soft Comput. 2024;2024.
  4. 4. Angeioplastis A, Aliprantis J, Konstantakis M, Tsimpiris A. Predicting student performance and enhancing learning outcomes: a data-driven approach using educational data mining techniques. Computers. 2025;14(3):83.
  5. 5. King RB, Wang Y, Fu L, Leung SO. Identifying the top predictors of student well-being across cultures using machine learning and conventional statistics. Sci Rep. 2024;14(1):8376. pmid:38600124
  6. 6. Rahman HA, et al. Machine learning-based prediction of mental well-being using health behavior data from university students. Healthcare. 2023;10(5):575.
  7. 7. Chen X, Xie H, Zou D, Hwang G-J. Application and theory gaps during the rise of artificial intelligence in education. Comput Educ: Artif Intell. 2020;1:100002.
  8. 8. Chen L, Chen P, Lin Z. Artificial intelligence in education: a review. IEEE Access. 2020;8:75264–78.
  9. 9. Sun J, et al. Prerequisite-enhanced category-aware graph neural networks for course recommendation. ACM Trans Knowl Discov Data. 2024.
  10. 10. Li M, Wang X, Wang Y, Chen Y, Chen Y. Study-GNN: a novel pipeline for student performance prediction based on multi-topology graph neural networks. Sustainability. 2022;14(13):7965.
  11. 11. Bethel A, et al. Cross-cultural transition and psychological adaptation of international students: the mediating role of host national connectedness. Front Educ. 2020;5.
  12. 12. Yağcı M. Educational data mining: prediction of students’ academic performance using machine learning algorithms. Smart Learn Environ. 2022;9.
  13. 13. Oparina E, Kaiser C, Gentile N, Tkatchenko A, Clark AE, De Neve J-E, et al. Machine learning in the prediction of human wellbeing. Sci Rep. 2025;15(1):1632. pmid:39794488
  14. 14. Li X. Research on intelligent analysis algorithm for student learning behavior based on multi-source sensor data fusion. Int J Hi Spe Ele Syst. 2025.
  15. 15. Mohd Amin MR, Ismail I, Sivakumaran VM. Revolutionizing education with artificial intelligence (AI)? Challenges, and implications for open and distance learning (ODL). Soc Sc Human Open. 2025;11:101308.
  16. Wilson LM, et al. Mental health and wellbeing of international students in Australia: a systematic review. J Ment Health. 2024;1–14.
  17. Govorova E, Benítez I, Muñiz J. How schools affect student well-being: a cross-cultural approach in 35 OECD countries. Front Psychol. 2020;11:431. pmid:32269538
  18. Wu L, et al. Graph neural networks: foundation, frontiers and applications. In: Proc. 28th ACM SIGKDD Conf. Knowl. Discov. Data Mining. 2022. pp. 4840–1.
  19. Kumar VS, et al. Natural language processing using graph neural network for text classification. In: 2022 Int. Conf. Knowl. Eng. Commun. Syst. 2022. pp. 1–5.
  20. Karimi H, et al. Online academic course performance prediction using relational graph convolutional neural network. In: Proc. 13th Int. Conf. Educ. Data Mining. 2020. pp. 444–50.
  21. Zhou Z, et al. A review of graph neural networks: concepts, architectures, techniques, challenges, datasets, applications, and future directions. J Big Data. 2024;11.
  22. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans Neural Netw. 2009;20(1):61–80. pmid:19068426
  23. Seligman MEP. Flourish: A Visionary New Understanding of Happiness and Well-being. New York: Free Press; 2011.
  24. Ahmad S, et al. Implementation of PERMA model into teaching and learning of generation Z. Int J Learn Teach Educ Res. 2023;22(9):423–41.
  25. Zhang S, Fan R, Liu Y, Chen S, Liu Q, Zeng W. Applications of transformer-based language models in bioinformatics: a survey. Bioinform Adv. 2023;3(1):vbad001. pmid:36845200
  26. Woodward K, Kanjo E, Brown DJ, McGinnity TM, Inkster B, Macintyre DJ, et al. Beyond mobile apps: a survey of technologies for mental well-being. IEEE Trans Affect Comput. 2022;13(3):1216–35.
  27. Dale H, Brassington L, King K. The impact of healthy lifestyle interventions on mental health and wellbeing: a systematic review. Mental Health Rev J. 2014;19(1):1–26.
  28. King RB, Wang Y, Fu L, Leung SO. Identifying the top predictors of student well-being across cultures using machine learning and conventional statistics. Sci Rep. 2024;14(1):8376. pmid:38600124
  29. Klimova B, Pikhart M. Exploring the effects of artificial intelligence on student and academic well-being in higher education: a mini-review. Front Psychol. 2025;16:1498132. pmid:39963679