Federated TriNet-AQ: Explainable english proficiency classification in augmented and virtual reality learning

Chunxiao Zhang; Zhiyan Liu

doi:10.1371/journal.pone.0329304

Abstract

AR/VR and other immersive technologies are creating dynamic, learner-centred, and engaging language-learning environments. In these ever-changing situations, judging someone’s language abilities is difficult. Managing multimodal learner inputs, understanding model predictions, and protecting user data across distributed systems are some of the most prominent challenges. This paper proposes TriNet-AQ, a federated, interpretable deep learning architecture for classifying English competency in AR/VR platforms. This technique addresses the difficulties raised. This work employs Quantum Sinusoidal Encoding (QSE), Triaxial Attention Fusion (TAF) for multimodal feature alignment, and Quantum Modulated Integration (QMI) to enhance context-aware learning by optimizing temporal representation. Hybrid Slime Gorilla Optimisation (HSGO) aids optimization. It accelerates convergence and improves performance and economy. TriNet-AQ provides decentralized training to many clients via federated learning, enhancing privacy and flexibility. TriNet-AQ outperforms classical, fuzzy, and hybrid baselines in real-world augmented and virtual reality instructional datasets. Its accuracy is 98.5%, AUC is 0.95, and EPES is 0.89. Even when it loses 3.5% accuracy on new data, it can generalize effectively. Another SHAP-based interpretability finding is the presence of obvious feature attributions and consistent relevance across users. Statistical analysis, including Cohen’s d = 0.89 (p < 0.001), confirms the model’s significance and reliability. TriNet-AQ provides robust, easy-to-understand, and private real-time, tailored language evaluation in next-generation immersive learning environments.

Citation: Zhang C, Liu Z (2026) Federated TriNet-AQ: Explainable english proficiency classification in augmented and virtual reality learning. PLoS One 21(1): e0329304. https://doi.org/10.1371/journal.pone.0329304

Editor: Aamna AlShehhi, Khalifa University, UNITED ARAB EMIRATES

Received: July 5, 2025; Accepted: December 16, 2025; Published: January 20, 2026

Copyright: © 2026 Zhang, Liu. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The third party data used in this study is publicly available at Zenodo at DOI https://doi.org/10.5281/zenodo.17480302.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Immersive technologies such as virtual reality (VR) and augmented reality (AR) are rapidly revolutionising online education, particularly language learning [1,2]. Augmented and virtual reality (AR/VR) technology enables students to engage with interactive, multimodal environments ideal for language acquisition. Compared to films and speeches that fail to strike an emotional chord, this is a vast improvement. Using tools such as spatial engagement, real-time feedback, and culture simulation, students participate in genuine communication experiences grounded in real-life situations [3,4]. People in immersive settings communicate with one another in a variety of methods, including writing and speech. In class, students communicate with one another via verbal responses, nonverbal cues, and facial expressions. In addition to demonstrating the students’ engagement and comprehension, using several modes of communication raises additional, well-documented assessment challenges. Conventional systems that use central databases and predetermined judgments often fall short. It tends to overlook minor patterns and may be difficult for specific individuals to understand complex material [5]. The privacy issue further adds complexity. Because AR and VR devices may capture biometric and contextual data, questions about ownership, consent, and ethical use have surfaced [6]. These concerns pertain to the use of these technologies. It becomes more challenging for students to maintain flexibility when this data is sent to central systems for processing. Furthermore, the system is more vulnerable to hacking. This highlights the significance of decentralised systems that can monitor children’s whereabouts without prying into their personal lives or requiring vital information from every device. According to [7,8], federated Learning (FL) may address the identified problems. FL allows devices and organizations train models independently and deliver encrypted changes to a common model. Unlike the conventional method of storing all data in one location. This keeps data private while allowing global collaboration. Schools assists create adaptable evaluation methods without disclosing student data. FL is difficult to integrate into immersive situations. Clear and usable learning outcomes need models that manage various data, operate with diverse learners, and are lightweight for daily devices [9,10].

There has been a recent shift toward using vocabulary lists and grammar tests as measures of language competency in AR and VR settings. Indicators of interest and comprehension in behavior are ignored. The issues with these methods make it difficult to provide students comprehensive and timely feedback [11]. An effective learning system monitors student progress and makes adjustments based on their results, such as changing the degree of difficulty of assignments or providing different types of feedback. To make the most of this adaptability, models should not limit themselves to only checking for correctness. It is able to discern trends across time, integrate data from diverse sources, and explain concepts to students and instructors. These environments need explainability for openness, trust, and educational alignment. The industry is now undergoing a shift toward decentralized, ethical, and environmentally focused learning methods to achieve these goals. These next-gen technologies prioritise transparency, adaptability, and student agency to pave the way for more fair and personalised education. As immersive technologies become increasingly common in education, assessment tools should be as responsive and human-centered as their settings. Summary of this work’s technical and domain-specific contributions to overcome these challenges:

A new unified framework, TriNet-AQ, is presented to tackle the difficulty of learning behavior modeling in immersive worlds. This design has the potential to enhance classification precision in AR/VR educational settings by leveraging quantum-modulated integration, triaxial attention fusion, and quantum sinusoidal encoding. Indicating the time, combining data from several sources, and enhancing learning in context are all possible with this approach. Following are the contributions of this work:

A FL system was developed to address the growing disparity in student devices and the proliferation of privacy concerns. Since each FL client is responsible for training the model independently under this decentralised method, raw data is not required. This ensures safe collaboration while also making a wide variety of user demands feasible.
Applying a unique hybrid approach that integrates slime mould and gorilla troop behaviour improves the optimisation process in federated systems. For dispersed, resource-constrained client systems, the metaheuristic optimiser expedites convergence and ensures consistent performance.
To help with educational understanding, the system includes an SHAP-based analytical layer. By identifying the most influential elements for each finding, this module provides up-to-date, comprehensive data on how students are being assessed. This facilitates a more transparent decision-making process and clarifies complex model behaviours.
Setting up a transparent and safe method for evaluating an individual’s real-time English competency is a crucial component of domain-specific VR/AR language instruction. The proposed method accomplishes the same objective. The ability to cater to specific needs is another advantage of immersive learning environments.

At its outset, the research examines the development of immersive language-learning technologies, namely the growing impact of VR and AR in modern classrooms. This work explains the main modelling challenges that arise in dynamic, multimodal settings. The article then describes the proposed TriNet-AQ design, including all its components and the rationale for its creation, based on these results. Afterwards, compare the framework’s performance with other baseline models and assess its accuracy and interpretability using SHAP-based explanations. Finally, the study offers practical recommendations for improving the framework’s implementation in real classrooms, based on the trial outcomes.

Related work

The use of hybrid learning approaches enhances the dynamic and engaging nature of learning. Research in [12] classified learners’ learning styles based on their actions using artificial intelligence. It can meet each student’s needs by creating individualised lesson plans. According to their findings, [13], web mining techniques made it easier to evaluate behaviour. Because of their centralised, static architectures, both models struggle to operate in AR/VR environments, where learning co-occurs across separate domains. To enable the use of decision trees to categorise learners, the authors enhanced the Felder-Silverman Learning Style Model (FSLSM), as mentioned in [14]. Applications in virtual and augmented reality that require real-time, multimodal input demonstrated that the method was inadequate. The reason for this was that the technique relied on a limited set of samples and data types. In [15,16], the hybrid approach combining fuzzy logic and neural networks had difficulties with large FSLSMs and high-dimensional inputs, such as gaze direction or the dynamic nature of movements. Fuzzy classification trees and Bayesian networks both agree that text-based online discussion groups and chat rooms are beneficial. Despite using verbal data rather than sensor-based signals, it employed immersive learning approaches and engaged a large number of pupils. While a fuzzy logic system has been developed to simplify model comprehension, its use in AR/VR has not yet been tested [17,18]. Despite improving prediction granularity using dialog-based information, the fuzzy tree technique in [19] only worked for unimodal situations. A more formal technique to organize behaviors, fuzzy C-means, matched behavioral data with FSLSM categories. However, its static and unsupervised nature made it tougher to adapt to learners’ shifting behaviors throughout immersion activities.

The adaptive e-learning systems mentioned in [19,20] use AI and knowledge-level modeling to personalize the contents given to each learner. The adaptive feedback in these methods came from quizzes and surveys, rather than VR or AR-enhanced instruction. Furthermore, it failed to react instantly to data collected by sensors. Automatically classifying learning styles in traditional LMSs was our goal using decision tree and Bayesian algorithms, similar to what was done in [21]. Even with perfect accuracy, these models are unable to provide federated learning or cross-device sharing techniques that protect user privacy. In order to predict learning style from visual data, the CNN-based model [22] assisted with feature extraction. In the field of visual analytics, this research was carried out. Symbolic thinking and explainability, which are necessary for better transparency in instruction, were missing from the method, even if it was sound. While the deep multi-target architecture mentioned in [23] improved prediction accuracy, its inability to provide understandable feedback rendered it ineffective for instructor-led teaching. Although [24] demonstrated the significance of emotional signals in LMS logs, it was unable to fulfill the interaction requirements of AR and VR environments.

A hybrid ensemble model that incorporates several classifiers for performance forecasting—including support vector machines (SVM), multi-layer perceptrons (MLP), and others—was shown to successfully anticipate student outcomes, as mentioned in [25] This system was not intended to comprehend dynamic behavior sequences. A hybrid architecture [26] addresses temporal aspects by modeling sequential patterns using attention-based RNNs and SVM. However, university ID card swipes did not accurately represent the complex, embodied interactions of immersive language learning. Game-based learning methods also motivated and engaged students. Research in [27] found that Quizlet-based gamified learning for TOEIC vocabulary acquisition increased student attitudes but lacked multimodal flexibility and real-time feedback. The [28] fuzzy logic recommender framework offered tailored learning support in programming, but was not adaptable to immersive language-based settings.

The potential of mobile platforms in promoting learner autonomy was demonstrated by the mobile VR-integrated learning system MGVR-ELS [29], which was evaluated using conventional pre- and post-testing, as well as ANOVA. It lacked explainability and federated learning aids. A multimodal fuzzy system in [30] successfully classified data using ANFIS and SWOT analysis, but it was limited to structured input data and lacked adaptation to real-world learner behaviors. In [31], a hybrid model incorporating Bi-LSTM, fuzzy AHP, and evolutionary algorithms was presented to dynamically adjust game complexity. While this solution offered learner-specific customisation, it failed to solve federated training and device-agnostic scalability, which are crucial to AR/VR-based educational systems.

While existing approaches as shown in Table 1 highlight the promise of explainable, federated frameworks for immersive learning, our technique achieves results equivalent to TriNet-AQ by integrating triaxial attention with quantum-inspired encoding within a coherent federated framework.

Download:

Table 1. Summary of related work in hybrid learning for AR/VR-based language education.

https://doi.org/10.1371/journal.pone.0329304.t001

Proposed system model

The paper introduces TriNet-AQ, a deep learning architecture designed to assess students’ English language skills within a VR- or AR-based course of study. This approach considers privacy concerns, model knowledge, data from various learning methods, and dynamic interactions. Combining visual, behavioral, and contextual data, Triaxial Attention Fusion (TAF) and Quantum Modulated Integration (QMI) enhance feature representation based on situational significance. Both Quantum QMI and QSE aim to enhance feature representation in general and capture temporal dynamics, respectively. This work aims to use the hybrid metaheuristic Slime-Gorilla Optimization (HSGO) to connect devices across different areas. With federated learning, clients can be trained without transmitting raw inputs, thereby safeguarding sensitive data. The use of a SHAP-based interpretability module to provide students with constructive criticism aids self-awareness and objective-setting training. The suggested framework is well-organised and has modules, as seen in Fig 1. Moreover, it provides a high-level summary of each area, formal mathematical equations, and an account of the methods’ applications. This method provides a better idea of how TriNet-AQ can offer privacy-protecting, easily understandable language tests.

Download:

Fig 1. Proposed framework for English language proficiency classification in AR/VR-based curricula.

https://doi.org/10.1371/journal.pone.0329304.g001

Data collection and preprocessing

The dataset used in this study was sourced from a publicly available Kaggle repository and generously provided by the Royal Brisbane and Women’s Hospital (RBWH) in Queensland, Australia, in collaboration with the Australian Digital Health Agency [32]. Session-level interaction records from a significant English language learning program using augmented and virtual reality platforms are anonymised. Using immersive instructional technology, a network of schools and institutions collected data over a 14-month period. Each student’s activity in the AR/VR module is discussed, and it records a wide range of actions about language competency and use. The dataset captures the intrinsic diversity across learners and devices and contains data from real deployment settings, making it suitable for federated learning. The data was processed in compliance with privacy rules, and the Queensland Health Ethics Committee approved its use and sharing. The creation of privacy-aware models in digital education is initiated using a dataset of 252,782 elements, as shown in Table 2.

Download:

Table 2. Overview of dataset features and their distribution characteristics.

https://doi.org/10.1371/journal.pone.0329304.t002

Preprocessing and feature representation using the NAAL framework

The preprocessing pipeline we provide is called NAAL, an acronym for “Normalisation then Aggregation then Adaptation Layer" [33]. The objective of this pipeline is to facilitate the development of scalable, ethical, and privacy-conscious distributed AR/VR education systems that model learner behaviour. The four most prevalent issues in federated educational contexts are the primary focus of this pipeline. There are several issues, including uneven data distribution across nodes, learners following a specific sequence, feature spaces that are too large, and proficiency labels that are not evenly distributed across classes. The four components that comprise NAAL are time-based biometric aggregation (FedNorm), adaptive resampling (ARBR), context-aware feature selection (CIWS), and federated normalization. All of these components work hand in hand. To ensure the model is as suitable, generalizable, and easy to understand across as many learning environments as possible, each step in preparing the learner data is critical. The many components of the NAAL Framework are described in Algorithm 1.

Algorithm 1 NAAL: Normalization–aggregation–adaptation layer for federated AR/VR educational data.

Require: Raw learner dataset from K federated clients

Ensure: Preprocessed feature tensor , label tensor

1: for each client k = 1 to K in parallel do

2: Step 1: Federated One-Hot Encoding

3: for each categorical attribute c in client k do

4: Transform c into one-hot vector

5: end for

6: Step 2: FedNorm (Federated Normalization)

7: for each numerical feature x_r in client k do

8: Compute local mean and std

9: end for

10: Securely aggregate global mean and std

11: for each x_r in client k do

12: Normalize:

13: end for

14: Step 3: Temporal-Biometric Aggregation

15: Construct session matrix

16: Compute temporal mean

17: Compute session variance

18: Concatenate and to learner profile

19: Step 4: Contextual Feature Selection (CIWS)

20: for each feature do

21: Estimate mutual info using local joint probabilities and context score

22: if then

23: Mark as retained

24: end if

25: end for

26: Step 5: Adaptive Resampling (ARBR)

27: for each sample i in client k do

28: Compute label rarity and informativeness

29: Assign weight

30: end for

31: end for

32: Step 6: Construct Final Tensors

33: Form X by combining selected, aggregated, and normalized features across all clients

34: Form multi-label outputs

35: return

Federated encoding and normalization (FedNorm).

Preprocessing begins by numerically encoding categorical data, such as the learner’s location, device type, or level of education. This becomes feasible with one-hot encoding. With this update, we can be confident that no hierarchical sorting or unexpected category treatment will occur. Elements of the encoded vector , where g represents the count of different categories, are defined element-wise when a learner u is considered:

(1)

This ensures that the schema is shown in all federated network institutions. Task duration, interaction speed, and evaluation results are all continuous variables that are normalized using the FedNorm method. Decentralized normalization is accomplished using this method. By safely collecting local client data to determine scaling parameters, FedNorm safeguards users’ privacy. The following is used to standardise a numerical attribute known as x_r:

(2)

The feature x_r’s federated mean and standard deviation is obtained by using client-side summaries rather than raw data. They are identified as and respectively. This guarantees both continuous expansion and the preservation of data sovereignty at each site.

Temporal-biometric aggregation for behavior modeling.

Over the course of many sessions, students’ actions while using AR and VR can vary. To capture the dynamics that change over time, NAAL can use a matrix , where h is the total number of sessions and p is a physiological or behavioral characteristic like head motions, gesture frequency, or gaze focus. Two possible statistical overviews derived from this matrix are:

(3)

A vector is used to monitor changes in people’s behaviour over time. In contrast, the differences between the sessions are represented by the vector . Discovering shifts in participation could be possible with this. These statistics, when combined with static data, aid in building a profile of the actions and mental processes associated with each individual learner.

Feature selection based on contextual importance (CIWS).

It is crucial to select only the most relevant and applicable qualities from the vast amount of data obtained. Contextual Importance-Based Weighted Selection is the selection protocol used by the National Association of American Leagues (NAAL). The weighted relevance score is generated by using the mutual information for each candidate feature and prediction target y.

(4)

In this equation, the quantity represents the collective likelihood of the feature value u and the label result v. The behavioural consistency score, generated via unsupervised clustering within each client, is likewise represented by the symbol . A feature is retained if its score is above a dynamic threshold denoted by the symbol .

(5)

The distribution of the calculated relevance scores determines the adaptive choice of the threshold τ, which is not a fixed number for each client. Typically, the threshold is not computed in this manner. For an extract threshold, this study sets τ to the average of all local feature relevance scores plus 1 standard deviation. In subsequent federated rounds, it is dynamically updated based on feature stability and regional variation. Adaptive formulation ensures fairness and sensitivity by catering to each client’s unique data demands. This eliminates irrelevant elements and keeps just those that are statistically significant.

Adaptive resampling to address imbalanced labels (ARBR).

ARBR is an NAAL module that aims to address class imbalance, particularly the underrepresentation of students with very high or low competence levels. This method adjusts the model’s response to training samples according to two parameters: the rarity of the label and the distinctiveness of the behavioral pattern. The sample weight for learner i is calculated as [33]:

(6)

Where denotes the informativeness parameter and is the frequency percentage of class z_i.

(7)

One learner’s feature value is represented by the variable and the average value for all learners is defined by the variable . These weights prioritise learner profiles that are not often seen but are more expressive during training when this weighting is in effect. The results are more equity and fairness. In an input tensor (, the encoded data is combined with the total number of features (p’) and the number of learners (N). This takes place after the changes are completed. A multi-output vector is arranged by the corresponding label structure:

(8)

Incorporating aggregate scores and subdomain-specific abilities, these outputs include a broad spectrum of linguistic competence. Modern, federated, immersive learning systems have specific data preprocessing needs, and the NAAL framework offers a robust, ethically acceptable solution. To achieve more equitable and comprehensible AI in education, its design guarantees a well-aligned, behaviorally rich, and learning-ready dataset.

Classification architecture: TriNet-AQ

This TriNet-AQ architecture, which combines adaptive attention fusion with quantum-mechanical feature encoding. Students’ performance in federated immersive AR/VR systems must be easily comprehensible and categorizable. TRINet-AQ is effective for learning systems that aim to protect users’ privacy in real-world scenarios. An all-encompassing strategy is necessary for the efficient management of interactions among high-dimensional learners. In the model shown, a QSE unit that modifies engagement signals is one of the four primary components. Three further parts are needed: a TAF module to separate biometric, contextual, and semantic streams; a QMI layer to uncover hidden interdependencies via phase-shift operations; and a plethora of task-specific, long-term output heads. The TriNet-AQ ensures private group workouts by connecting to edge computer devices. Its adaptable design makes it easy to include in curricula across schools in diverse regions without compromising on student engagement or identity. Fig 2 shows the component architecture, and Algorithm 2 provides a detailed description of the TriNet-AQ model’s operating flow.

Download:

Fig 2. Triaxial attention–quantum network component architecture.

https://doi.org/10.1371/journal.pone.0329304.g002

Quantum-inspired sinusoidal transformation (QST).

Students’ actions during immersive digital learning are often influenced by their circadian rhythms and cognitive cycles. Our solution successfully captures these complex dynamics through a sinusoidal transformation guided by quantum systems theory. After using the formula , we can get the processed feature vector for every learner. Here, the learner’s index is represented by r and the total number of features processed by NAAL is denoted by d. We create a quantum mapping with two frequencies for every scalar property [34,35].

(9)

for feature index s, where stands for a trainable frequency modulation parameter. This encoding projects real-valued qualities into a richer, periodic latent space, simulating oscillations in engagement and proficiency. By incorporating temporal features into a periodic latent space, quantum-inspired encoding improves the model’s capacity to generalise long-term behavioural data. Mathematically, represents the sinusoidal transformation . In contrast to static encodings, this transformation allows the network to detect oscillatory and cyclical patterns in learner interactions, including oscillations in engagement and attention cycles. Among these patterns are cycles of attention and oscillations of engagement. One possible use of the dual-frequency representation is to simulate phase interference in quantum systems. The phase and magnitude information are preserved in this formulation across time. In immersive AR/VR learning situations, this method allows the model to preserve temporal continuity between sessions. In such cases, looking at sequential behavioural data becomes more stable and simpler to generalise.

Algorithm 2 TriNet-AQ: Triaxial attention–quantum classification for federated learning.

Require: Preprocessed learner feature matrix , corresponding label matrix , learning rate , number of federated clients C, and the set of learning tasks

Ensure: Predicted label set across all tasks

1: Model Initialization: Randomly initialize shared model parameters , and task-specific layers for each classification task

2: for each communication round t = 1 to T do

3: for each client j = 1 to C in parallel do

4: Sample the local learner dataset without transmitting raw data

5: for each learner instance k in do

6: Quantum Sinusoidal Encoding (QSE):

7: Transform each input using periodic quantum embedding:

8: Triaxial Attention Fusion (TAF):

9: for each modality do

10: Compute multi-head attention over subspace q:

11: end for

12: Concatenate triaxial representations:

13: Quantum Modulated Fusion (QMF):

14: Apply quantum phase transformation to enhance intermodal expressivity:

15: Task-Specific Predictions:

16: for each learning task do

17: Predict outcome via fully connected classifier:

18: Accumulate multi-task cross-entropy loss:

19: end for

20: end for

21: Local Update: Update model parameters using local gradients:

22: Send encrypted local model update to the federated server

23: end for

24: Federated Aggregation: Aggregate updates from all clients:

25: end for

26: return Final predicted outcomes for all tasks and all learners

Triaxial attention consolidation (TAC).

Augmented and virtual reality educational datasets provide multimodal information. This might include biometric patterns, contextual usage, and semantic interaction sequences. TriNet-AQ addresses this issue by introducing a three-stream attention module that operates in parallel across different representation spaces. Streams receive a matrix , where denotes modality type, t temporal segments, and h projected dimension. Modality-dependent self-attention mechanism definition:

(10)

The terms “query," “key," and “value" are denoted by the tensors in that order. Each stream models an example of an interdependency within its domain; for instance, the physiology of eye movement varies, the environment varies according to location and device, and the job varies according to question and answer entropy. The concatenation method merges the three attention outputs:

(11)

An dimensional model that accounts for correlations inside and across streams is the final product. This model is represented by the symbol . Using this approach, contextualization and preservation of learner-specific features are guaranteed throughout the fusion process.

Quantum-phase feature blending (QPFB).

An innovative modulation layer that leverages quantum phase rotations is used to enhance the attention-fused vector . The structure is shown with more depth by adding this layer. This layer’s objective is to simulate non-linear phase interferences and amplitude variations, which are techniques that might detect subtle shifts in thought and behavior. This operation can be characterzie by [36]:

(12)

The function represents a differentiable phase generator function when performing element-wise operations, while the notation indicates Hadamard multiplication. Simulating amplitude-phase couplings, which are analogous to superimposing quantum states, becomes possible with this crucial step. For the purpose of discovering higher-order correlations between attributes, this implies that no more engineering work is required.

Multi-objective predictive decoding.

As the unified embedding for the subsequent phase of the classification process, the vector is the result of the procedure. To set up a multitask learning environment, we do it as follows: . This accomplishes all of this while keeping an eye on the state of education as a whole. Each job is handled by a certain classification head :

(13)

where and represent learnable weights and biases, and is the class count for each educational task. These heads operate concurrently, enabling the model to reason across multiple dimensions of learning feedback and offer more personalized and actionable insights per learner.

The cumulative optimization objective is the aggregated categorical cross-entropy loss across all N learners and all tasks:

(14)

where denotes the actual label and the predicted vector for task and learner r.

Federated coordination with gradient preservation.

The goal of implementing TriNet-AQ into a federated learning system is to ensure that all student information is handled in an ethical manner. Every client node uses its own private dataset and environment to execute computations, such as data loading, gradient calculation, and model updating, due to the independence of these nodes. The client does not transmit raw data to the central coordinator for combining. Instead, the central coordinator processes only encrypted model parameters or gradient changes. Localisation safeguards clients’ privacy and institutional independence while allowing them to study together.

(15)

where is the learning rate, and denotes the local training loss. To aggregate these decentralized updates securely, the central coordinator performs a weighted average across clients:

(16)

In the equation , where q_m stands for the number of local samples, the total sample size is represented. We employ encrypted communication and aggregation to comply with institutional privacy rules and improve cross-client generalisability. TriNet-AQ offers robust multitask educational profiling in AR/VR settings. Modular attention fusion, sinusoidal quantum transformations, and federated optimisation solve this problem. The triaxial architecture of quantum modulation enables precise feature curation across behavioral modalities and expressive characterization of learner variability. Organisations provide both characteristics. The most crucial aspect of its federated alignment is that artificial intelligence-led educational innovation will remain ethical, egalitarian, and decentralised.

Parameter tuning via HSGO: Hybrid slime–gorilla optimization

Our goal in implementing the HSGO hybrid optimisation approach was to make the TriNet-AQ model as stable and flexible as possible when used in distributed education situations. This approach combines swarm intelligence, inspired by the behavior of gorillas [37], with adaptive learning, based on the behavior of slime moulds [38]. Important hyperparameters in federated settings include learning rate, phase offset, attention breadth, and regularization intensity, all of which our hybrid optimizer is designed to tune. We manage to do all this while ensuring efficient convergence and protecting user privacy. The first stage of optimising is to initialise a set of possible configurations. The parameters are fully specified for each contender, denoted as . To ensure that these configurations are distributed at random within the allowed range, the following procedures are employed:

(17)

The lower and upper bounds for the n-th hyperparameter are represented by and , respectively, and is a scalar that is selected at random. Training TriNet-AQ locally at several federated clients allows us to analyse each setup. Next, a weighted combination of predicted accuracy and F1-score across all tasks is used to quantify the efficacy of the m-th candidate:

(18)

The symbols and indicate the user-defined preference weights, average accuracy, and macro F1-score of candidate m over all nodes that are involved, respectively. For each applicant, the most effective method of exploring the parameter space is to combine the impact of the most successful peer with that of a randomly chosen member of the population:

(19)

Here, , denotes the current best configuration, and is a randomly sampled peer.

Once the algorithm identifies strong-performing regions, the refinement phase begins. Inspired by slime mould dynamics, this step fine-tunes the candidates using perturbation guided by fitness-ranked weighting. The refined position is given by:

(20)

(21)

Within the range (–1,1), the adaptive impact factor is constrained using the hyperbolic tangent function . Convergence should occur slowly and without sudden changes to the parameters; this is the aim of this function. In order for HSGO to maintain its equilibrium and power over the federated optimisation cycles, this restricted activation stabilises updates during the exploration and exploitation phases.

The algorithm continues this cycle of exploration and exploitation until convergence is observed, defined as the change in best fitness value falling below a minimum threshold :

(22)

When the optimization concludes, the best parameter vector is disseminated to all clients for consistent deployment in final training.

Algorithm 3 HSGO: Hybrid slime–gorilla optimization for tuning TriNet-AQ parameters.

Require: Total candidate pool size M, maximum iterations , parameter bounds , participating clients

Ensure: Optimized hyperparameter vector

1: Initialize M random hyperparameter vectors

2: for iteration index τ = 1 to do

3: for each candidate m = 1 to M do

4: Share vector with all clients in

5: Train the TriNet-AQ model locally using

6: Compute model quality score as a weighted sum of local accuracy and macro F1-score

7: end for

8: Identify the best configuration and the worst

9: for each vector m = 1 to M do

10: if then

11: Select a peer configuration vector randomly from the population

12: Update parameters using:

13:

14: where are sampled from uniform distribution U(–1,1)

15: else

16: Select two distinct vectors and

17: Compute average value across population

18: Calculate influence factor:

19:

20: Update parameters with refinement:

21:

22: end if

23: end for

24: Check convergence: if change in is less than threshold γ, terminate early

25: end for

26: return Best solution found:

HSGO’s stages are shown in Algorithm 3, available here. At its core, HSGO is a framework for simultaneous strategy optimization. In this way, it provide fast communication while intelligently exploring and improving TriNet-AQ’s tuning space, all while satisfying privacy needs. Because of its capacity to strike a balance between broad search and specialisedspecialized adaptation, it performs well in varied, scattered, and constantly changing learning contexts.

Performance evaluation and metrics

Commonly used evaluation measures that show the model’s performance on classification tasks are used to determine the TriNet-AQ model’s effectiveness in a federated learning environment. Some of these metrics shed light on different aspects of predictive quality, such as accuracy, recall, F1-score, and precision. Represented as and correspondingly, are correctly identified as positive and negative samples. Conversely, false positives are denoted by the symbol and false negatives by . The accompanying metrics are defined as follows, according to [38]:

(23)

(24)

These metrics help evaluate how well the model distinguishes between classes, balances false alarms and missed detections, and maintains overall correctness. In the context of federated learning, where multiple clients train the model on decentralized data, we also compute global metrics. Depending on the size of each customer’s local dataset, the average their metric scores and accomplish the following:

(25)

The sum of all clients is represented by the variable n_j, and client j samples are denoted by the variable N.We have developed a novel assessment tool, the Federated Fairness Score (FFS), to thoroughly investigate the consistency and fairness of all participants.According to this definition, this score:

(26)

The average F1-score for each customer is represented by the symbol , the standard deviation is symbolized by the symbol , and a tiny constant that ensures stability is represented by the symbol . If the FFS is high, then the model is applicable to all users, notwithstanding data inconsistencies. This makes it possible for anyone to pick up the method, regardless of how consistent the data is. When used together, these procedures demonstrate that, when given data from federated setups, the model is accurate and equitable.

Experimental setup and results

This section provides the key findings and the methodology used to assess the TriNet-AQ framework in a completely immersive AR/VR language-learning setting. The study included 48,000 sessions of anonymous student-teacher communication. Indicators of multimodal engagement and the activity’s environment were included in the time-series recordings of participants’ actions during these sessions. None of the three federated customers, all academic institutions, met the criteria for independent and identically distributed data. As a result, we were able to build a simulation that accurately captures the dispersed and unique nature of real-world data. As a preprocessing step, we used min-max scaling and QSE temporal encoding. The TriNet-AQ model used triaxial attention layers, 128-dimensional latent features, and a dual-layer Quantum Modulated Integration (QMI) unit to enhance contextual learning. Using a population size of 35 local update steps per client, the Hybrid Slime-Gorilla Optimisation (HSGO) approach was used once the global aggregation operation was finished. A very powerful workstation equipped with an Intel Xeon Silver CPU, 32 GB of RAM, and NVIDIA RTX A4060 GPUs was used to execute the simulations. It used federated orchestration with the Flower framework and PyTorch 2.1.1. For this purpose, SHAP analysis is used in conjunction with both global and client-specific performance metrics to gauge the framework’s usability. These results will be shown and compared to baseline models in the sections that follow.

The top 20 features are shown in Fig 3 according to their mean SHAP values. This will provide insight into how the TriNet-AQ model handles input from various learning modes. Behavioural and physiological traits, such as eye blink rate, gesture frequency, and gaze direction variance, are crucial for language learning in immersive AR and VR environments. Verbal and nonverbal clues can reveal students’ levels of interest and ability. Teachers are expected to interpret, clarify, and respond to the model outcomes in a privacy-preserving federated learning environment. When employing AI to assess trust, interpretability is crucial. The most relevant features for determining competency levels, according to the SHAP research, were linguistic coherence levels, semantic alignment, and pronunciation fluency. Combining model explanations with conventional language evaluation criteria provides a better understanding of TriNet-AQ.

Download:

Fig 3. Top 20 high-impact features identified by SHAP analysis.

https://doi.org/10.1371/journal.pone.0329304.g003

The comparison of the proposed and baseline models is shown in Table 3. Several metrics are used to measure performance, including Accuracy, Precision, Recall, F1-Score, and the domain-specific EPES. Multiple industry-standard models have an accuracy rate below 75%. These include Bayesian Networks and Decision Trees. Bi-LSTM has an 85.4% success rate when combined with Genetic Algorithms and Fuzzy AHP. However, with a 98.5% accuracy, 0.95 area under the curve, and 0.89 EPES, TriNet-AQ is the superior choice. In immersive AR/VR environments, in particular, its federated design, quantum-inspired encoding, and attention-based fusion should aid in discovering subtle linkages across various forms of instructional material.

Download:

Table 3. Performance metrics comparison of proposed TriNet-AQ with existing methods.

https://doi.org/10.1371/journal.pone.0329304.t003

Table 4 displays the results of an ablation study that tested the functionality of each central module in the TriNet-AQ architecture. A rudimentary CNN model with static input achieves 84.3 per cent accuracy by adding each segment one at a time. The Quantum Sinusoidal Encoding (QSE) module immediately improves temporal sequence modelling and performance. Triaxial Attention Fusion (TAF) helps the system fuse numerous data kinds, increasing accuracy to over 92%. Model contextual knowledge and AUC improve with quantum-modulated integration (QMI). Finally, integrating Federated Learning (FL) enhances the architecture, resulting in 98.5% peak accuracy and 0.89 EPES. These modular It demonstrates how each addition enhances the system. It demonstrate TriNet-AQ’s flexibility, understandability, and efficacy in privacy-sensitive language-learning contexts.

Download:

Table 4. Ablation study of TriNet-AQ modules on language proficiency classification.

https://doi.org/10.1371/journal.pone.0329304.t004

The confusion matrix for CEFR-level categorization, with categories ranging from A1 to C2, is shown in Fig 4. As a result, we can see that the system is effective for evaluating individual languages. When there is a lot of activity along the diagonal, it indicates that the predictions are correct, particularly for the higher and medium level courses. Levels that are near to one another, such as B1 and B2, tend to have less misclassifications. This distribution is reflective of the fact that, in practice, educational learners do progress in tandem. Using triaxial attention and federated reasoning, the TriNet-AQ model can detect subtle linguistic changes. An impressive 98.9% accuracy percentage is boasted by it. Thanks to its use of context-aware training in various AR and VR contexts and multimodal input, the system is able to identify advanced learners, particularly those at levels C1 and C2.

Download:

Fig 4. Confusion matrix for CEFR-level classification.

https://doi.org/10.1371/journal.pone.0329304.g004

In Fig 5, the confusion matrix represents the three-tiered organisation of overall competency (Low, Intermediate, and High). The segregation of more generalised linguistic abilities is the primary focus of this matrix. Nearly half of the 556 assessment samples were correctly classified, with only a small degree of overlap, especially between the adjacent groups from the Intermediate group. This impact is expected to occur throughout the stages of learning transitions, when the boundaries between various levels of competence can become hazy. Regardless, the TriNet-AQ framework maintained a high degree of classification consistency, demonstrating its ability to generalise across simplified proficiency levels. These results show that the technology is used in federated AR/VR classrooms to provide transparent student profiles, real-time skill monitoring, and adaptive training while still protecting students’ privacy.

Download:

Fig 5. Confusion matrix for overall proficiency classification.

https://doi.org/10.1371/journal.pone.0329304.g005

Table 5 compares the computational needs and efficiency of classical, fuzzy logic-based, and deep learning models. The table illustrates the amount of trainable parameters (in millions), training duration, sample prediction time, and model size. Decision Trees and Bayesian Networks are resource-efficient due to their few parameters and fast inference. It have small learning space. Complex deep learning models like Bi-LSTM with fuzzy logic and genetic algorithms are more accurate but take 25 minutes and 5 million parameters to train. TriNet-AQ is a good compromise. It takes 12 minutes to learn, predicts each sample in milliseconds, and contains 2.4 million factors. Its 4.1 MB size makes it ideal for edge deployment. TriNet-AQ excels in real-time, federated language learning systems, particularly in immersive AR/VR environments that value speed and efficiency.

Download:

Table 5. Comparison of model complexity and efficiency.

https://doi.org/10.1371/journal.pone.0329304.t005

The suggested TriNet-AQ model is shown in Fig 6, which compares the actual and anticipated language competence scores. A thousand evaluation samples served as the basis for this comparison. The figure’s left pane shows the anticipated alignment. After convergence, in particular, there are a few minor discrepancies between the expected alignment and the actual results. The model’s response to minute changes in scores is shown in the right pane, with emphasis on the first 50 samples used in the study. The model’s ability to operate with various student types and its accuracy in estimating time-series scores are both shown by this two-dimensional graphic. The model’s accuracy makes it ideal for usage in federated, adaptive AR, and VR-based learning environments where real-time evaluation is required.

Download:

Fig 6. Actual vs predicted proficiency scores with zoomed segment.

https://doi.org/10.1371/journal.pone.0329304.g006

Using SHAP as the evaluation framework, Table 6 compares the effectiveness of different models in explaining predictions. These metrics include the top 10 features’ share of decision-making, the model’s Global Transparency Score, and the Average SHAP Impact, which illustrates how much each feature impacts the model’s output. Traditional models like decision trees and fuzzy logic systems have good interpretability, transparency scores of 0.53-0.64, and significant influence at their core. CNNs and Bi-LSTM hybrids have lower SHAP effect ratings and transparency. This makes their projections harder to understand. On the other side, TriNet-AQ is simple. It scores 0.77 on the transparency scale, derives 79.4% of its decision-making from its strongest features, and has the most significant average SHAP impact of 0.138. These statistics illustrate that explainability-driven design relies on organised attention layers and SHAP-informed insights. Finally, TriNet-AQ’s predictive performance and clear outputs make it ideal for open, trust-based teaching.

Download:

Table 6. Interpretability evaluation via SHAP-based metrics.

https://doi.org/10.1371/journal.pone.0329304.t006

The method by which the TriNet-AQ model differentiates between six separate skill levels (A1 to C2) is shown in Fig 7. This figure shows the use of multi-class ROC curves to aggregate CEFR categories. By examining the curves, one can see that the AUC values exceed 0.91 and approach 0.98 as the proximity increases. This indicates that the categorisation is highly accurate. That the model can pick up new tongues with little help from related categories is shown here. The results demonstrate that the proposed architecture accurately captures intricate behavioural patterns, particularly through its attention-driven and multimodal fusion techniques. Based on these results, we are optimistic that we can implement the concept in VR/AR language-learning environments that prioritise user privacy, are intuitive, and provide a comprehensive experience.

Download:

Fig 7. Multi-class ROC curve for CEFR-level prediction.

https://doi.org/10.1371/journal.pone.0329304.g007

Using the suggested TriNet-AQ architecture, Fig 8 compares the ROC curves of classical and hybrid models. One way to assess the model’s ability to distinguish between different skill levels is to calculate the area under the receiver operating characteristic (ROC) curve (AUC) for each skill level. Fuzzy, decision tree, convolutional neural network, and Bayesian network area under the curve (AUC) values range from 0.82 to 0.92. However, TriNet-AQ exceeds these standards, achieving an AUC of 0.97. To accurately and intelligibly classify language competency in immersive AR/VR environments, the model discovers complex, multimodal patterns. A mix of federated learning, triaxial attention fusion, and quantum-inspired encodings enables this.

Download:

Fig 8. ROC curve comparison of classical and proposed methods.

https://doi.org/10.1371/journal.pone.0329304.g008

The relationships among several hyperparameter optimisation approaches are shown in Fig 9. This research demonstrates that, after 50 reps, each method improves fitness levels. TriNet-AQ uses the Hybrid Slime-Gorilla Optimisation (HSGO) algorithm. Typically, this approach converges more rapidly, achieving the optimal setting and rapidly surpassing 0.81. Particle Swarm Optimisation (PSO), Genetic Algorithm (GA), and Simulated Annealing (SA) all exhibit slower, flatter growth curves, making them less apt to discover and adapt to novel environments. In federated, real-time language-learning settings, results show that HSGO can successfully traverse the optimisation environment and adjust model parameters to achieve good performance.

Download:

Fig 9. Hyperparameter optimization convergence comparison across algorithms.

https://doi.org/10.1371/journal.pone.0329304.g009

The effectiveness of various models in federated learning scenarios is seen in Table 7. Priorities include rapid convergence, client-side equity, privacy, and personalization. There is more client disparity and longer communication cycles required for traditional and deep learning models to reach stability. TriNet-AQ has the fewest privacy overhead, the quickest convergence in 37 rounds, and the best customisation score (0.72). Its fairness variation is also the lowest. On top of that, it’s the least customizable option. Based on these results, TriNet-AQ is able to provide effective, tailored, and equitable learning to customers across various regions.

Download:

Table 7. Federated performance and fairness evaluation across clients.

https://doi.org/10.1371/journal.pone.0329304.t007

The development of client fairness after multiple federated training cycles for different hybrid models is examined in depth in Table 8. All three clients had their coefficient of variation (CV) recorded four times, which is a fairness metric. While there has been improvement with each model, equity gaps remain. In contrast, TriNet-AQ consistently displays lower CV values. Starting at 0.112 in round 10, the values decreased to 0.073 in round 40. This consistent decline demonstrates that the approach is suitable for robust, equitable federated deployments, as it preserves client fairness throughout training.

Download:

Table 8. Client-wise fairness evaluation of hybrid models across rounds.

https://doi.org/10.1371/journal.pone.0329304.t008

Table 9 illustrates the distinctions between federated models and classical models. These traits are inferred from the information SHAP gathered. Conventional models, such as Convolutional Neural Networks (CNNs) and Decision Trees (DTs), would seem to be the best bet for comprehending attribution. Conversely, compared to conventional models, federated models provide more trustworthy results. With excellent consistency (0.84), impact (0.426), and clarity (92.1%), TriNet-AQ is the clear winner. Decentralized learning settings require students to be responsible and honest, which demonstrates that it makes accurate predictions and convey their concepts clearly. This research analyses feature transparency, SHAP relevance, and interpretative stability to evaluate the models’ interpretability further. The results of this investigation are shown in Table 10. Results from federated versions are consistently more trustworthy and easier to see than those from more conventional models, which only provide passable results. TriNet-AQ stands head and shoulders above the competition due to its reliability and ability to provide crucial attributes. Because it clarifies the rationale for adopting adaptive models, this resilience is significant in the context of immersive AR and VR language acquisition. Students’ educational confidence, confidence in others, and self-esteem are all positively affected. The performance of the TriNet-AQ model changes after 45 training epochs, as shown in Fig 10. Accuracy and loss are presented on the left and right Y-axes, respectively, for training and testing. Early epochs show fast model improvement, with training accuracy increasing from 70% to over 98.5% and testing accuracy following. The model converges at epoch 30, with a slight difference (0.2-1%) between training and testing measures, indicating good generalization and low overfitting. Testing loss increases little as training loss decreases. The dual-axis graphic shows the model’s stability, efficiency, and excellent convergence in federated learning.

Download:

Table 9. SHAP explainability comparison between classical and federated models.

https://doi.org/10.1371/journal.pone.0329304.t009

Download:

Table 10. Interpretability assessment of classical and federated learning models using SHAP-based metrics.

https://doi.org/10.1371/journal.pone.0329304.t010

Download:

Fig 10. Training and testing accuracy and loss over 45 epochs.

https://doi.org/10.1371/journal.pone.0329304.g010

By comparing each model’s performance with both known and unknown data, the information in Table 11 provides a solid evaluation of the models’ ability to generalise to new scenarios. Most conventional and hybrid models exhibit a discernible decline in accuracy of 5 to 8 percent when presented with invisible inputs. The proposed TriNet-AQ outperforms the existing research baseline models. The performance drops by just 3.5% while reaching a fantastic 94.5% on fresh data and a respectable 98.0% on previously examined data. A strong F1-score of 0.95 on previously unseen data further demonstrates its capability to manage unexpected, real-time AR/VR language-learning scenarios.

Download:

Table 11. Comparison of model accuracy on seen and unseen data.

https://doi.org/10.1371/journal.pone.0329304.t011

To fully understand the model’s performance, refer to the statistical analysis in Table 12. This evaluation uses a wide range of efficacy criteria. Measures such as Cohen’s d, Pearson’s correlation, p-values for Wilcoxon and t-tests, scores from analysis of variance (ANOVA), and confidence intervals is used to assess the robustness and reliability of each model. These metrics are used ot examine the models. Despite the fact that several models perform well, TriNet-AQ stands out from the crowd since it achieves the greatest correlation (0.92), the largest effect size (0.89), and the highest F-score (27.36). The fact that its p-values are continuously low and that its confidence interval is rather narrow ([97.8, 99.2]) substantiates the fact that its findings are not only better but also statistically significant and trustworthy.

Download:

Table 12. Statistical evaluation of model performance.

https://doi.org/10.1371/journal.pone.0329304.t012

An investigation into the impact of various model parameters on TriNet-AQ’s classification performance is shown in Fig 11. Quantum encoding depth, learning rate, batch size, federated round count, client participation rate, and attention techniques are all examined. The bars represent the accuracy of the findings as a single parameter is varied at a time. The model’s most important parameters are the learning rate, the number of federated rounds, and customer engagement. In other words, these are the aspects that matter most for performance. There is a significant decrease in both the adjusted depth of the temporal layer and the dropout rate. This figure displays the most essential hyperparameters for modifying federated language learning models that use AR and VR.

Download:

Fig 11. Sensitivity analysis of TriNet-AQ model parameters.

https://doi.org/10.1371/journal.pone.0329304.g011

Conclusion

TriNet-AQ is a federated, explainable deep learning system that assesses English language competency in real time across immersive augmented and virtual reality educational contexts. The method was designed to address significant issues, including handling multimodal learner data, protecting user privacy, and working with a variety of clients. TriNet-AQ uses QSE, TAF, and Quantum Modulated Integration to accurately capture temporal, behavioural, and contextual subtleties in learner interactions. For efficient federated optimization, the model uses Hybrid Slime-Gorilla Optimisation (HSGO). Train more quickly and reliably with this strategy when resources are scarce and spread out. Model training on client devices is possible with privacy-aware federated learning. This protects student data. Experimental results demonstrated that the framework generalized effectively, achieving 98.5 per cent accuracy, an AUC of 0.95, and only a 3.5 per cent reduction in performance on new data. SHAP-based analysis indicated that each feature contributed clearly and consistently to system understanding. Teachers and students gained trust and openness. Cohen’s d of 0.89 and p < 0.001 make the findings more dependable and meaningful.

Future priorities include expanding TriNet-AQ to accommodate multilingual learning in augmented and virtual reality platforms. Audiovisual signals can be used to develop emotion-aware feedback approaches to enhance customization. The suggested framework should be tested in real-life classrooms to understand how it influences teaching over time.

References

1. Huang F, Bei Y, Yang Z, Jiang J, Chen H, Shen Q. Large language model simulator for cold-start recommendation. In: Proceedings of the ACM international conference on web search and data mining (WSDM). New York, USA; 2025.
2. Pramanik S. Immersive innovations: Exploring the use of virtual and augmented reality in educational institutions. Augmented reality and the future of education technology. IGI Global; 2024. p. 66–85.
3. Zhang Z-W, Liu Z-G, Martin A, Zhou K. BSC: Belief shift clustering. IEEE Trans Syst Man Cybern, Syst. 2023;53(3):1748–60.
- View Article
- Google Scholar
4. Lin Z, Wang Y, Zhou Y, Du F, Yang Y. MLM-EOE: Automatic depression detection via sentimental annotation and multi-expert ensemble. IEEE Trans Affective Comput. 2025;16(4):2842–58.
- View Article
- Google Scholar
5. Zhao H, Ji T, Rosin PL, Lai Y-K, Meng W, Wang Y. Cross-lingual font style transfer with full-domain convolutional attention. Pattern Recognit. 2024;155:110709.
- View Article
- Google Scholar
6. Yang R-S, Li H-B, Huang H-Z. Multisource information fusion considering the weight of focal element’s beliefs: A Gaussian kernel similarity approach. Meas Sci Technol. 2023;35(2):025136.
- View Article
- Google Scholar
7. Koukaras C, Koukaras P, Ioannidis D, Stavrinides SG. AI-driven telecommunications for smart classrooms: Transforming education through personalized learning and secure networks. Telecom. 2025;6(2):21.
- View Article
- Google Scholar
8. Hoter E, Yazbak Abu Ahmad M, Azulay H. Enhancing language learning and intergroup empathy through multi-user interactions and simulations in a virtual world. Virtual Worlds. 2024;3(3):333–53.
- View Article
- Google Scholar
9. Goi CL. The impact of VR-based learning on student engagement and learning outcomes in higher education. In: Teaching and learning for a sustainable future: Innovative strategies and best practices; 2024. p. 207–23.
10. Mondal H, Mondal S. Adopting augmented reality and virtual reality in medical education in resource-limited settings: Constraints and the way forward. Adv Physiol Educ. 2025;49(2):503–7. pmid:40136005
- View Article
- PubMed/NCBI
- Google Scholar
11. Huang C-Q, Huang Q-H, Huang X, Wang H, Li M, Lin K-J, et al. XKT: Toward explainable knowledge tracing model with cognitive learning theories for questions of multiple knowledge concepts. IEEE Trans Knowl Data Eng. 2024;36(11):7308–25.
- View Article
- Google Scholar
12. Aloudat MZ, Aboumadi A, Soliman A, Al-Mohammed HA, Al-Ali M, Mahgoub A, et al. Metaverse unbound: A survey on synergistic integration between semantic communication, 6G, and edge learning. IEEE Access. 2025;13:58302–50.
- View Article
- Google Scholar
13. Ruan T, Liu Q, Chang Y. Digital media recommendation system design based on user behavior analysis and emotional feature extraction. PLoS One. 2025;20(5):e0322768. pmid:40388444
- View Article
- PubMed/NCBI
- Google Scholar
14. Ait Daoud M, Namir A, Talbi M. FSLSM-based analysis of student performance information in a blended learning course using Moodle LMS. Open Inform Sci. 2024;8(1):20220163.
- View Article
- Google Scholar
15. Song W, Wang X, Zheng S, Li S, Hao A, Hou X. TalkingStyle: Personalized speech-driven 3D facial animation with style preservation. IEEE Trans Vis Comput Graph. 2025;31(9):4682–94. pmid:38861445
- View Article
- PubMed/NCBI
- Google Scholar
16. Song W, Ye Z, Sun M, Hou X, Li S, Hao A. AttriDiffuser: Adversarially enhanced diffusion model for text-to-facial attribute image synthesis. Pattern Recognit. 2025;163:111447.
- View Article
- Google Scholar
17. Yu S, Ye J, Zhang C, Liu Q, Luo S, Nan M, et al. Enhancing collaborative learning environments: A multi-feature fusion model for disruptive talk detection. IEEE Access. 2025;13:61261–73.
- View Article
- Google Scholar
18. Elfakki AO, Sghaier S, Alotaibi AA. An intelligent tool based on fuzzy logic and a 3D virtual learning environment for disabled student academic performance assessment. Appl Sci. 2023;13(8):4865.
- View Article
- Google Scholar
19. Suo W, Muhammad BA, Zhang Z, Liu B. GAT-LS: A graph attention network for learning style detection in online learning environment. J Netw Netw Applic. 2024;4(2):60–72.
- View Article
- Google Scholar
20. Kaouni M, Lakrami F, Labouidya O. Design of an adaptive E-learning model based on artificial intelligence for enhancing online teaching. Int J Emerg Technol Learn. 2023;18(06):202–19.
- View Article
- Google Scholar
21. Prabpala S, Nitiwatthana K. Enhancing learning style identification through advanced machine learning techniques. Sociolyt J. 2024;1(1):21–32.
- View Article
- Google Scholar
22. Hu R, Hui Z, Li Y, Guan J. Research on learning concentration recognition with multi-modal features in virtual reality environments. Sustainability. 2023;15(15):11606.
- View Article
- Google Scholar
23. Liu S. Virtual reality and 6G based smart classroom teaching using artificial intelligence. Wireless Pers Commun. 2024.
- View Article
- Google Scholar
24. Hasnine MN, Nguyen HT, Tran TTT, Bui HTT, Akçapınar G, Ueda H. A real-time learning analytics dashboard for automatic detection of online learners’ affective states. Sensors (Basel). 2023;23(9):4243. pmid:37177447
- View Article
- PubMed/NCBI
- Google Scholar
25. Saidani O, Umer M, Alshardan A, Alturki N, Nappi M, Ashraf I. Student academic success prediction in multimedia-supported virtual learning system using ensemble learning approach. Multimed Tools Appl. 2024;83(40):87553–78.
- View Article
- Google Scholar
26. Alnasyan B, Basheri M, Alassafi M. The power of deep learning techniques for predicting student performance in virtual learning environments: A systematic literature review. Comput Educ: Artif Intell. 2024;6:100231.
- View Article
- Google Scholar
27. Mauidloh NH, Anam SU, Widyastuti W. The correlation between student’s engagement and reading comprehension while using Quizlet gamification for vocabulary learning. Int J Recent Educ Res. 2024;5(4):1013–25.
- View Article
- Google Scholar
28. Li D. Creating personalized higher education teaching system using fuzzy association rule mining. Int J Comput Intell Syst. 2024;17(1).
- View Article
- Google Scholar
29. Huang X, Macgilchrist F. A virtual classroom map-based immersive VR learning approach to fostering collaborative learning. Comput Educ: X Reality. 2024;5:100088.
- View Article
- Google Scholar
30. Sun H. Enhancing higher education English learning through virtual reality and game-based approaches using the fuzzy deep model. CADandA. 2023:231–48.
- View Article
- Google Scholar
31. Alazmi M, Ayub N. Enhancing student success prediction in higher education with swarm optimized enhanced efficientNet attention mechanism. PLoS One. 2025;20(6):e0326966. pmid:40587447
- View Article
- PubMed/NCBI
- Google Scholar
32. Royal Brisbane and Women’s Hospital. English proficiency assessment in AR/VR; 2025.
33. Zhao H, Yang Z, Cheng Y, Tian C, Ren S, Xiao W, et al. GoldMiner: Elastic scaling of training data pre-processing pipelines for deep learning. Proc ACM Manag Data. 2023;1(2):1–25.
- View Article
- Google Scholar
34. Perisic A, Perisic B. Towards a digital transformation hyper-framework. Appl Sci. 2025;15(2):611.
- View Article
- Google Scholar
35. Fan Z, Zhang J, Zhang P, Lin Q, Li Y, Qian Y. Quantum-inspired language models based on unitary transformation. Inform Process Manag. 2024;61(4):103741.
- View Article
- Google Scholar
36. Wang Y, Cao L. Quantum phase transition detection via quantum support vector machine. Quantum Sci Technol. 2024;10(1):015043.
- View Article
- Google Scholar
37. Mostafa RR, Gaheen MA, Abd ElAziz M, Al-Betar MA, Ewees AA. An improved gorilla troops optimizer for global optimization problems and feature selection. Knowl-Based Syst. 2023;269:110462.
- View Article
- Google Scholar
38. Fergus P, Chalmers C. Performance evaluation metrics. Applied deep learning: Tools, techniques, and implementation. Springer International Publishing; 2022. p. 115–38.

[ref1] 1. Huang F, Bei Y, Yang Z, Jiang J, Chen H, Shen Q. Large language model simulator for cold-start recommendation. In: Proceedings of the ACM international conference on web search and data mining (WSDM). New York, USA; 2025.

[ref2] 2. Pramanik S. Immersive innovations: Exploring the use of virtual and augmented reality in educational institutions. Augmented reality and the future of education technology. IGI Global; 2024. p. 66–85.

[ref3] 3. Zhang Z-W, Liu Z-G, Martin A, Zhou K. BSC: Belief shift clustering. IEEE Trans Syst Man Cybern, Syst. 2023;53(3):1748–60.
View Article
Google Scholar

[4] View Article

[5] Google Scholar

[ref4] 4. Lin Z, Wang Y, Zhou Y, Du F, Yang Y. MLM-EOE: Automatic depression detection via sentimental annotation and multi-expert ensemble. IEEE Trans Affective Comput. 2025;16(4):2842–58.
View Article
Google Scholar

[7] View Article

[8] Google Scholar

[ref5] 5. Zhao H, Ji T, Rosin PL, Lai Y-K, Meng W, Wang Y. Cross-lingual font style transfer with full-domain convolutional attention. Pattern Recognit. 2024;155:110709.
View Article
Google Scholar

[10] View Article

[11] Google Scholar

[ref6] 6. Yang R-S, Li H-B, Huang H-Z. Multisource information fusion considering the weight of focal element’s beliefs: A Gaussian kernel similarity approach. Meas Sci Technol. 2023;35(2):025136.
View Article
Google Scholar

[13] View Article

[14] Google Scholar

[ref7] 7. Koukaras C, Koukaras P, Ioannidis D, Stavrinides SG. AI-driven telecommunications for smart classrooms: Transforming education through personalized learning and secure networks. Telecom. 2025;6(2):21.
View Article
Google Scholar

[16] View Article

[17] Google Scholar

[ref8] 8. Hoter E, Yazbak Abu Ahmad M, Azulay H. Enhancing language learning and intergroup empathy through multi-user interactions and simulations in a virtual world. Virtual Worlds. 2024;3(3):333–53.
View Article
Google Scholar

[19] View Article

[20] Google Scholar

[ref9] 9. Goi CL. The impact of VR-based learning on student engagement and learning outcomes in higher education. In: Teaching and learning for a sustainable future: Innovative strategies and best practices; 2024. p. 207–23.

[ref10] 10. Mondal H, Mondal S. Adopting augmented reality and virtual reality in medical education in resource-limited settings: Constraints and the way forward. Adv Physiol Educ. 2025;49(2):503–7. pmid:40136005
View Article
PubMed/NCBI
Google Scholar

[23] View Article

[24] PubMed/NCBI

[25] Google Scholar

[ref11] 11. Huang C-Q, Huang Q-H, Huang X, Wang H, Li M, Lin K-J, et al. XKT: Toward explainable knowledge tracing model with cognitive learning theories for questions of multiple knowledge concepts. IEEE Trans Knowl Data Eng. 2024;36(11):7308–25.
View Article
Google Scholar

[27] View Article

[28] Google Scholar

[ref12] 12. Aloudat MZ, Aboumadi A, Soliman A, Al-Mohammed HA, Al-Ali M, Mahgoub A, et al. Metaverse unbound: A survey on synergistic integration between semantic communication, 6G, and edge learning. IEEE Access. 2025;13:58302–50.
View Article
Google Scholar

[30] View Article

[31] Google Scholar

[ref13] 13. Ruan T, Liu Q, Chang Y. Digital media recommendation system design based on user behavior analysis and emotional feature extraction. PLoS One. 2025;20(5):e0322768. pmid:40388444
View Article
PubMed/NCBI
Google Scholar

[33] View Article

[34] PubMed/NCBI

[35] Google Scholar

[ref14] 14. Ait Daoud M, Namir A, Talbi M. FSLSM-based analysis of student performance information in a blended learning course using Moodle LMS. Open Inform Sci. 2024;8(1):20220163.
View Article
Google Scholar

[37] View Article

[38] Google Scholar

[ref15] 15. Song W, Wang X, Zheng S, Li S, Hao A, Hou X. TalkingStyle: Personalized speech-driven 3D facial animation with style preservation. IEEE Trans Vis Comput Graph. 2025;31(9):4682–94. pmid:38861445
View Article
PubMed/NCBI
Google Scholar

[40] View Article

[41] PubMed/NCBI

[42] Google Scholar

[ref16] 16. Song W, Ye Z, Sun M, Hou X, Li S, Hao A. AttriDiffuser: Adversarially enhanced diffusion model for text-to-facial attribute image synthesis. Pattern Recognit. 2025;163:111447.
View Article
Google Scholar

[44] View Article

[45] Google Scholar

[ref17] 17. Yu S, Ye J, Zhang C, Liu Q, Luo S, Nan M, et al. Enhancing collaborative learning environments: A multi-feature fusion model for disruptive talk detection. IEEE Access. 2025;13:61261–73.
View Article
Google Scholar

[47] View Article

[48] Google Scholar

[ref18] 18. Elfakki AO, Sghaier S, Alotaibi AA. An intelligent tool based on fuzzy logic and a 3D virtual learning environment for disabled student academic performance assessment. Appl Sci. 2023;13(8):4865.
View Article
Google Scholar

[50] View Article

[51] Google Scholar

[ref19] 19. Suo W, Muhammad BA, Zhang Z, Liu B. GAT-LS: A graph attention network for learning style detection in online learning environment. J Netw Netw Applic. 2024;4(2):60–72.
View Article
Google Scholar

[53] View Article

[54] Google Scholar

[ref20] 20. Kaouni M, Lakrami F, Labouidya O. Design of an adaptive E-learning model based on artificial intelligence for enhancing online teaching. Int J Emerg Technol Learn. 2023;18(06):202–19.
View Article
Google Scholar

[56] View Article

[57] Google Scholar

[ref21] 21. Prabpala S, Nitiwatthana K. Enhancing learning style identification through advanced machine learning techniques. Sociolyt J. 2024;1(1):21–32.
View Article
Google Scholar

[59] View Article

[60] Google Scholar

[ref22] 22. Hu R, Hui Z, Li Y, Guan J. Research on learning concentration recognition with multi-modal features in virtual reality environments. Sustainability. 2023;15(15):11606.
View Article
Google Scholar

[62] View Article

[63] Google Scholar

[ref23] 23. Liu S. Virtual reality and 6G based smart classroom teaching using artificial intelligence. Wireless Pers Commun. 2024.
View Article
Google Scholar

[65] View Article

[66] Google Scholar

[ref24] 24. Hasnine MN, Nguyen HT, Tran TTT, Bui HTT, Akçapınar G, Ueda H. A real-time learning analytics dashboard for automatic detection of online learners’ affective states. Sensors (Basel). 2023;23(9):4243. pmid:37177447
View Article
PubMed/NCBI
Google Scholar

[68] View Article

[69] PubMed/NCBI

[70] Google Scholar

[ref25] 25. Saidani O, Umer M, Alshardan A, Alturki N, Nappi M, Ashraf I. Student academic success prediction in multimedia-supported virtual learning system using ensemble learning approach. Multimed Tools Appl. 2024;83(40):87553–78.
View Article
Google Scholar

[72] View Article

[73] Google Scholar

[ref26] 26. Alnasyan B, Basheri M, Alassafi M. The power of deep learning techniques for predicting student performance in virtual learning environments: A systematic literature review. Comput Educ: Artif Intell. 2024;6:100231.
View Article
Google Scholar

[75] View Article

[76] Google Scholar

[ref27] 27. Mauidloh NH, Anam SU, Widyastuti W. The correlation between student’s engagement and reading comprehension while using Quizlet gamification for vocabulary learning. Int J Recent Educ Res. 2024;5(4):1013–25.
View Article
Google Scholar

[78] View Article

[79] Google Scholar

[ref28] 28. Li D. Creating personalized higher education teaching system using fuzzy association rule mining. Int J Comput Intell Syst. 2024;17(1).
View Article
Google Scholar

[81] View Article

[82] Google Scholar

[ref29] 29. Huang X, Macgilchrist F. A virtual classroom map-based immersive VR learning approach to fostering collaborative learning. Comput Educ: X Reality. 2024;5:100088.
View Article
Google Scholar

[84] View Article

[85] Google Scholar

[ref30] 30. Sun H. Enhancing higher education English learning through virtual reality and game-based approaches using the fuzzy deep model. CADandA. 2023:231–48.
View Article
Google Scholar

[87] View Article

[88] Google Scholar

[ref31] 31. Alazmi M, Ayub N. Enhancing student success prediction in higher education with swarm optimized enhanced efficientNet attention mechanism. PLoS One. 2025;20(6):e0326966. pmid:40587447
View Article
PubMed/NCBI
Google Scholar

[90] View Article

[91] PubMed/NCBI

[92] Google Scholar

[ref32] 32. Royal Brisbane and Women’s Hospital. English proficiency assessment in AR/VR; 2025.

[ref33] 33. Zhao H, Yang Z, Cheng Y, Tian C, Ren S, Xiao W, et al. GoldMiner: Elastic scaling of training data pre-processing pipelines for deep learning. Proc ACM Manag Data. 2023;1(2):1–25.
View Article
Google Scholar

[95] View Article

[96] Google Scholar

[ref34] 34. Perisic A, Perisic B. Towards a digital transformation hyper-framework. Appl Sci. 2025;15(2):611.
View Article
Google Scholar

[98] View Article

[99] Google Scholar

[ref35] 35. Fan Z, Zhang J, Zhang P, Lin Q, Li Y, Qian Y. Quantum-inspired language models based on unitary transformation. Inform Process Manag. 2024;61(4):103741.
View Article
Google Scholar

[101] View Article

[102] Google Scholar

[ref36] 36. Wang Y, Cao L. Quantum phase transition detection via quantum support vector machine. Quantum Sci Technol. 2024;10(1):015043.
View Article
Google Scholar

[104] View Article

[105] Google Scholar

[ref37] 37. Mostafa RR, Gaheen MA, Abd ElAziz M, Al-Betar MA, Ewees AA. An improved gorilla troops optimizer for global optimization problems and feature selection. Knowl-Based Syst. 2023;269:110462.
View Article
Google Scholar

[107] View Article

[108] Google Scholar

[ref38] 38. Fergus P, Chalmers C. Performance evaluation metrics. Applied deep learning: Tools, techniques, and implementation. Springer International Publishing; 2022. p. 115–38.

Figures

Abstract

Introduction

Related work

Proposed system model

Data collection and preprocessing

Preprocessing and feature representation using the NAAL framework

Federated encoding and normalization (FedNorm).

Temporal-biometric aggregation for behavior modeling.

Feature selection based on contextual importance (CIWS).

Adaptive resampling to address imbalanced labels (ARBR).

Classification architecture: TriNet-AQ

Quantum-inspired sinusoidal transformation (QST).

Triaxial attention consolidation (TAC).

Quantum-phase feature blending (QPFB).

Multi-objective predictive decoding.

Federated coordination with gradient preservation.

Parameter tuning via HSGO: Hybrid slime–gorilla optimization

Performance evaluation and metrics

Experimental setup and results

Conclusion

References