Knowledge relation rank enhanced heterogeneous learning interaction modeling for neural graph forgetting knowledge tracing

Knowledge tracing models have gained prominence in educational data mining, with applications like the Self-Attention Knowledge Tracing model, which captures the exercise-knowledge relationship. However, conventional knowledge tracing models focus solely on static question-knowledge and knowledge-knowledge relationships, treating them with equal significance. This simplistic approach often succumbs to subjective labeling bias and lacks the depth to capture nuanced exercise-knowledge connections. In this study, we propose a novel knowledge tracing model called Knowledge Relation Rank Enhanced Heterogeneous Learning Interaction Modeling for Neural Graph Forgetting Knowledge Tracing. Our model mitigates the impact of subjective labeling by fine-tuning the skill relation matrix and Q-matrix. Additionally, we employ Graph Convolutional Networks (GCNs) to capture intricate interactions between students, exercises, and skills. Specifically, the Knowledge Relation Importance Rank Calibration method is employed to generate the skill relation matrix and Q-matrix. These calibrated matrices, alongside heterogeneous interactions, serve as input for the GCN to compute exercise and skill embeddings. Subsequently, exercise embeddings, skill embeddings, item difficulty, and contingency tables collectively contribute to an exercise relation matrix, which is then fed into an attention mechanism for predictions. Experimental evaluations on two publicly available educational datasets demonstrate the superiority of our proposed model over baseline models, evidenced by enhanced performance across three evaluation metrics.


Introduction
With the continuous development of science and technology, the network provides people with many conveniences and has created huge amounts of information. Our generation is defined by the rapid growth of big data, which connects many areas, including education, health care, and transportation. Combining education with information science is an unstoppable trend in the development of education, and the use of emerging technologies in online education has expanded into a significant field of research. Applying artificial intelligence and related technologies to large amounts of educational data extracts valuable information that can be used to further education. Online educational systems such as edX, Coursera, and Udacity have been widely applied in the educational field for tracking, reporting, and delivering online courses 1 . For students, these platforms provide a variety of conveniences, including many online courses and free, individualized learning resources 2 . Furthermore, these online learning platforms support the normal teaching schedule in regions where in-person teaching is difficult to carry out for geographical or weather reasons. Teachers can use these educational systems to create remedial materials based on the needs of their students 3 . Tracking students' performance based on their past interactions has proved to be an important task in these educational systems 4 . This task is known as knowledge tracing 5 , which analyzes students' responses to exercises to infer their knowledge states. Knowledge tracing (KT) can thus be considered the task of evaluating a student's knowledge state. Specifically, a student first practices a sequence of questions $X_t = (x_1, x_2, \ldots, x_t)$ covering some knowledge points, and each response is logged (e.g., right or wrong). The objective of KT is then to predict the probability that the student answers the next question correctly, $p(a_{t+1} = 1 \mid q_{t+1}, X)$, from the past interactions and the corresponding responses. Each input $x_t$ is a tuple $(q_t, a_t)$, where $q_t$ denotes the question attempted by the student and $a_t$ indicates the student's response.
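As a minimal sketch of the problem setup above, an interaction log $X = (x_1, \ldots, x_T)$ of $(q_t, a_t)$ tuples can be turned into next-step prediction examples. Each example pairs a history with the next question and its 0/1 target. The names `Interaction` and `next_step_examples` are hypothetical and do not come from the paper.

```python
from typing import List, NamedTuple, Tuple


class Interaction(NamedTuple):
    """One interaction x_t = (q_t, a_t)."""
    question: int  # q_t: id of the attempted question
    correct: int   # a_t: 1 if answered correctly, 0 otherwise


def next_step_examples(
    seq: List[Interaction],
) -> List[Tuple[List[Interaction], int, int]]:
    """Build (history X, q_{t+1}, a_{t+1}) triples for next-step prediction."""
    return [(seq[:t], seq[t].question, seq[t].correct) for t in range(1, len(seq))]


log = [Interaction(5, 1), Interaction(7, 0), Interaction(9, 1)]
examples = next_step_examples(log)
```

A KT model is then trained to output $p(a_{t+1} = 1 \mid q_{t+1}, X)$ for each such triple.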
Many models use the Q-matrix to model the learning resources 6 . The Q-matrix is a binary matrix that indicates the relationship between exercises and knowledge concepts (KCs): an entry is 1 when the exercise is related to the KC and 0 otherwise. However, the Q-matrix ignores the relationships between skills. Several calibration methods have been proposed to generate a calibrated Q-matrix, but these methods ignore the relative importance of different KCs 7 .
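The sketch below builds the binary Q-matrix from a hypothetical exercise-to-KC mapping. As an illustration, it also derives a simple skill co-occurrence matrix that links skills covered by the same exercise, which is one of the notions of related skill used later in the model. The use of NumPy and the helper names are assumptions.

```python
import numpy as np


def build_q_matrix(exercise_kcs, num_exercises, num_kcs):
    """Binary Q-matrix: Q[e, k] = 1 if exercise e involves KC k, else 0."""
    q = np.zeros((num_exercises, num_kcs), dtype=np.int64)
    for exercise, kcs in exercise_kcs.items():
        q[exercise, list(kcs)] = 1
    return q


def skill_cooccurrence(q):
    """S[k, l] = 1 if KCs k and l appear together in at least one exercise."""
    counts = q.T @ q                  # number of exercises shared by each KC pair
    s = (counts > 0).astype(np.int64)
    np.fill_diagonal(s, 0)            # a skill is not its own neighbour
    return s


# Hypothetical mapping: exercise 0 covers KCs 0 and 1, exercise 1 covers KC 1, exercise 2 covers KC 2.
Q = build_q_matrix({0: [0, 1], 1: [1], 2: [2]}, num_exercises=3, num_kcs=3)
S = skill_cooccurrence(Q)
```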
Recently, numerous knowledge tracing models have been designed to track students' knowledge states. Examples include Deep Knowledge Tracing (DKT) 8 , the Dynamic Key-Value Memory Network (DKVMN) 9 , Graph-based Knowledge Tracing (GKT) 10 , and the Self-Attentive model for Knowledge Tracing (SAKT) 11 . These models achieve strong prediction performance on several public educational datasets. However, the DKT model ignores information about the knowledge points and cannot take the students' abilities into consideration. SAKT and DKVMN further consider the relationship between exercises and skills, but not the heterogeneous interactions among students, exercises, and KCs. GKT considers the heterogeneous interactions among students, exercises, and KCs but ignores the relational information between exercises and KCs.
To address the drawbacks of these four models, we design the Knowledge Relation Rank Enhanced Heterogeneous Learning Interaction Modeling for Neural Graph Forgetting Knowledge Tracing (NGFKT) model. First, the skill relation matrix and Q-matrix are generated and calibrated to capture the relationship between questions and KCs. Then, the calibrated skill relation matrix and Q-matrix serve as inputs to the GCN, which outputs the skill-exercise embedding. Next, the heterogeneous interactions, the item difficulty, the skill embedding, and the exercise embedding are combined to model the exercise relation matrix. Finally, the Position-Relation-Forgetting attention mechanism predicts the students' performance based on the exercise relation matrix.
The main contributions of this paper are as follows:
1. A calibration method is proposed to calibrate the skill relation matrix and Q-matrix according to the relationships of KCs in the heterogeneous students-exercises-KCs interactions.
2. The calibrated skill relation matrix, the Q-matrix, and the heterogeneous interactions are used as the input of the GCN to generate the exercise embedding and skill embedding. This comprehensively considers the relationships among students, exercises, and KCs.
3. Detailed experiments compare the NGFKT model with baseline models from three aspects. The first aspect measures the prediction performance of the NGFKT model against the baseline models; in addition to AUC and ACC, a new metric, Performance Stability (PS), is designed to measure performance. The second aspect examines the effectiveness of the NGFKT model even with limited records. The last aspect visualizes the knowledge tracing results with radar diagrams.
The rest of the paper is organized as follows. The section "Related works" reviews knowledge tracing models, graph neural networks, and relation modeling. The section "Proposed model" describes the structure of the NGFKT model in detail. The section "Implementation and experimental results" presents the implementation details and experimental results. Finally, the section "Conclusions and future work" concludes the paper and outlines future research directions.

Related works
Knowledge tracing. The knowledge tracing task seeks to assess students' level of knowledge from their interactions. Numerous KT models built on deep learning have improved the tracking of students' knowledge states.
The Deep Knowledge Tracing model (DKT) 8 is the first method to employ a neural network to track students' knowledge states, and its effectiveness has been further verified 12 . However, most knowledge tracing models based on deep neural networks ignore the relationship between exercises and skills. The EKT model applies exercise embedding modules to model the relations between exercises 13 , but it does not take past interactions and student behavior into consideration. The RKT model combines past interactions with the attention mechanism to make predictions 14 , without considering the relationships among students, exercises, and KCs.

Graph neural networks.
A graph is a non-linear structure that is more complex than a tree. Graph-structured information can describe real-world entities and their connections intuitively 15 . A GCN is a type of neural network that operates directly on graph-structured data. It updates the representation of each node in the graph from the node itself and its neighbors 16 . By stacking multiple graph convolutional layers, a GCN ensures that the updated node representations capture the features of both neighboring nodes and higher-order neighbors.
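Here is a minimal sketch of the layer-wise update described above. It assumes mean aggregation over a node and its neighbors, followed by a linear map and ReLU. This is one common GCN variant; the symmetric normalization of Kipf and Welling is another. Stacking two layers lets a node receive information from two-hop neighbors. The function name `gcn_layer` is hypothetical.

```python
import numpy as np


def gcn_layer(h, adj, w, b):
    """One GCN layer with mean aggregation.

    h:   (num_nodes, d_in) node states
    adj: (num_nodes, num_nodes) 0/1 adjacency matrix without self-loops
    w:   (d_in, d_out) weight matrix, b: (d_out,) bias
    """
    a_hat = adj + np.eye(adj.shape[0])          # include each node itself
    deg = a_hat.sum(axis=1, keepdims=True)      # size of {i} plus Node(i)
    aggregated = (a_hat @ h) / deg              # mean of self and neighbour states
    return np.maximum(0.0, aggregated @ w + b)  # linear transform + ReLU


# Path graph 0 - 1 - 2 with one-hot features and identity weights.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
h0 = np.eye(3)
w, b = np.eye(3), np.zeros(3)
h1 = gcn_layer(h0, adj, w, b)
h2 = gcn_layer(h1, adj, w, b)
```

After one layer, node 0 only sees node 1. After two layers, it also receives a contribution from node 2, which is the higher-order neighbor effect described in the text.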
Some knowledge tracing models use knowledge graph structures to capture more knowledge. In the work of 17 , each node represents a student; the nodes are fixed but the edges are dynamic, which creates a graph that changes over time. However, this method ignores the heterogeneous students-exercises-KCs interactions. The GIKT model treats the exercise-skill relationship graph as a bipartite graph and applies embedding propagation with a GCN to integrate exercise-skill correlations 18 , without considering student behavior. Therefore, the NGFKT model applies the skill relation matrix and Q-matrix to model the relationship between exercises and KCs. It then predicts performance by combining the attention mechanism with a forgetting layer.
Relation modeling. Knowledge tracing tasks also involve relation modeling, including exercise relation modeling. Many studies apply exercise relation modeling to make predictions 19 . Two exercises can be considered connected when they are associated with the same knowledge concept or the same student 20 . In addition, the Q-matrix has been used to investigate the relationship between exercises and skills and thereby explore the connection between two exercises 21 . Intuitively, if two problems are similar in difficulty and in past practice, they can also be considered similar 14,22 . Following these ideas, the NGFKT model combines the item difficulty, the exercise embedding, the skill embedding, and the contingency table of two exercises to generate the final exercise relation matrix.

Proposed model
In this section, we develop the NGFKT model, which predicts students' performance based on the exercise relation matrix and the Position-Relation-Forgetting attention mechanism. The model has three steps: skill relation matrix and Q-matrix modeling, extraction of the exercise relation matrix, and prediction with the Position-Relation-Forgetting attention mechanism. First, the calibrated skill relation matrix ($\hat{S}$) and the calibrated Q-matrix ($\hat{Q}$) are constructed as inputs of the model. Then, $\hat{S}$, $\hat{Q}$, and the heterogeneous interactions are fed into the GCN to output the embeddings of exercises and skills. The similarity of exercises is computed from the exercise embedding, skill embedding, item difficulty, and contingency table to generate the corresponding exercise relation matrix ($R_E$). Finally, $R_E$ serves as the input of the Position-Relation-Forgetting attention mechanism to track the student's knowledge state. The model overview figure shows the overall structure.

Skill relation matrix and Q-matrix modeling. A related skill is defined as a skill that is covered by the same exercise. When the hierarchical knowledge levels of skills are considered, the related skills also include the parent nodes and child nodes of a skill. For instance, the knowledge concept "Triangle" is the parent node of "Right Triangle", and "Pythagorean Theorem" is a child node of "Right Triangle". Therefore, the related skills of "Right Triangle" are the concepts "Triangle" and "Pythagorean Theorem". Inspired by these two ideas, we rank the importance of the related skills with the following partial order and propose a calibration method called the Knowledge Relation Importance Rank Calibration method (KRIRC). A pairwise Bayesian treatment is as follows. For convenience, we define a partial order $>_i^+$ for skill $i$ as:

$$a >_i^+ b >_i^+ c >_i^+ d \qquad (1)$$

where "a", "b", "c", and "d" are neighbors of skill $i$. Here, "a" denotes the skill of the parent node (knowledge level 0) in the knowledge level graph, and "b" denotes the skill of the child node (knowledge level 1). "c" can be interpreted in a similar way, and "d" is a skill covered by the same exercise. Along this line, neighbor "a" is more important than neighbor "b" when extracting the neighbors of the skill; the ranks of "a", "b", "c", and "d" are 0, 1, 2, and 3, respectively. Thereby, according to Eq. (1), the partial order relationship set can be defined as $D_{KRIRC} = \{(i, a, b) \mid a >_i^+ b,\ i = 1, 2, \ldots, K\}$, where $K$ is the number of knowledge concepts. Following the traditional Bayesian method, we assume that the calibration matrix $M$ follows a Gaussian distribution. To give the calibrated matrix labels higher confidence, we define $p(a >_i^+ b \mid M)$ as follows: (2)
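To make the partial order concrete, the sketch below enumerates the pairwise triples $(i, u, v)$ of $D_{KRIRC}$ for one skill from its typed neighbors. It uses the ranks 0 to 3 of "a", "b", "c", and "d" given above, and it leaves neighbors of the same type unordered. The likelihood of Eq. (2) is not reproduced. The neighbor "Angle" and the function name are hypothetical.

```python
from itertools import combinations

# Rank of each neighbour type in the partial order a > b > c > d
# (lower rank = more important neighbour).
RANK = {"a": 0, "b": 1, "c": 2, "d": 3}


def partial_order_triples(skill, neighbours):
    """Return triples (skill, u, v) meaning neighbour u is ranked above v.

    neighbours: dict mapping a neighbouring skill to its type 'a'..'d'.
    """
    triples = []
    for u, v in combinations(neighbours, 2):
        ru, rv = RANK[neighbours[u]], RANK[neighbours[v]]
        if ru < rv:
            triples.append((skill, u, v))
        elif rv < ru:
            triples.append((skill, v, u))
        # equal ranks: no preference is recorded
    return triples


triples = partial_order_triples(
    "Right Triangle",
    {"Triangle": "a", "Pythagorean Theorem": "b", "Angle": "d"},  # "Angle" is hypothetical
)
```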

[Figure: model overview of NGFKT. Stages: Calibrated skill relation matrix and Q-matrix generation; Exercise relation modeling; Position-Relation-Forgetting attention. Labels: Knowledge ranks, Response tuple, Skill embedding, Exercise embedding, Student cognitive difficulty, Forgetting curve, Outputs]
The overall structure of the Neural Graph Forgetting Knowledge Tracing (NGFKT) model. First, the response tuples of students and the knowledge levels are extracted from previous interactions. Then, the KRIRC method generates the calibrated skill relation matrix and Q-matrix. The skill relation matrix, the Q-matrix, and the cognitive difficulty of each student are treated as the inputs of the GCN to generate the skill-exercise embedding $\hat{e}$. Second, $\hat{e}$, the item difficulty of each question, and the contingency table are combined to output the exercise relation matrix. Finally, the Position-Relation-Forgetting attention mechanism processes these inputs to make predictions.
In Eq. (2), λ controls the discrimination between the relevance values of different knowledge levels. The log posterior over $D_{KRIRC}$ on $M$ can eventually be computed as:
where $I(\cdot)$ is the indicator function, which outputs 1 when its condition is met and 0 otherwise, and $C$ is a constant that can be ignored when training the matrix. Finally, KRIRC estimates the calibrated matrix $M$. Applying the KRIRC method to the Q-matrix and the skill relation matrix yields the calibrated Q-matrix and the calibrated skill relation matrix, which reflect the hierarchical knowledge levels between different knowledge points. Algorithm 1 summarizes the procedure.

Exercise relation modeling. Exercise relation modeling consists of two processes. First, the skill-exercise embedding is obtained by applying the GCN. Second, the skill-exercise embedding, the item difficulty, and the contingency table are combined to generate the exercise relation matrix.
For the GCN model, high-order neighbor information can be encoded by combining the skill relation matrix and the Q-matrix with the heterogeneous interactions. The GCN consists of multiple convolutional layers, and each layer updates a node from its own state and the states of its neighbors. The $i$-th node in the graph, denoted $node_i$, represents either a skill state $s_i$ or an exercise state $e_i$, and its neighboring nodes are denoted by the set $Node(i)$. The $l$-th layer of the graph convolutional network is updated as follows:
where $w^l$ and $b^l$ are the weight matrix and bias of the $l$-th GCN layer and $\mathrm{ReLU}(\cdot)$ is the activation function used in the GCN. The skill-exercise embedding $\hat{e}$ is then used to estimate the implicit relations among questions by calculating the inner product of their embeddings:
The similarity of exercises is further combined with the item difficulty and the students' previous performance to generate $R_E$. For item difficulty modeling, students' incorrect answers intuitively represent the difficulty of the exercises involved in their interactions. Students also repeat the same questions at different timestamps using their skills, and this behavior likewise reflects the cognitive difficulty of these exercises. To model this situation, the cognitive difficulty of a question for student $s$ is defined as follows:
where $\Psi_{q,t}$ denotes the student's cognitive difficulty of the question set at timestamp $t$. The cognitive difficulty is divided into 5 levels: very hard, hard, medium difficulty, relatively easy, and easy. A number ranging from 0 to 5 indicates the corresponding level. $|Q|$ denotes the size of the set of questions before timestamp $t$, and $R_s$ refers to the student's responses to the same questions; a zero in $R_s$ indicates a wrong answer. If a learner attempts a question fewer than five times, the cognitive difficulty of this question is directly set to 5. Then, the item difficulty $\phi(q)$ is defined as the average cognitive difficulty of the same question across learners, computed after the learners' cognitive difficulties are processed by the GCN.
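The per-student difficulty formula is not reproduced here, so the sketch below assumes a simple stand-in: the student's error rate on a question, scaled to the 0 to 5 range and rounded half up. It keeps the two rules stated in the text. First, fewer than five attempts map directly to level 5. Second, the item difficulty $\phi(q)$ is the average over learners; the GCN processing step is omitted. The function names are hypothetical.

```python
from statistics import mean

MIN_ATTEMPTS = 5  # from the text: fewer than five attempts -> level 5
MAX_LEVEL = 5     # levels are reported on a 0-5 scale


def cognitive_difficulty(responses):
    """Per-student difficulty level of one question.

    responses: the student's 0/1 answers to the same question (0 = wrong).
    ASSUMPTION: level = error rate scaled to 0-5, rounded half up.
    """
    if len(responses) < MIN_ATTEMPTS:
        return MAX_LEVEL
    error_rate = responses.count(0) / len(responses)
    return int(error_rate * MAX_LEVEL + 0.5)


def item_difficulty(responses_by_student):
    """phi(q): average cognitive difficulty of one question over learners."""
    return mean(cognitive_difficulty(r) for r in responses_by_student.values())
```

For example, `item_difficulty({"A": [1, 1, 0, 1, 1], "B": [0, 0], "C": [0, 0, 0, 1, 1]})` averages the levels 1, 5 (too few attempts), and 3.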
Then, based on the item difficulty $\phi(q)$, the similarity of question difficulty is modeled as follows:
To incorporate previous interactions, students' performance on a question pair $q_i$ and $q_j$ is summarized in a contingency table (Table 1), where correct and incorrect responses are interpreted as indicators of mastery of the questions. When a question pair appears more than once in the previous student interactions, only the latest occurrence is taken into consideration. Based on the contingency table, seven evaluation metrics that measure the association between two variables are used to quantify the relationship between the question pair $e_i$ and $e_j$ (Table 2). A threshold is imposed to control the sparsity of the exercise relations. The exercise relation matrix based on the contingency table is denoted as $W_R$, R ∈ {SK, Kappa, Kappa , Phi, Yule, Ochiai, Sokal, Jaccard}.
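The sketch below builds the 2x2 contingency table of Table 1 from each student's latest responses to exercises $i$ and $j$, which simplifies the pairing rule. It then computes standard forms of several association measures named above: phi, Yule's Q, Jaccard, Ochiai, Sokal-Michener (simple matching), and Cohen's kappa. The exact variants in Table 2 may differ, and the function names are hypothetical.

```python
import math


def contingency_table(latest_i, latest_j):
    """2x2 counts for exercises i and j over students who answered both.

    latest_i, latest_j: dict student -> latest 0/1 response (1 = "T").
    Returns (n11, n10, n01, n00): TT, TF, FT, FF for (exercise i, exercise j).
    """
    n11 = n10 = n01 = n00 = 0
    for student in latest_i.keys() & latest_j.keys():
        a, b = latest_i[student], latest_j[student]
        if a and b:
            n11 += 1
        elif a:
            n10 += 1
        elif b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def _ratio(num, den):
    return num / den if den else 0.0   # undefined measures fall back to 0


def association(n11, n10, n01, n00):
    """Standard 2x2 association measures between exercises i and j."""
    n = n11 + n10 + n01 + n00
    cross = n11 * n00 - n10 * n01
    row1, row0 = n11 + n10, n01 + n00      # exercise i correct / wrong
    col1, col0 = n11 + n01, n10 + n00      # exercise j correct / wrong
    p_obs = _ratio(n11 + n00, n)           # observed agreement
    p_exp = _ratio(row1 * col1 + row0 * col0, n * n)  # chance agreement
    return {
        "phi": _ratio(cross, math.sqrt(row1 * row0 * col1 * col0)),
        "yule_q": _ratio(cross, n11 * n00 + n10 * n01),
        "jaccard": _ratio(n11, n11 + n10 + n01),
        "ochiai": _ratio(n11, math.sqrt(row1 * col1)),
        "sokal_michener": p_obs,
        "kappa": _ratio(p_obs - p_exp, 1 - p_exp),
    }
```

For instance, `association(3, 1, 1, 3)` gives phi = 0.5, Yule's Q = 0.8, and Jaccard = 0.6. A threshold can then zero out weak associations to keep $W_R$ sparse.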
Table 2. Seven evaluation metrics.These metrics are designed to explore the association between two variables.
Finally, the relation between exercise $i$ and exercise $j$ is calculated as follows:
where $\Theta$ is a threshold that controls the sparsity of the exercise relation matrix. Then, given the past exercises $(e_1, e_2, \ldots, e_{n-1})$ and the next exercise $e_n$, the exercise relation matrix is defined as $R_E = [A_{e_n,e_1}, A_{e_n,e_2}, \ldots, A_{e_n,e_{n-1}}]$. Finally, $R_E$ is applied as the input of the Position-Relation-Forgetting attention mechanism.
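The formula for $A_{i,j}$ is not reproduced here. Purely as an illustration, the sketch below assumes a weighted sum of the three pairwise signals: embedding similarity, difficulty similarity, and contingency-table association. The weights µ1, µ2, µ3 = 0.1, 0.2, 0.7 and Θ = 0.65 are taken from the implementation details, but which weight multiplies which term is our assumption. The sketch then reads off $R_E$ for the next exercise.

```python
import numpy as np


def exercise_relation(emb_sim, diff_sim, assoc, mu=(0.1, 0.2, 0.7), theta=0.65):
    """ASSUMED form: A = mu1*emb_sim + mu2*diff_sim + mu3*assoc,
    with entries not exceeding theta set to 0 to sparsify the matrix."""
    a = mu[0] * emb_sim + mu[1] * diff_sim + mu[2] * assoc
    return np.where(a > theta, a, 0.0)


def relation_vector(A, past, nxt):
    """R_E = [A[e_n, e_1], ..., A[e_n, e_{n-1}]] for the next exercise e_n."""
    return A[nxt, past]


ones = np.ones((3, 3))
assoc = np.array([[1.0, 0.9, 0.2], [0.9, 1.0, 0.6], [0.2, 0.6, 1.0]])
A = exercise_relation(ones, ones, assoc)
R_E = relation_vector(A, past=[0, 1], nxt=2)
```

Here the weak pair (0, 2) falls below Θ and is zeroed, while the pair (1, 2) is kept.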
Position-Relation-Forgetting attention mechanism. The Position-Relation-Forgetting attention mechanism includes a relative position attention layer, a relation attention layer, and a forgetting layer. The relative position attention takes the relative distance between input elements into account. The input sequence $x = (x_1, x_2, \ldots, x_{n-1})$ is used to track the student state, and the edge between elements $x_i$ and $x_j$ is represented by the vectors $a^V_{i,j}$ and $a^K_{i,j}$, which extract relative position representations. The relative distance is clipped to a maximum absolute value of $k$, $\mathrm{clip}(x, k) = \max(-k, \min(k, x))$. This gives $a^K_{i,j} = w^K_{\mathrm{clip}(j-i,k)}$ and $a^V_{i,j} = w^V_{\mathrm{clip}(j-i,k)}$, where $w^K = (w^K_{-k}, \ldots, w^K_k)$ and $w^V = (w^V_{-k}, \ldots, w^V_k)$ are the corresponding relative position representations. The outputs of the relative position attention mechanism form a new sequence $Z$:

$$e_{i,j} = \frac{x_i W^Q \left(x_j W^K + a^K_{i,j}\right)^{\top}}{\sqrt{d_z}}, \qquad \alpha_{i,j} = \frac{\exp(e_{i,j})}{\sum_{m} \exp(e_{i,m})}, \qquad z_i = \sum_{j} \alpha_{i,j} \left(x_j W^V + a^V_{i,j}\right)$$

where $W^Q$, $W^K$, and $W^V$ are the query, key, and value matrices, respectively, and $d_z$ is the dimension of $Z$. The relation attention layer then combines the output $Z$ of the relative position attention with the exercise relation matrix to predict student performance on the next interaction.
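Below is a single-head NumPy sketch of the relative position attention equations above, following Shaw et al. The relative position tables have shape $(2k+1, d_z)$ and are indexed by clip(j − i, k) + k. No causal mask is applied here; in a knowledge tracing setting, each position would typically be restricted to past interactions. The names and shapes are assumptions.

```python
import numpy as np


def clip(x, k):
    """clip(x, k) = max(-k, min(k, x))."""
    return max(-k, min(k, x))


def relative_attention(x, Wq, Wk, Wv, wK, wV, k):
    """x: (n, d_in); Wq, Wk, Wv: (d_in, d_z);
    wK, wV: (2k + 1, d_z) relative position tables for offsets -k..k."""
    n, d_z = x.shape[0], Wq.shape[1]
    q, key, val = x @ Wq, x @ Wk, x @ Wv
    # idx[i, j] = clip(j - i, k) shifted into the table range 0..2k
    idx = np.array([[clip(j - i, k) + k for j in range(n)] for i in range(n)])
    aK, aV = wK[idx], wV[idx]                       # (n, n, d_z)
    # e_ij = x_i W^Q (x_j W^K + a^K_ij)^T / sqrt(d_z)
    e = np.einsum("id,ijd->ij", q, key[None, :, :] + aK) / np.sqrt(d_z)
    e = e - e.max(axis=1, keepdims=True)            # numerical stability
    alpha = np.exp(e) / np.exp(e).sum(axis=1, keepdims=True)
    # z_i = sum_j alpha_ij (x_j W^V + a^V_ij)
    z = np.einsum("ij,ijd->id", alpha, val[None, :, :] + aV)
    return z, alpha


rng = np.random.default_rng(0)
n, d_in, d_z, k = 4, 3, 2, 1
x = rng.normal(size=(n, d_in))
Wq, Wk, Wv = (rng.normal(size=(d_in, d_z)) for _ in range(3))
wK, wV = rng.normal(size=(2 * k + 1, d_z)), rng.normal(size=(2 * k + 1, d_z))
Z, alpha = relative_attention(x, Wq, Wk, Wv, wK, wV, k)
```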
In the relation attention layer, $W^Q$, $W^K$, and $W^V$ represent the query, key, and value matrices of the attention mechanism. The output of the relation attention layer is then used as the input of the forgetting layer, which is based on learning theory in the educational field. The relative time interval between each past interaction and the next interaction is computed as $\Delta_i = t_n - t_i$. The final output of the Position-Relation-Forgetting attention mechanism incorporating forgetting behavior, $R_F$, is computed as follows:
where $\xi_1$ and $\xi_2$ are hyper-parameters.
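The forgetting-layer formula with ξ1 and ξ2 is not reproduced here. As an illustration of the idea only, the sketch below computes the intervals $\Delta_i = t_n - t_i$. It then applies a simple exponential time decay to the relation scores before normalizing them, so that older interactions receive less weight. The decay is an Ebbinghaus-style stand-in with a hypothetical `strength` parameter, not the paper's equation.

```python
import numpy as np


def time_gaps(timestamps):
    """Delta_i = t_n - t_i for each past interaction; the last entry is t_n."""
    t = np.asarray(timestamps, dtype=float)
    return t[-1] - t[:-1]


def forgetting_weights(relation_scores, gaps, strength=5.0):
    """ASSUMED stand-in: softmax(relation_scores - gaps / strength),
    i.e. exp(scores) multiplied by an exponential decay exp(-Delta / strength)."""
    logits = np.asarray(relation_scores, dtype=float) - gaps / strength
    logits -= logits.max()                     # numerical stability
    w = np.exp(logits)
    return w / w.sum()


gaps = time_gaps([0, 5, 9, 10])                # -> [10., 5., 1.]
weights = forgetting_weights([0.5, 0.5, 0.5], gaps)
```

With equal relation scores, the most recent interaction receives the largest weight.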
Student performance prediction layer. The student performance prediction layer contains a point-wise feed-forward network (FFN) and a probability prediction layer. The FFN is computed as in Eq. (17), where $W_l$ and $W_s$ are weight matrices and $b_l$ and $b_s$ are bias vectors.
The probability prediction layer applies the sigmoid function $\sigma(\cdot)$ to the output of the FFN to predict the student's performance. $P$ denotes the probability that the student answers correctly in the next interaction, and $W$ and $b$ are trainable parameters.
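Below is a minimal NumPy sketch of the prediction head and of the cross-entropy loss mentioned in the evaluation setup. Eq. (17) is not reproduced. The ReLU between the two FFN layers is an assumption borrowed from the standard Transformer FFN, and the weight shapes are illustrative.

```python
import numpy as np


def ffn(x, W_l, b_l, W_s, b_s):
    """Point-wise feed-forward network (ReLU between layers is assumed)."""
    return np.maximum(0.0, x @ W_l + b_l) @ W_s + b_s


def predict(f, W, b):
    """P = sigmoid(F W + b): probability of a correct next answer."""
    return 1.0 / (1.0 + np.exp(-(f @ W + b)))


def binary_cross_entropy(p, y, eps=1e-12):
    """Mean cross-entropy between predictions p and 0/1 labels y."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


rng = np.random.default_rng(0)
d, h = 4, 8
x = rng.normal(size=(2, d))                    # two attention outputs
f = ffn(x, rng.normal(size=(d, h)), np.zeros(h), rng.normal(size=(h, d)), np.zeros(d))
p = predict(f, 0.1 * rng.normal(size=d), 0.0)
loss = binary_cross_entropy(p, np.array([1, 0]))
```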

Implementation and experimental results
Implementation details. Framework setting. The attention model dimension, the maximum sequence length, and the training batch size are all set to 200. The dropout rate of the NGFKT model is 1e-2. The hyper-parameters λ and Θ are set to 1 and 0.65, respectively. The parameters of the exercise relation modeling, µ1, µ2, and µ3, are set to 0.1, 0.2, and 0.7, respectively. Other parameters involved in training that are not specified here are normally initialized to 0.
Evaluation methodology. Metrics. The prediction task is evaluated as binary classification, i.e., whether or not an exercise is answered correctly. Therefore, the Area Under the Curve (AUC) and Accuracy (ACC) are used to measure the prediction performance. An AUC of 0.5 (and, for balanced classes, an ACC of 0.5) indicates random prediction; higher AUC or ACC values indicate better knowledge tracing performance. The cross-entropy is used as the loss function of the NGFKT model. The Performance Stability metric (PS) is used to compare the performance of the baseline models with the NGFKT model in the testing phase. The performance of a model M is stable when M consistently outperforms the other models in most testing batches, so PS is designed based on the performance rank. For instance, suppose the NGFKT model outperforms the DKT model and the DKT+ model in 96 testing batches, but performs worse than the DKT+ model in 4 testing batches. The performance rank of the NGFKT model is 1 in 94 testing batches and 2 in 4 testing batches. Then, the PS of the NGFKT model is 97.32% according to the following formulation:
where $N_{Batch}$ and $N_{model}$ are the numbers of testing batches and models in this paper.

Knowledge state prediction visualization. Knowledge state prediction visualization is an important application of knowledge tracing models in online educational systems. We show that the proposed NGFKT model captures the student knowledge state more accurately than two standard knowledge tracing models, the DKT model and the DKT+ model. Specifically, Figure 2 shows the knowledge state of the same student traced by the NGFKT model. The general evolution of the knowledge state is consistent with the student's learning process. When the student first attempts the exercise, the knowledge state is at its minimum level. The student continues to learn skills "32", "49", and "71" and steadily deepens his proficiency in these knowledge points. Finally, the student's knowledge state reaches its maximum, as shown by the increased area of the radar diagram. During the student's latest attempt, knowledge proficiency decreases somewhat because of forgetting behavior. Nevertheless, continuous practice still leaves knowledge proficiency higher than at the student's first interaction. As Figure 2 shows, the NGFKT model also outperforms the DKT model and the DKT+ model because it further incorporates the relation modeling generated by the GCN model. The DKT+ model achieves better results than the DKT model, which indicates that adding two regularization terms further improves the tracing of the student's knowledge proficiency.
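The AUC and ACC metrics described in the evaluation methodology can be computed as in the small pure-Python sketch below. AUC uses its pairwise-ranking interpretation, with ties counted as one half. ACC uses an assumed decision threshold of 0.5. Libraries such as scikit-learn provide equivalent and faster implementations.

```python
def accuracy(probs, labels, threshold=0.5):
    """Fraction of predictions (p >= threshold -> 1) that match the 0/1 labels."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(probs, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


probs, labels = [0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]
```

Constant scores yield an AUC of exactly 0.5, which is the random baseline mentioned above.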

Conclusions and future work
This paper proposes a novel knowledge tracing model, NGFKT, which tracks students' knowledge states by incorporating relation modeling, the skill relation matrix, the Q-matrix, and relative distance representations. The NGFKT model can track the student knowledge state accurately even with a small number of interactions. Specifically, this paper applies the KRIRC method to calibrate the skill relation matrix and Q-matrix, and these two matrices serve as the input of the GCN to generate the exercise embedding and skill embedding. The skill-exercise embedding, the item difficulty, and the contingency table are combined to generate the final exercise relation matrix. Finally, the Position-Relation-Forgetting attention layer outputs the predicted results. Experiments on two public datasets indicate that the NGFKT model tracks the student knowledge state effectively. The combination of explainability and predictive power in the NGFKT model can contribute to better design of online educational systems. In the future, we plan to incorporate exercise texts into the design of the knowledge tracing model and to consider more student behaviors, such as the guessing factor, in relation modeling.

Figure 2. The radar diagram. The NGFKT model outperforms the DKT model and the DKT+ model in tracking the student knowledge state. The skills "32", "49", and "71" are shown in three different colors. The average prediction accuracy of the NGFKT model is around 70.5%.

Algorithm 1: Knowledge Relation Importance Rank Calibration (KRIRC) method
Input: students' historical response dataset $D = \{s_1, s_2, \ldots, s_N\}$, $s_i = (e_i, s_i, t_i)$; the knowledge level graph $G$; the heterogeneous relation graph $\tau$; task learning rate $\alpha$; hyper-parameter $\lambda$
Output: the calibrated Q-matrix $\hat{Q}$; the calibrated skill relation matrix $\hat{S}$
1 Initialize the learning rate $\alpha$ and hyper-parameter $\lambda$ randomly
2 while there are elements in $G$ and $\tau$ do
3     Extract the hierarchical knowledge levels and related skills of each element based on $G$ and $\tau$
      …
      Evaluate the calibrated skill element $q_{ij}$ using Eq. (2)
      …
9         Replace the element in $Q$ and $\hat{S}$ with the calibrated element
10    end
11    Update the parameters: learning rate $\alpha$ and hyper-parameter $\lambda$
12 end

Table 1. The contingency table for exercise i and exercise j. The labels "F" and "T" indicate that the student answered the exercise incorrectly or correctly, respectively.

Table 4. Comparison of the baseline models with the Neural Graph Forgetting Knowledge Tracing model (NGFKT). The NGFKT outperforms all baseline models in terms of AUC, ACC, and PS.

Scenario 2 contains new students with short exercise sequences. Its training data are split into six groups, each covering a distinct range of exercise-sequence lengths: (50, 75], (75, 100], (100, 125], (125, 150], (150, 175], and (175, 200]. Each exercise sequence in the training data is generated by sampling a length from the original exercise sequence. Students in the first group have the fewest answering records, so this group is the most challenging setting. Figure 2(b) compares the effectiveness of the various techniques: on the Eedi dataset, the NGFKT model performs better than DKT and DKT+ in this scenario.
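A small helper can express the six sequence-length groups of scenario 2. The sketch below assigns a length to its half-open group. As an assumption about the sampling step, it also truncates an original sequence to a random length within a chosen group by keeping its prefix. The helper names are hypothetical.

```python
import bisect
import random

# Group boundaries from the text: (50, 75], (75, 100], ..., (175, 200]
EDGES = [50, 75, 100, 125, 150, 175, 200]


def length_group(n):
    """Index 0..5 of the group (lo, hi] that contains length n, else None."""
    if n <= EDGES[0] or n > EDGES[-1]:
        return None
    return bisect.bisect_left(EDGES, n) - 1


def sample_prefix(seq, group, rng):
    """ASSUMPTION: keep a prefix whose length is drawn uniformly from the group."""
    lo, hi = EDGES[group], EDGES[group + 1]
    if len(seq) <= lo:
        raise ValueError("sequence too short for this group")
    length = rng.randint(lo + 1, min(hi, len(seq)))   # randint is inclusive
    return seq[:length]


short = sample_prefix(list(range(300)), 0, random.Random(0))
```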