Abstract
Recommender systems play a vital role in enhancing the user experience and facilitating content discovery on online platforms. However, conventional approaches often struggle to capture users’ evolving preferences over time, leading to suboptimal performance as recommended videos frequently do not align with users’ interests. To address this issue, this study introduces an innovative method that leverages watch-time duration to analyze long-term user behavior and generate personalized recommendations. The proposed Duration Count Matrix (DCM) technique includes two key components: User Profiling (DCM-UP) and User Similarity (DCM-US). DCM-UP constructs dynamic user profiles based on engagement with content, while DCM-US quantifies user similarity through collaborative filtering, enabling the system to predict user-to-user behavior and personalize recommendations. DCM-UP utilizes matrix-based representations of users and items, dynamically updates profiles, and adapts to changing preferences over time, thus providing a more accurate reflection of user interests. Additionally, DCM-US facilitates the identification of user similarities by analyzing user-item generalizations. Moreover, the effectiveness of the proposed techniques was evaluated on a real-world dataset obtained from JAWWY, the Saudi Telecom Company’s streaming platform. The study’s results clearly demonstrated that the DCM approach significantly outperformed existing state-of-the-art methods across various performance metrics, including precision, recall, F1-score, and accuracy. This highlights the superiority of the DCM technique in capturing and predicting long-term user behavior for more accurate and personalized recommendations.
Citation: Alqazzaz A, Anwar Z, Hassan Mu, Qureshi S, Alsulami M, Zia A, et al. (2025) Genre-aware user profiling using duration count matrices: A novel approach to enhancing content recommendation systems. PLoS ONE 20(4): e0312520. https://doi.org/10.1371/journal.pone.0312520
Editor: Ali B. Mahmoud, St John’s University, UNITED STATES OF AMERICA
Received: September 8, 2023; Accepted: October 8, 2024; Published: April 1, 2025
Copyright: © 2025 Alqazzaz et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript.
Funding: The authors are thankful to the Deanship of Graduate Studies and Scientific Research at the University of Bisha for supporting this work through the Fast-Track Research Support Program (to AA). The funders had no role in study design, data collection, and analysis, the decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Recommender systems have become an indispensable part of numerous online platforms, aiding users in discovering relevant content and enhancing their overall user experience [1–5]. With the ever-increasing volume of available content [6, 7], it has become crucial to develop efficient systems that can accurately predict user preferences and make personalized recommendations [8–12].
However, traditional methods frequently encounter difficulty in keeping up with users’ evolving preferences over time. They often fall short of delivering satisfactory results, as evidenced by two key shortcomings: firstly, the recommended videos often do not align with users’ interests; secondly, the recommendations are often complex and fail to resonate with users, and consequently lack the power to encourage user engagement.
To overcome these limitations, this study introduces a novel technique that leverages watch-time duration to capture and analyze users’ long-term behavior. By considering the duration of user engagement with content [4, 13], we gain valuable insights into their preferences, interests, and viewing habits. This information forms the foundation for building recommender systems that can adapt to users’ changing tastes and provide tailored recommendations [14, 15].
The use of watch-time duration [16] as a key parameter in capturing long-term user behavior offers several advantages. Firstly, it provides a more accurate representation of user preferences, as it reflects the actual time spent interacting with content. This approach takes into account the intensity and duration of user engagement, allowing us to differentiate between casual and significant interests.
Furthermore, the incorporation of long-term user behavior enhances the effectiveness of recommender systems [8, 17–19]. By considering users’ historical patterns and analyzing their evolving preferences, we can generate more personalized recommendations that align with their long-term interests [20, 21]. This not only improves the overall user experience but also enhances user satisfaction and engagement.
In this paper, we present an innovative technique called the Duration Count Matrix (DCM) to enrich the prediction of users’ long-term behavior and watch-time duration by tracking their historical behavior patterns. The DCM incorporates two essential components: User Profiling (DCM-UP) and User Similarity (DCM-US). Through DCM-UP, user behavior is captured and profiles are created, while DCM-US measures the similarity between users.
Furthermore, this study proposes an advanced recommender system, namely Duration Count Matrix-based User Profiling (DCM-UP), which leverages users’ long-term behavior to provide personalized recommendations. Initially, the system learns the distributed representation of users and items using a matrix-based approach. Subsequently, it dynamically updates each user’s behavior in a sequential manner, utilizing the Dynamic Hierarchical Updating State, where all matrix levels are interconnected. This enables the system to adapt to users’ changing preferences over time.
In addition, a recommending state is generated to offer users item recommendations based on their individual preferences. By analyzing user behavior and preferences, the system intelligently identifies items that align with their interests, enhancing the overall user experience.
Moreover, this study introduces Duration Count Matrix-based User Similarity (DCM-US), which utilizes collaborative filtering to predict user-to-user behavior. This approach involves generating a matrix based on user-item generalization and sequentially analyzing each user’s data using an aggregate (sum) function. Through this process, the system determines the degree of similarity between users, enabling the prediction of how closely their preferences align.
To implement the Duration Count Matrix (DCM) techniques, the proposed system employs machine learning algorithms to identify patterns in user behavior. These identified patterns are then used to predict users’ future long-term behavior. Furthermore, numerous experiments conducted on real-world datasets demonstrate that incorporating watch-time duration leads to a more accurate understanding of user behavior, thereby improving the overall effectiveness of video recommendation systems.
This study makes four key contributions:
- The paper presents innovative DCM techniques for next-item recommendation, focusing on long-term behavior prediction. These techniques capture users’ historical behavior and current preferences, enabling a more comprehensive understanding of user behavior and more accurate recommendations.
- The study proposes Duration Count Matrix-based User Profiling (DCM-UP), an advanced recommender system that leverages users’ long-term behavior. DCM-UP employs a matrix-based approach to learn the distributed representation of users and items. It dynamically updates each user’s behavior using the Dynamic Hierarchical Updating State, adapting to changing preferences over time and providing personalized recommendations.
- The study also introduces DCM-US, which uses collaborative filtering to predict user-to-user behavior. By generating a matrix based on user-item generalization and analyzing each user’s data with an aggregate function, DCM-US determines the degree of similarity between users. This enhances the recommender system’s accuracy in predicting user preferences.
- Experimental results demonstrate that the DCM techniques achieve a significantly higher understanding of user long-term behavior compared to state-of-the-art techniques. By comprehending users’ behavior and preferences more deeply, the system selects more personalized and engaging content based on individual interests and preferences, enhancing user satisfaction.
Related work
Next, we will thoroughly review the current literature that relates to our research focus, specifically, predicting watch-time and modeling behavior in Recommender Systems (RecSys).
Video recommendation
The explosive growth of video platforms can be attributed to their ability to recommend captivating content to users [22]. Within this context, accurately predicting watch time, a crucial metric that reflects user engagement, holds significant importance for recommender systems in various industries [23, 24]. While there has been considerable research focused on metrics like Click-Through-Rate (CTR) [25–29], the area of watch-time prediction has received comparatively less attention. Recent advancements in video recommendations have demonstrated remarkable performance by mining users’ preferences and leveraging historical preference data to forecast future videos, thereby assessing the effectiveness of recommender systems.
To address the challenge of predicting users’ video preferences, several innovative approaches have emerged [30, 31]. Dynamic Micro Video Recommendations (DMA) [32] propose an explicit modeling technique that captures the dynamic trends in users’ current preferences, incorporating both historical data and potential future trends. The task of recommending the next video to watch has long been a focal point in recommender system research. A Markov chain-based transition probability matrix [33] has been introduced as an efficient method to uncover individual behavior preferences. Additionally, the groundbreaking study of Multiscale Time Aware User Interest (MTIN) [34] introduces interest groups based on users’ interaction sequences, providing novel insights for video recommendation. In terms of video lifespan and streaming patterns, CONDE (Concept-Aware Graph Neural Network) [35] presents a concept-driven approach to representing user preferences in video recommendations. Going beyond click- and rate-based approaches, Social4Rec [36] enhances the representation of user interests by incorporating social factors such as friendships, following bloggers, and interest groups. Finally, SEMI (Sequential Multi-modal Information Transfer Network) [37] utilizes user behavior in e-commerce environments to enhance video recommendations, particularly in the context of purchasing interactions.
This section emphasizes the critical role of accurate watch-time prediction in video recommendations. The advancements in dynamic modeling, interest group analysis, concept-aware representations, and the integration of social aspects contribute to the ongoing efforts to improve the effectiveness and user experience of recommender systems in the domain of video recommendations.
Behavior modeling in RecSys
Behavior modeling in the realm of Recommender Systems (RS) encompasses the task of unveiling sequential patterns embedded within users’ historical sequences, enabling the prediction of subsequent items based on these patterns [38]. With a keen eye on optimizing long-term user satisfaction, Markov Decision Process (MDP) approaches [39] have emerged to encapsulate user satisfaction rewards, incorporating nuanced heuristics that consider both user stickiness and activeness. Retention, an invaluable metric reflecting prolonged user-system interactions, has come into sharp focus within the RS landscape. Remarkably, request-based MDP [40] has revolutionized reinforcement learning techniques, empowering the pursuit of maximal retention and long-term performance optimization.
Notably, RS research has experienced a paradigm shift from short-term engagement optimization to a resolute dedication to enhancing the long-term user experience [41, 42]. This transformation has witnessed the advent of surrogate selection techniques [43], forging connections between long-term outcomes and more immediate behavioral signals, thus enabling the fine-tuning of immediate-term behaviors. Another noteworthy study, PreRec [44], has laid the foundation for recommender systems to leverage insights gained from users’ historical behaviors, thereby optimizing long-term user engagement within the realm of recommendations.
Moreover, within the domain of online RS, a pioneering approach known as Long-Short term Temporal Meta-learning (LSTTM) has emerged [45], focused on unraveling users’ intricate internal and external behaviors and preferences, thus providing valuable insights for recommendations. Exploiting the power of long-term sequence models, methods such as DREAM [46], SASRec [47], and BST [48] have entered the fray. The Dynamic REcurrent BAsket Model (DREAM) astutely captures global sequential features that interlink items, while the Self-Attention-based Sequential model (SASRec) harnesses the prowess of self-attention. Lastly, the Behavior Sequence Transformer (BST) meticulously deciphers and harnesses users’ long-term interests.
Proposed methodology
The primary objective of this study is to analyze user behavior by utilizing user-to-item and user-to-user relationships based on watch-time duration, with the ultimate goal of predicting future behavior. In Fig 1, we present a comprehensive methodology for video recommendation that covers various stages, including data collection, data preprocessing, behavior analysis using classification and clustering techniques, model selection, model training and testing, and performance evaluation using diverse evaluation metrics.
Data collection and preprocessing
In this study, the dataset was collected from the Saudi Telecom Company (STC) [49] and comprises video-based data with N = 3,598,607 rows and d = 13 columns, including features such as user ID, watch-time duration, genre, and date. The sources for this dataset were the STC’s Open Data project for the general public, known as “IPTV,” and its streaming platform for movies and television shows, JAWWY. The JAWWY data contain over 3 million records of user behavior activity, making them a valuable resource for analyzing user behavior. Thus, the video-based data can be considered as an N × d matrix X ∈ R^(N×d), where each row of X corresponds to a user record and each column corresponds to a feature. In this study, X was used to analyze user behavior and identify patterns for recommendation systems.
From the perspective of user behavior analysis, this study used the STC dataset to identify long-term behavioral patterns for recommendation systems. To achieve this, we applied statistical analysis and machine learning algorithms to model and predict user behavior based on various features. The video-streaming dataset used in this study is a collection of data points, or observations, that represent user behavior on a streaming platform. This dataset is composed of two main parts, as shown in Table 1: movies data and series data.
Movies dataset.
The movies dataset is a subset of the video-streaming dataset that contains information about movies watched by users on the platform. It comprises 1,547,401 rows and 13 columns, denoted as Xm ∈ R^(1,547,401 × 13). Here, Xm is a matrix of real numbers representing the movies data, where each row corresponds to a unique user-movie pair and each column to a feature xi,j of that pair.
Series dataset.
The series dataset is another subset of the video-streaming dataset that contains information about TV series watched by users on the platform. It comprises 2,051,206 rows and 13 columns, denoted as Xs ∈ R^(2,051,206 × 13). Here, Xs is a matrix of real numbers representing the series data. Its 2,051,206 rows account for 57% of the total dataset, while the movies data cover the remaining 43%.
After the collection of data, the process of data preprocessing is applied, which typically involves transforming the raw data into structured and well-organized datasets that can be effectively analyzed using data mining techniques. The dataset used in this research was subjected to several preprocessing steps to make it more manageable and accurate. The pre-processing steps applied to the dataset include data cleaning, dimension reduction, and data transformation. These techniques were utilized to eliminate errors, reduce the number of variables, and transform the data into a more usable format for analysis.
Data cleaning.
Data cleaning is a fundamental process in data preprocessing [50–52], which involves the identification and management of incomplete, inaccurate, duplicated, null, or irrelevant data. Suppose we have an original dataset containing n observations and p features. The aim of data cleaning is to obtain a cleaned dataset Xc with nc observations and pc features, where nc ≤ n and pc ≤ p.
The preprocessing stage of data analysis is crucial for identifying and handling outliers, as it can yield valuable insights into the data. This research focused on the users with the largest outlying values in the dataset. The Jawwy service is similar to Netflix, and while customers watch a large number of videos for entertainment, a watch-time duration of 11,944 is an abnormally high value. To account for this, the research considered two potential explanations for the outliers:
- The presence of duplicate records for the same user in the database.
- The absence of essential features in the service that help users limit their viewing time.
To identify and handle outliers, as shown in Fig 2, the Inter-Quartile Range (IQR) was used as a preprocessing method to manage the outliers [53, 54]. We define the set of outliers as O = {x ∈ X : x > Q3 + 1.5 × IQR or x < Q1 − 1.5 × IQR}, where Q1 and Q3 are the first and third quartiles, respectively, and IQR = Q3 − Q1 is the interquartile range.
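As a concrete illustration, the IQR rule above can be applied directly to a vector of watch-time durations. The sketch below uses NumPy with hypothetical duration values (the 11,944 figure mirrors the abnormal value discussed above); it is not the authors' implementation.

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

# Hypothetical watch-time durations; 11,944 mirrors the abnormal value above
durations = np.array([22, 35, 41, 27, 33, 29, 11944])
mask = iqr_outliers(durations)
cleaned = durations[~mask]        # only the abnormal duration is removed
```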
Dimensionality reduction.
Consider a dataset X with n observations and m features, where X = {x1, x2, …, xn} and each xi is an m-dimensional vector. In order to improve the performance of machine learning models, it is often assumed that adding more features will lead to better accuracy. However, this is not always true [55] and can lead to decreased performance as the dimensionality of the dataset increases.
To address this issue, dimensionality reduction techniques aim to transform the high-dimensional dataset X into a lower-dimensional representation Y = {y1, y2, …, yn}, where each yi is a k-dimensional vector with k < m. By reducing the number of dimensions, these techniques make it easier to analyze and visualize the data.
One common method for reducing the dimensionality of a dataset is the correlation-based feature selection technique. This method involves identifying highly correlated features and retaining only one of them while discarding the others. This significantly reduces the dimensionality of the dataset without losing much information. The correlation coefficient between two features xi and xj can be computed using the Pearson correlation coefficient, which is given by:
r(xi, xj) = Σk (xi,k − mean(xi)) (xj,k − mean(xj)) / (n · std(xi) · std(xj))  (1)
where mean(xi) and std(xi) are the mean and standard deviation of the ith feature, respectively. The resulting correlation coefficient ranges between -1 and +1, with values closer to +1 indicating a strong positive correlation and values closer to -1 indicating a strong negative correlation.
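A minimal sketch of correlation-based feature selection, assuming a plain NumPy feature matrix and a hypothetical |r| > 0.9 threshold (the paper does not state the threshold it used):

```python
import numpy as np

def drop_correlated(X, threshold=0.9):
    """Greedy correlation-based feature selection: keep a feature only if its
    Pearson |r| with every already-kept feature is at or below the threshold."""
    corr = np.corrcoef(X, rowvar=False)          # pairwise Pearson coefficients
    keep = []
    for j in range(corr.shape[0]):
        if all(abs(corr[j, k]) <= threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = 2.0 * a + 0.01 * rng.normal(size=100)        # near-duplicate of feature a
c = rng.normal(size=100)
kept = drop_correlated(np.column_stack([a, b, c]))   # feature 1 (b) is discarded
```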
Data transformation.
Following the completion of data cleaning, a range of data transformation methods were implemented to improve the quality and usability of the dataset [56].
Encoded transformation. In data processing, it is often necessary to convert categorical data into numerical data in order to use it in various analytical and modeling tasks [57]. One-hot encoding is a technique used to perform this conversion [58]. Categorical data represents qualitative variables with a finite number of categories, such as genres (animated, action, drama). However, many machine learning algorithms require numerical input data to operate on. One-hot encoding is a way to represent categorical data numerically without introducing any ordinal relationship between the categories.
Suppose we have a categorical variable C with k categories, represented by the set C = {c1, c2, …, ck}. To transform this categorical variable into a numerical format, we can use a one-hot encoding. One-hot encoding creates k binary variables, one for each category, and assigns a value of 1 to the variable corresponding to the category and a value of 0 to all other variables. The resulting dataset has k columns, with each column representing one category.
The one-hot encoding of a categorical variable C can be defined as follows:
Let x be a categorical variable with k categories, represented by the set C = {c1, c2, …, ck}. The one-hot encoding of x is a matrix X of size n × k, where n is the number of observations in the dataset, such that each element xij of the matrix is defined as:
xij = 1 if observation i belongs to category cj, and xij = 0 otherwise  (2)
For example, suppose we have a dataset with a categorical variable “Genres” that can take on three values: “Action”, “Horror”, or “Drama”. The one-hot encoding of this variable would create three binary variables, one for each genre. The resulting dataset would have three columns, with each column representing one genre, as shown in Table 2.
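The encoding in Eq (2) can be sketched in a few lines of Python; the genre values mirror the Table 2 example, while the function name and the sorted category order are illustrative choices:

```python
def one_hot(values, categories=None):
    """Encode categorical values as rows of 0/1 indicators, following Eq. (2)."""
    if categories is None:
        categories = sorted(set(values))
    index = {c: j for j, c in enumerate(categories)}
    return [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values], categories

rows, cats = one_hot(["Action", "Horror", "Drama", "Action"])
# cats: ['Action', 'Drama', 'Horror']; each row contains exactly one 1
```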
Scaling transformation. In the preprocessing of a dataset, it is essential to perform scaling transformations, which involve applying scaling techniques to the features. The goal of scaling is to ensure that all the features are measured on the same scale, to avoid one feature dominating over another [59]. The min-max scaler is a data normalization technique used to transform features by scaling them to a specified range of values. Let us consider a feature vector X = {x1, x2, ⋯, xn} with n elements. The min-max scaler maps each element xi to a new value in the range [a, b], typically [0, 1].
The transformation is performed as follows: each element xi is mapped to x′i = a + (xi − min(X)) (b − a) / (max(X) − min(X)).
The purpose of min-max scaling is to normalize the feature vector so that each element has equal weight and is on the same scale. It is commonly used with machine learning algorithms that are sensitive to feature magnitudes, such as k-means clustering and k-nearest neighbours. An example of min-max scaling is shown in Table 3.
Consider a dataset consisting of two features, denoted by X1 and X2, and assume that the dataset has been provided with values. The objective is to apply min-max scaling to the dataset to standardize the values of both features to the same scale. The first step involves computing the minimum and maximum values of each feature. Subsequently, the min-max scaling formula is applied to each feature for each sample.
x′i = (xi − min(X)) / (max(X) − min(X))  (3)
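A small, self-contained sketch of the min-max transformation (the constant-feature guard is an added assumption, not part of the paper's formulation):

```python
def min_max_scale(x, a=0.0, b=1.0):
    """Map each element of x into [a, b] via min-max scaling."""
    lo, hi = min(x), max(x)
    if hi == lo:                      # constant feature: map everything to a
        return [a for _ in x]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in x]

scaled = min_max_scale([10, 20, 30, 40])   # smallest value -> 0.0, largest -> 1.0
```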
Analysis of user behavior patterns for data classification & clustering
In recommendation systems, user behavior analysis plays a vital role as it allows for a deeper understanding of user preferences and interests. This information can be represented as a set of vectors denoted by X = {x1, x2, ⋯, xn}, where each vector xi represents the behavior pattern of user i. Classification and clustering are two common techniques used to analyze user behavior patterns in recommendation systems.
User-item behavior analysis.
User-item behavior analysis is a type of analysis used in recommendation systems that involves analyzing the behavior patterns of both users and items [20]. This analysis is used to identify patterns in the way that users interact with different items (such as products, movies, and series), which can be used to provide personalized recommendations to users.
In user-item behavior analysis, each user and item is represented as a vector of features, such as genre. These vectors can be used to model the interactions between users and items and to identify patterns in the way that users interact with different items. Let U be the set of users, I be the set of items, and W be the set of watch-time durations. The user-item interaction matrix A ∈ W^(U×I) is defined such that each entry Ai,j represents the watch-time duration of item j by user i. This matrix can then be analyzed using various techniques, such as content-based filtering, to generate personalized recommendations for each user based on their past behavior patterns.
A = [Ai,j], where Ai,j is the watch-time duration of user i on item j, i ∈ U, j ∈ I  (4)
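For illustration, the watch-time matrix A can be accumulated from a raw interaction log; the user and item identifiers below are hypothetical, and summing durations over repeated user-item pairs is an assumption about how duplicates are aggregated:

```python
# Hypothetical interaction log: (user_id, item_id, watch-time duration)
log = [
    (1, "m1", 120), (1, "m2", 45),
    (2, "m1", 300), (2, "m3", 60),
    (3, "m2", 90),
]

users = sorted({u for u, _, _ in log})
items = sorted({i for _, i, _ in log})

# A[i][j] = total watch time of item j by user i; unseen pairs stay 0,
# and durations for repeated user-item pairs are summed (an assumption)
A = {u: {i: 0 for i in items} for u in users}
for u, i, w in log:
    A[u][i] += w
```

In practice the same matrix can be produced with pandas via `pivot_table(index="user_id", columns="item_id", values="duration", aggfunc="sum", fill_value=0)`.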
For user behavior analysis for data classification, let D be the dataset of user interactions, where each interaction is represented as a vector of features, and let B ∈ R^(N×F) be the matrix representing the dataset, where each row corresponds to a user’s behavior pattern and each column to a feature. User behavior analysis for data classification is a type of analysis used in machine learning to classify data based on user behavior patterns.
B = [bi,f] ∈ R^(N×F), where bi,f is the value of feature f in the behavior pattern of user i  (5)
In this context, user behavior patterns refer to the patterns in the way that users interact with data, such as watch-time duration. By analyzing these patterns, we can identify features that are most relevant for classification and use them to build predictive models.
ŷi = f(bi), where f : R^F → {0, 1} is the learned classifier  (6)
User-user behavior analysis.
User-user behavior analysis [60] is a type of analysis used in recommendation systems that involves analyzing the behavior patterns of users with similar interests or preferences. This analysis is used to identify patterns in the way that similar users interact with different items (such as products, movies, and series), which can be used to provide personalized recommendations to users.
Let there be N users and M items, and let A be the user-item interaction matrix with dimensions N × M. Each row of matrix A represents the behavior patterns of a user with respect to different items. To identify patterns in the behavior patterns of similar users, we can use similarity metrics. Let sim(u, v) be the similarity between two users u and v based on their behavior patterns.
sim(u, v) = (Au · Av) / (‖Au‖ ‖Av‖)  (7)
The behavior patterns of the users in U(u), the set of users most similar to u, can then be used to provide personalized recommendations to the target user u.
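As one common instantiation of sim(u, v), the sketch below uses cosine similarity over the rows of a watch-time matrix and selects the k nearest neighbours as the set U(u); the paper does not prescribe cosine specifically, and the toy matrix is hypothetical:

```python
import numpy as np

def user_similarity(A):
    """Pairwise cosine similarity between the rows (users) of a watch-time matrix."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    unit = A / np.where(norms == 0, 1, norms)    # guard against all-zero users
    return unit @ unit.T

def top_neighbors(sim, u, k):
    """Indices of the k users most similar to user u (the set U(u)), excluding u."""
    order = np.argsort(-sim[u])
    return [int(v) for v in order if v != u][:k]

# Hypothetical watch-time matrix: 3 users x 3 items
A = np.array([[120.0, 45.0, 0.0],
              [300.0, 0.0, 60.0],
              [0.0, 90.0, 0.0]])
neighbors = top_neighbors(user_similarity(A), u=0, k=1)   # user 1 is closest to user 0
```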
User behavior analysis for data clustering is a type of analysis used in machine learning to cluster data based on user behavior patterns. To cluster the users based on their behavior patterns, we can use various clustering techniques such as k-means clustering. Let C be the set of clusters obtained after applying a clustering algorithm to the user-item interaction matrix A.
The behavior patterns of users within the same cluster can be considered similar, and personalized recommendations can be provided to each user based on the behavior patterns of the other users in their cluster. Given the user-item interaction matrix A, the objective of clustering is to find a set of k clusters C = {C1, C2, …, Ck}, where each cluster Ci is a subset of users {u1, u2, …, un}.
C = {C1, C2, …, Ck}, where Ci ∩ Cj = ∅ for i ≠ j and C1 ∪ C2 ∪ … ∪ Ck = {u1, u2, …, un}  (8)
Exploring user behavior patterns for data classification and model implementation
The prediction of user behavior patterns is a vital task in which one seeks to discern the preferences and inclinations of individuals by examining their historical behavioral data [61]. This procedure requires a thorough analysis of these behavior patterns, with the ultimate goal of offering tailored recommendations and enhancing user satisfaction.
The prediction of user behavior patterns can be formalized as follows. Let B ∈ R^(N×M) denote the user-behavior matrix, where each row corresponds to a user and each column corresponds to a behavior. Let y ∈ {0, 1}^N denote the target variable, a binary indicator of whether a user is interested in a particular item or not. Let Z ∈ R^(N×F) denote the feature matrix of the users, where each row corresponds to a user and each column corresponds to a user attribute, such as genres. The task is to learn a function f : R^F → {0, 1} that can predict the target variable y given the feature vector z of a new user.
An approach for acquiring this function involves the utilization of a decision tree, as outlined in [62]. A decision tree partitions the feature space into a set of rectangular regions and assigns a label to each region. The construction of this tree is an iterative process, wherein, at each juncture, the system selects the feature that yields the highest information gain, proceeding until it fulfills a predefined termination criterion. The resultant decision tree can then be applied to classify novel users by tracing a trajectory from the tree’s root to a leaf node. This leaf node corresponds to the label of the segment encompassing the user’s feature vector.
The decision tree algorithm can be formalized as follows. Given a set of training data {(xi, yi)}, i = 1, …, N, where xi ∈ R^F and yi ∈ {0, 1}, the algorithm learns a tree T that minimizes the following cost function:
C(T) = Σi L(yi, T(xi)) + α |T|  (9)
where |T| is the number of leaf nodes and α ≥ 0 penalizes tree complexity.
Common loss functions include the logistic loss for binary classification:
L(y, p̂) = −[ y log p̂ + (1 − y) log(1 − p̂) ]  (10)
where p̂ is the predicted probability of class 1. Once the decision tree is trained, it can be used to predict the behavior patterns of new users by following the path from the root to a leaf node, which corresponds to the predicted label. The decision tree [63] can also be used to identify the most important features that contribute to the classification of users by computing the information gain at each split.
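The split criterion can be made concrete with a small information-gain computation; the labels and the candidate split below are hypothetical, and this is a sketch of the criterion rather than a full tree learner:

```python
import math

def entropy(labels):
    """Shannon entropy of a binary label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(0), labels.count(1)) if c)

def information_gain(parent, left, right):
    """Reduction in entropy achieved by splitting `parent` into `left` + `right`."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# Hypothetical binary interest labels and one candidate split of them
y = [1, 1, 1, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0]     # e.g. split on "watched > 100 min of Action"
gain = information_gain(y, left, right)
```

The tree builder would evaluate this gain for every candidate feature and threshold, and branch on the split with the highest value.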
Exploring user behavior patterns for data clustering
In the realm of data clustering, forecasting user behavior patterns is a pivotal task, entailing the categorization of users into distinct clusters predicated on their historical behavioral data, as elucidated in [64]. This intricate procedure necessitates a meticulous examination of user behavior patterns, geared towards furnishing personalized recommendations and enhancing user contentment. A frequently employed clustering technique is the renowned k-means algorithm.
K-means is an unsupervised machine learning algorithm [65] employed to cluster similar data points together. In the context of user behavior prediction, K-means can be utilized to group users based on their behavior patterns. In the k-means algorithm, the distance criterion is typically set to the Euclidean distance between data points and cluster centroids [66]. The Euclidean distance [67] is a measure of the straight-line distance between two points in Euclidean space, which corresponds to the shortest path between them.
d(x, c) = ‖x − c‖ = √( Σj (xj − cj)² )  (11)
In the assignment step, each data point is assigned to the nearest cluster centroid based on the Euclidean distance [68]. The distance between each data point and each cluster centroid is computed, and the data point is assigned to the cluster with the closest centroid.
In the update step, the cluster centroids are recalculated based on the mean of the data points assigned to each cluster. The centroid of each cluster is set to the mean of the data points in that cluster, which minimizes the total within-cluster sum of squares (WCSS). By using the Euclidean distance [69] as the distance criterion, the k-means algorithm seeks to minimize the total distance between data points and their respective cluster centroids, resulting in compact and well-separated clusters.
In k-means, K refers to the number of clusters that are desired. This parameter is usually determined by the analyst based on the characteristics of the data and the desired number of clusters. The choice of K can have a significant impact on the clustering results, as too few clusters may not capture all the relevant behavior patterns, while too many clusters may result in overfitting and poor generalization.
Algorithm 1: K-means Clustering for User Behavior Prediction
Require: X ← the data points (matrix of size n × m)
Require: K ← the number of clusters
Ensure: C ← the final centroids (matrix of size K × m)
Ensure: idx ← a vector of size n containing the index of the cluster to which each data point belongs
1: C ← X[randomly select K rows]
2: idx ← vector of n zeros; oldidx ← None
3: while idx ≠ oldidx do
4: oldidx ← idx
5: for i ← 1 to n do
6: idx[i] ← arg mink ‖xi − ck‖²
7: end for
8: for k ← 1 to K do
9: Sk ← {xi : idx[i] = k}
10: if |Sk| > 0 then
11: Ck ← (1/|Sk|) Σxi∈Sk xi
12: else
13: Ck ← X[randomly select one row]
14: end if
15: end for
16: end while
17: return C, idx
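For concreteness, Algorithm 1 can be sketched in a few lines of NumPy (a minimal illustration under the same assignment/update steps, not the implementation used in the paper):

```python
import numpy as np

def kmeans(X, K, seed=0, max_iter=100):
    """K-means as in Algorithm 1: assign each point to the nearest
    centroid (Euclidean distance), then recompute each centroid as
    the mean of its cluster; repeat until assignments stop changing."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = X[rng.choice(n, size=K, replace=False)].astype(float)  # initial centroids
    idx = np.full(n, -1)
    for _ in range(max_iter):
        # assignment step: nearest centroid by Euclidean distance
        dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        new_idx = dists.argmin(axis=1)
        if np.array_equal(new_idx, idx):    # idx == oldidx: converged
            break
        idx = new_idx
        # update step: centroid = cluster mean (minimizes WCSS)
        for k in range(K):
            members = X[idx == k]
            if len(members) > 0:
                C[k] = members.mean(axis=0)
            else:                           # empty cluster: re-seed
                C[k] = X[rng.integers(n)]
    return C, idx
```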
Moreover, the Within-Cluster Sum of Squares (WCSS) [70] serves as a metric employed to assess the quality of K-means clustering. WCSS is computed as the sum of the squared distances between each data point and its assigned centroid. The primary objective of K-means is to minimize WCSS, as it reflects the compactness of the clusters. A lower WCSS signifies that data points within each cluster are closer to each other, indicating the effectiveness of the clustering process.
WCSS = Σk=1..K Σxi∈Sk ‖xi − ck‖² (12)
Hence, the optimal value of K can be determined using the elbow method, which involves plotting the WCSS for different values of K and selecting the value of K at the elbow point of the curve.
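The WCSS of Eq (12) is computed directly from the data, the cluster assignments, and the centroids; plotting it for K = 1, 2, … produces the elbow curve (a minimal sketch, not the paper's code):

```python
import numpy as np

def wcss(X, idx, C):
    """Within-cluster sum of squares (Eq 12): total squared Euclidean
    distance from each point to the centroid of its assigned cluster."""
    return float(sum(np.sum((X[idx == k] - C[k]) ** 2)
                     for k in range(len(C))))
```

Evaluating this for each candidate K (using the assignments produced by k-means at that K) and plotting WCSS against K reveals the elbow point described above.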
According to the results presented in Fig 3, the Elbow method was employed to ascertain the optimal number of clusters for the datasets analyzed in this study. The Elbow method entails plotting the within-cluster sum of squares (WCSS) against the number of clusters and identifying the elbow point on the resultant curve. The elbow point corresponds to the number of clusters at which the rate of decrease in WCSS begins to level off, beyond which the addition of more clusters does not yield a substantial reduction in WCSS.
Fig 3. The elbow point indicates the optimal number of clusters to use in the clustering algorithm, balancing cluster specificity and the within-cluster sum of squares (WCSS). (a) Movies data. (b) Series data. (c) Collective data.
For the movie dataset, the curve reached the elbow point at a value of K = 3, indicating that three clusters would be optimal for this dataset. Conversely, for the series dataset and the collective dataset, the elbow points were observed at values of K = 5 and K = 6, respectively. Based on these results, the analysis will proceed using the optimal number of clusters identified for each dataset.
Principal component analysis.
Principal Component Analysis (PCA) is a technique employed to diminish the dimensionality of a dataset while preserving as much of the data’s variability as feasible [71]. This method finds extensive application in data analysis, machine learning, and statistics. PCA operates by identifying a fresh set of variables known as principal components, which are linear combinations of the original variables.
PC1 = a11X1 + a12X2 + ⋯ + a1pXp (13)
These principal components are arranged in order of the variance they account for in the original dataset. The first principal component, PC1, is the linear combination of X that has the largest variance. The weights must satisfy the constraint that a11² + a12² + ⋯ + a1p² = 1. The second principal component, PC2, is the linear combination of X that has the second-largest variance, subject to the constraint that it is orthogonal to PC1. Similarly, the kth principal component, PCk, is the linear combination of X that has the kth-largest variance, subject to the constraints that it is orthogonal to PC1, PC2, ⋯, PC(k−1). This can be expressed as:
PCk = ak1X1 + ak2X2 + ⋯ + akpXp (14)
where ak1, ak2, …, akp are the loadings or weights of the variables X1, X2, …, Xp in PCk. The loadings must satisfy the constraints that ak1² + ak2² + ⋯ + akp² = 1 and Σi=1..p aki aji = 0 for j = 1, 2, …, k − 1 (orthogonality to the earlier components). By choosing only the top few principal components, we can reduce the dimensionality of the dataset while still retaining a large amount of the variability in the original data.
proportion of variance retained = (Var(PC1) + ⋯ + Var(PCk)) / (Var(PC1) + ⋯ + Var(PCp)) (15)
Let X be an n × p matrix representing a dataset with n observations and p variables. Assume that the variables have been centered to have a mean of 0. The goal of PCA is to find a new set of k ≤ p variables Z1, Z2, …, Zk that are linear combinations of the original variables, such that the Zi are orthogonal and explain as much of the variance in X as possible. Specifically, the Zi are defined as:
Zi = ai1X1 + ai2X2 + ⋯ + aipXp = Σj=1..p aijXj (16)
where Xj is the jth variable of X, and aij are the loading coefficients that determine the weights of each variable in each Zi. The loading coefficients are chosen to maximize the total variance explained by the Zi, subject to the constraint that the Zi are orthogonal and have unit length.
maximize Σi=1..k Var(Zi) subject to Σj aij² = 1 and Σj aij alj = 0 for l ≠ i (17)
Once the loading coefficients have been computed, each observation xi in X can be transformed into the lower-dimensional representation yi = (yi1, yi2, …, yik), where yij is the value of the jth component Zj for the ith observation.
yij = Σl=1..p ajl xil (18)
The resulting matrix Y, which has dimensions n × k, represents the lower-dimensional version of the original dataset. It can be used in place of X for visualization, clustering, classification, or other machine-learning tasks that are sensitive to the number of variables.
In the context of k-means clustering (Fig 4), PCA can be used to visualize the clustering results when the dataset has many dimensions. Specifically, after performing k-means clustering on the high-dimensional data, we can use PCA to reduce the dimensionality of the data to two or three dimensions, which can be easily plotted on a graph.
Fig 4. (a) Movies data. (b) Series data. (c) Collective data.
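The projection in Eqs (16)–(18) can be sketched with NumPy's SVD: the right singular vectors of the centered data matrix are the loading vectors, and the returned scores are the 2-D or 3-D coordinates used to plot the clusters (an illustrative sketch, not the paper's code):

```python
import numpy as np

def pca_project(X, k=2):
    """Project data onto its top-k principal components (Eqs 16-18):
    center the variables, take the right singular vectors of the
    centered matrix as loadings, and return the component scores."""
    Xc = X - X.mean(axis=0)                       # mean-0 variables
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # scores Y, shape (n, k)
```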
Performance metrics & evaluation criteria
Performance metrics encompass quantitative measures employed to gauge the proficiency of a classification model. These metrics serve the purpose of evaluating the model’s precision in accurately predicting the class labels of test data. Within classification methodologies, performance metrics assume a pivotal role in appraising the caliber of classification outcomes [72]. They furnish insights into the model’s accuracy, dependability, and its capacity to generalize. Notably, among these performance measures, both the confusion matrix and classification report are employed to assess the effectiveness of a classification model.
A confusion matrix serves as a tabular tool employed to assess the performance of a classification model. It provides a breakdown of key metrics for each class in the dataset, including the count of true positives TP, false positives FP, true negatives TN, and false negatives FN. In the context of binary classification, the confusion matrix is structured with two rows and two columns, symbolizing the two classes under consideration. The matrix comprises four entries, which encompass:
- True positives TP: It represents the count of instances that are genuinely positive (true) and have been correctly classified as positive (true) by the model.
- False positives FP: It signifies the count of instances that are actually negative (false) but have been erroneously classified as positive (true) by the model.
- True negatives TN: It denotes the count of instances that are genuinely negative (false) and have been accurately classified as negative (false) by the model.
- False negatives FN: This corresponds to the count of instances that are truly positive (true) but have been incorrectly classified as negative (false) by the model.
These terms are used to calculate various performance measures such as precision, recall, F1-score, and accuracy.
A classification report serves as a concise summary of the performance metrics for a classification model. It encompasses precision, recall, and the F1-score for each class label, in addition to providing an overall accuracy assessment of the model.
- Precision measures the proportion of correctly classified instances in the positive class (i.e., true positives) out of all instances classified as positive by the model (true positives and false positives).
Precision = TP / (TP + FP) (19)
- Recall (also known as sensitivity) measures the proportion of correctly classified instances in the positive class (true positives) out of all instances that are actually in the positive class (true positives and false negatives).
Recall = TP / (TP + FN) (20)
- F1-score is a weighted average of precision and recall and provides a single value to summarize the performance of the model.
F1-score = 2 × (Precision × Recall) / (Precision + Recall) (21)
- Accuracy measures the proportion of correctly classified instances (both true positives and true negatives) out of all instances in the test data.
Accuracy = (TP + TN) / (TP + TN + FP + FN) (22)
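The four report metrics follow directly from the confusion-matrix counts; a minimal helper (illustrative only):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1-score and accuracy (Eqs 19-22)
    computed from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy
```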
Fundamentally, the confusion matrix offers a graphical portrayal of classification outcomes, whereas the classification report furnishes a more comprehensive and detailed analysis of the model’s performance. Together, these tools are invaluable for assessing the efficacy of a classification model.
The proposed algorithm
Within this section, our initial focus lies in delivering a formal depiction of a sequential recommendation algorithm predicated on watch-time duration. Following this, we present a description of our novel approach, DCM-UP (Duration-Count Matrix-based User Profiling).
Problem formulation
Let X = {x1, x2, x3, …, xm} denote the set of users, where |X| = m, and G = {g1, g2, g3, …, gn} denote the set of items or genres, where |G| = n. For a user x ∈ X, let Sx^t = (g^(1), g^(2), …) represent the sequence of genres that user x interacts with before time t, where each g^(j) ∈ G. Let H = {Sx^t : x ∈ X} denote the set of historical sequences for all users.
The aim of historical sequence recommendation is to predict the next genre that a user x will interact with after time t, based on their historical sequence Sx^t. This can be formally expressed as:
ĝt+1 = f(Sx^t) (23)
where f is a function that maps historical sequences to the next genre in the sequence.
f : H → G, 0 < t ≤ T (24)
where T is the maximum time of the historical data. The goal is to learn this function f from the historical sequences in H, such that it accurately predicts the next genre in the sequence for any given user and time t.
The architecture of Duration-Count Matrix-based User Profiling (DCM-UP)
The architecture of the DCM-UP historical sequential model is depicted in Fig 5, consisting of four distinct components. Firstly, the Embedding State employs embedding technology to model the distributed representation of both users and items. Secondly, the Dynamic Hierarchical Transformer State systematically merges each level from top to bottom, layer by layer, updating each user's interest or behavior through the count matrix and ultimately yielding the users' long-term interest expression. Thirdly, the State Recommendation models the relationship between a user's interest and their next-moment interest representation, resulting in a uniform and consistent expression of user interest. Lastly, the Fully Connected Control State models the control, or comparison, between user interest and the implicit representation of the user based on items.
Fig 5. It has four parts: embedding state, dynamic hierarchical updating state, state recommendation, and fully connected control state.
Embedding state.
In the embedding state, the model learns the distributed representation of users (X) and items (G). Let X be a matrix of size m × r, where each row represents a user and each column represents a feature of the user’s distribution. Similarly, let G be a matrix of size n × r, where each row represents an item (genre) and each column represents a feature of the item’s distribution.
E = [X ; G] (25)
where “;” represents the concatenation of rows or columns. In this state, all users are arranged in sequential order on the left side of the matrix as rows (horizontally), while the genres (items) are arranged vertically in the form of columns. To further elaborate, X can be written as:
X = (xi,j) ∈ ℝ^(m×r), i = 1, …, m, j = 1, …, r (26)
where each element of xi,j represents the jth feature of the ith user’s distribution. Similarly, G can be written as:
G = (gi,j) ∈ ℝ^(n×r), i = 1, …, n, j = 1, …, r (27)
where each element of gi,j represents the jth feature of the ith item’s distribution.
Dynamic hierarchical updating state.
In the dynamic hierarchical updating state, there are L levels denoted by the index l (l = 1, 2, …, L). Let L be the number of levels in the dynamic hierarchical updating state. Let each layer of the matrix be denoted as Ml, where 1 ≤ l ≤ L. Each element of the matrix Ml represents the watch-time duration Wd that a user watched based on a particular genre Gd. The user is denoted by the index i, and the genre is denoted by the index d.
Ml(i, d) = Wi,d,l (28)
where Ml(i, d) represents the element in the l-th layer of the matrix that corresponds to the watch-time duration Wd that user i watched based on the genre Gd. At each level l, the system reads the user's behavior sequentially and updates the watch-time duration for each genre based on the following equation:
Wi,d,l = σ(Wi,d,l−1, ΔWi,d,l) = Wi,d,l−1 + ΔWi,d,l (29)
where σ is the summation function, k is the number of genres, Wi,d,l−1 is the watch-time duration of the user i for the genre d in the previous level l − 1, and ΔWi,d,l is the additional watch-time duration of the user i for the genre d at level l.
For example, let's say a user i watches an animation (G1) for 51 seconds, then watches it again for 87 seconds. The previous state in level l − 1 is Wi,1,l−1 = 51. The new watch-time duration for this user and genre at level l is:
Wi,1,l = Wi,1,l−1 + ΔWi,1,l = 51 + 87 = 138 (30)
The matrix levels Ml are all interconnected, where each level is linked to the following level from top to bottom. Therefore, the updated watch-time duration at each level l propagates to the next level l + 1. The final watch-time duration for each user i and genre d is denoted as Wi,d,L.
Wi,d,l+1 = Wi,d,l + ΔWi,d,l+1 (31)
Ml+1[i, d] = Ml[i, d] + Wi,d,l (32)
Wi,d^final = Wi,d,L (33)
Algorithm 2: Dynamic Hierarchical Updating State
Require: L: the number of levels in the dynamic hierarchical updating state
Require: M: a matrix with L levels, where each level represents the watch-time duration Wd that a user i watched based on a particular genre Gd
Require: k: the number of genres
Ensure: Mupdated: the matrix with updated watch-time duration for each user i and genre d at each level l
1: for l ← 1 to L do
2: for i ← 1 to n do
3: for d ← 1 to k do
4: if l = 1 then
5: Wi,d,l = Wi,d
6: else
7: Wi,d,l = Wi,d,l−1 + ΔWi,d,l
8: end if
9: Mupdated,l[i, d] = Wi,d,l
10: if l < L then
11: Wi,d,next = Wi,d,l
12: Ml+1[i, d]+ = Wi,d,next
13: end if
14: end for
15: end for
16: end for
17: return Mupdated
This algorithm 2 is used to update a matrix that represents the watch-time duration of users for different genres at different levels. The algorithm takes three inputs—L, M, and k. L is the number of levels in the dynamic hierarchical updating state, M is the matrix with L levels representing the watch-time duration, and k is the number of genres. The algorithm outputs a new matrix M_updated that contains updated watch-time duration for each user i and genre d at each level l.
If the current level is not the last level, the algorithm propagates the updated watch-time duration to the next level by adding it to the matrix at the next level. Finally, the algorithm returns the updated matrix M_updated with the new watch-time duration for each user and genre at each level.
Mupdated,l[i, d] = Wi,d,l, for 1 ≤ l ≤ L, 1 ≤ i ≤ n, 1 ≤ d ≤ k (34)
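A compact NumPy sketch of Algorithm 2's accumulation, assuming the per-level additional durations arrive as a list of n × k arrays (this input layout is an assumption for illustration, not the paper's data format):

```python
import numpy as np

def dynamic_hierarchical_update(deltas, n_users, n_genres):
    """Sketch of Algorithm 2: watch-time accumulates level by level,
    W[i,d,l] = W[i,d,l-1] + dW[i,d,l], and each level's totals
    propagate to the next. `deltas` is a list of L arrays of shape
    (n_users, n_genres) holding the additional watch-time per level."""
    L = len(deltas)
    M = np.zeros((L, n_users, n_genres))
    W = np.zeros((n_users, n_genres))
    for l in range(L):
        W = W + deltas[l]   # carry previous level forward, add new durations
        M[l] = W            # store the updated layer
    return M
```

With the worked example above (51 s then 87 s of animation for one user), the second level holds 138.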
State recommendation.
In state recommendation, the system dynamically updates the watch-time duration for each genre and recommends items (genres) to users based on their preferences. To achieve this, the system selects the highest duration genre from each row of the matrix M, which contains the watch-time duration for each user i and genre d. Mathematically, this can be expressed as:
d*i = arg maxd Mi,d (35)
where Mi,d represents the watch-time duration of user i for genre d in the matrix M, and arg maxd Mi,d denotes the genre index with the highest watch-time duration for user i, as seen in Fig 5. The system recommends the item (genre) corresponding to this highest duration genre to the user i, assuming that the user is interested in this genre.
In words, the system selects the index d that maximizes the value of Mi,d for each user i, indicating the genre with the highest watch-time duration for that user. The system recommends the item (genre) corresponding to this index to the user, assuming their interest based on the highest duration genre.
Algorithm 3: State Recommendation
Require: Watch-time duration matrix M of size n × k, where n is the number of users and k is the number of genres
Ensure: A dictionary containing recommended genres for each user
1: Recommended Genres ← {}
2: for i ← 1 to n do
3: max duration ← 0
4: max duration genre ← 0
5: for d ← 1 to k do
6: if Mi,d > max duration then
7: max duration ← Mi,d
8: max duration genre ← d
9: end if
10: end for
11: Recommended Genres[i] ← max duration genre
12: end for
13: return Recommended Genres
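Algorithm 3 reduces to a row-wise arg max over M; a short NumPy sketch (illustrative only):

```python
import numpy as np

def recommend_genres(M):
    """State recommendation (Eq 35): for each user i (row of M),
    pick the genre index with the highest watch-time duration."""
    return {i: int(np.argmax(M[i])) for i in range(M.shape[0])}
```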
Fully connected control state.
Let us define a binary labeling function ℓ that assigns the label T if the actual genre y matches the recommended genre ŷ, and F otherwise (Fig 6). In other words,
ℓ(y, ŷ) = T if y = ŷ, and F otherwise (36)
The system updates the current watch-time duration by adding it to the previous duration values and recommends items (genres) to each user based on the highest weight of watch-time duration.
Due to the “cold-start problem”, there are certain items that cannot be recommended to initial users. Our algorithm takes this into account and recommends only those items that can be recommended to the users.
Therefore, given a watch-time duration matrix M of size n × k where n is the number of users and k is the number of genres, our system recommends genres for each user by finding the genre with the highest watch-time duration for that user. The recommended genre is then compared with the actual genre watched by the user and a label is assigned using the function as described above.
Algorithm 4: Fully Connected Control State
Require: Watch-time duration matrix M of size n × k where n is the number of users and k is the number of genres
Ensure: Recommended genres for each user with assigned labels
1: n ← number of users;
2: k ← number of genres;
3: D ← empty dictionary;
4: for i from 1 to n do
5: gi ← genre with highest watch-time duration for user i
6: if canRecommend(gi) then
7: D[i] ← gi
8: else
9: D[i] ← defaultGenre
10: end if
11: end for
12: return D
Duration-Count Matrix based User Similarity (DCM-US)
In this study, we propose a system that generates a matrix to measure user-to-user similarity by utilizing user-item generalization. The goal of this process is to predict the similarity between users by sequentially reading their historical data, based on specific items and their corresponding watch-time duration.
Algorithm 5: Generating User-to-User Similarity Matrix
Require: M (n × k matrix), T (integer), L (integer)
Ensure: S (n × n matrix)
1: X ← aggregate function(M)
2: for i ← 1 to n do
3: for j ← 1 to k do
4: X′ij ← 0
5: for l ← 1 to L do
6: if i − l ≥ 1 then
7: X′ij ← X′ij + Xi−l,j
8: end if
9: end for
10: end for
11: end for
12: for i ← 1 to n do
13: for j = i to n do
14: numerator ← 0
15: for t = 1 to k do
16: numerator ← numerator + (X′it − mean(X′i)) × (X′jt − mean(X′j))
17: end for
18: denominator ← √(Σt (X′it − mean(X′i))²) × √(Σt (X′jt − mean(X′j))²)
19: Sij ← numerator/denominator
20: Sji ← Sij
21: end for
22: end for
23: return S
Let M be the watch-time duration matrix of size n × k, where n is the number of users and k is the number of items. Our system generates a user-item matrix X of size n × k, where Xij denotes the duration of watch time for item j by user i. We use an aggregate function f to predict each user’s behavior based on the duration of watch time. This function computes the sum of the watch-time duration for each item and user, yielding a matrix X.
Xij = f(Mij) = Σt=1..T Mij^(t) (37)
To estimate the time a particular user spent watching a video for each specific item, the system analyzes the behavior of similar users. In the user’s sequential historical behavior, the system learns each user’s behavior by using the duration of watch time, sums it from the first level to the current level, and adds it into a matrix-specific chunk. Let S be a similarity matrix of size n × n, where Sij represents the similarity between users i and j.
The estimated duration of watch time for item j by user i based on similar users’ behavior up to level T and depth L is denoted as X′ij. To compute X′ij, we sum the watch-time duration of similar users up to level T and depth L for item j and user i as follows:
X′ij = Σl=1..L Xi−l,j, for i − l ≥ 1 (38)
The process of adding this information into the matrix is repeated for all items, resulting in a matrix X′ that captures the estimated watch-time duration for each user-item pair. Finally, the system computes the similarity matrix S between users based on X′. The similarity between users i and j is computed as follows:
Sij = Σt=1..k (X′it − X̄′i)(X′jt − X̄′j) / ( √(Σt (X′it − X̄′i)²) · √(Σt (X′jt − X̄′j)²) ) (39)
where X̄′i and X̄′j are the mean values of the estimated watch-time duration for each user-item pair for users i and j, respectively. This similarity measure allows us to compare the behavior of users and make recommendations based on their preferences.
Experiments and results
In this section, we evaluate the performance of our proposed algorithms, DCM-UP (Duration-Count Matrix-based User Profiling) and DCM-US (Duration-Count Matrix-based User Similarity). This evaluation unfolds across three practical scenarios: movies, series, and collective video streaming recommendations. We employ a large video streaming dataset comprising over 3 million records, spanning 29,487 users and 16 distinct genres. These records capture user behavior, particularly watch-time duration. Within this section, we furnish in-depth insights into the hyperparameter configurations, establish baseline methods for reference, and conduct a comprehensive comparative analysis that pits our proposed techniques against existing state-of-the-art methodologies. This comparison serves as a window into the effectiveness of DCM-UP and DCM-US, and dissects the individual contributions of each component within our proposed techniques, unraveling their collective impact on overall performance.
Baseline method
To validate the effectiveness of DCM (Duration Count Matrix), we conduct a comparative analysis with a set of long-term user behavior modeling algorithms.
SEMI [37] a pioneering sequential multi-modal information transfer network, revolutionizes micro-video recommendations by harnessing users’ product domain behavior.
MMTHA [20] is a Multi-scale Modeling of Users’ Historical Behavior for Micro Video Recommendations that models users’ historical behavior across multiple scales. By capturing users’ short-term dynamic interests and incorporating long-term correlations, MMTHA effectively predicts user behavior on micro videos.
BAR [73] is a Behavior-Aware Recommendation that emphasizes the user’s sequential and heterogeneous one-class feedback. By incorporating behavior information into the input and output of a representation module, BAR effectively captures the item sequence and its relationship to the user’s real next behavior.
CT [74] revolutionizes the next video recommendation with its collaborative transformer architecture. This unified framework excels at capturing both micro video representation and sequential user-video historical interactions.
RLUR [40] is a Reinforcement Learning for User Retention to model long-term user feedback in the context of short-video recommendation systems that focus on predicting user retention. The objective is to minimize the time it takes for users to return to the system while maximizing long-term performance.
PreRec [44] is a Preference-based Recommender System that optimizes long-term user engagement in recommendations by leveraging preferences based on historical behavior, rather than relying solely on explicit behavior.
Experiments setup
Tables 4–6 present a summary of the different hyperparameter values used in the proposed video recommendation models. To evaluate the performance of the proposed methodologies, we randomly assigned 70% of the data for each user as the training set and the remaining 30% as the testing set. Machine learning techniques, specifically the decision tree algorithm, were applied to assess the performance of the proposed methodologies. The hyperparameters, including the maximum depth, splitter, criterion, and maximum leaf nodes, were systematically configured for each dataset.
As we see in Table 4, we found that the optimal maximum depth for DCM-UP on the movies dataset was 5, while the optimal maximum depth for DCM-US was 4. Both techniques employed the entropy criterion and chose the best splitter. Furthermore, DCM-UP utilized 32 maximum leaf nodes, while DCM-US utilized 7 maximum leaf nodes. Therefore, the hyperparameters selected for the movies dataset can be summarized as: max depth = 5 (DCM-UP) and 4 (DCM-US), criterion = entropy, splitter = best, and max leaf nodes = 32 and 7, respectively.
Based on our analysis of the series dataset, we have determined that the ideal maximum depth for the DCM-UP model is 5, while the DCM-US model performs best with a maximum depth of 6. These findings are outlined in Table 5. For the DCM-UP model, we employed the gini criterion [75] to select the optimal splitter, while for the DCM-US model, the entropy criterion was utilized. Furthermore, the DCM-UP model was configured with a maximum of 32 leaf nodes, while the DCM-US model utilized a maximum of 12 leaf nodes. Therefore, the hyperparameters chosen for the series dataset can be summarized as: max depth = 5 (DCM-UP, gini) and 6 (DCM-US, entropy), splitter = best, and max leaf nodes = 32 and 12, respectively.
Similarly, the hyperparameters chosen for DCM-UP and DCM-US on the collective video streaming dataset are presented in Table 6. In the case of DCM-UP, the optimal maximum depth was determined to be 7, whereas, for DCM-US, it was 5. Both models utilized the entropy criterion to select the most suitable splitter. Furthermore, DCM-UP was configured with a maximum of 100 leaf nodes, while DCM-US utilized only 12. In summary, the selected hyperparameters for the video streaming dataset are: max depth = 7 (DCM-UP) and 5 (DCM-US), criterion = entropy, splitter = best, and max leaf nodes = 100 and 12, respectively.
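For reference, the selected settings from Tables 4–6 can be gathered into one configuration structure, e.g. for constructing scikit-learn DecisionTreeClassifier instances (the dictionary layout is our own convenience; the values are those reported above):

```python
# Decision-tree hyperparameters reported in Tables 4-6
# (criterion / splitter / max_depth / max_leaf_nodes).
HYPERPARAMS = {
    "movies": {
        "DCM-UP": dict(criterion="entropy", splitter="best", max_depth=5, max_leaf_nodes=32),
        "DCM-US": dict(criterion="entropy", splitter="best", max_depth=4, max_leaf_nodes=7),
    },
    "series": {
        "DCM-UP": dict(criterion="gini", splitter="best", max_depth=5, max_leaf_nodes=32),
        "DCM-US": dict(criterion="entropy", splitter="best", max_depth=6, max_leaf_nodes=12),
    },
    "collective": {
        "DCM-UP": dict(criterion="entropy", splitter="best", max_depth=7, max_leaf_nodes=100),
        "DCM-US": dict(criterion="entropy", splitter="best", max_depth=5, max_leaf_nodes=12),
    },
}
```

Each inner dictionary can be passed directly to scikit-learn, e.g. `DecisionTreeClassifier(**HYPERPARAMS["movies"]["DCM-UP"])`.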
Comparative analysis
To ascertain the efficacy of the suggested video recommendation techniques, an extensive comparative analysis was undertaken in this investigation. The comparison encompassed a wide range of machine-learning models for long-term behavioral prediction. The outcomes of this comparison, including the performance of the DCM-UP and DCM-US models in relation to other models, were presented in Tables 7–10, utilizing three distinct datasets. Essential evaluation metrics such as accuracy, precision, recall, and F1-score were employed to demonstrate the effectiveness of the techniques.
According to the results presented in Table 7, the DCM-UP model with a Decision Tree demonstrated superior performance compared to the Random Forest and Gradient Boosting models. This can be attributed to several factors. Notably, the Decision Tree model exhibited remarkable accuracy in both the movies and series datasets, achieving training accuracies of 87% and 89%, respectively, whereas the Random Forest and Gradient Boosting models achieved lower accuracy scores of 75%, 74%, and 73%. These findings highlight the efficacy of the DCM-UP model in comparison to the alternative decision tree-based approaches.
Based on the findings from DCM-US (Tables 8–10), an impressive 98% accuracy was attained in analyzing the movie dataset. Furthermore, when examining series data and video streaming datasets, the Decision tree model alone yielded a remarkable accuracy of 98%. Similarly, the random forest and gradient boosting approaches proved to be highly effective, achieving an accuracy of 97% in this particular context.
Evaluation results
The effectiveness of our innovative recommender model, based on the duration count matrix, is evaluated through a rigorous methodology. In order to achieve a comprehensive assessment, we partitioned our dataset into a 70-30 split, dedicating 70% of the watch-time duration for model training while reserving the remaining 30% for evaluation purposes. The model’s performance was subjected to a thorough analysis, employing fundamental metrics such as precision, recall, f1-score, and accuracy. The significance of achieving higher precision, recall, F1-score, and accuracy lies in the fact that they indicate the relevance and quality of the recommendations provided. An increase in these metrics corresponds directly to the enhancement of recommendation quality and precision, ultimately elevating the overall user experience.
In our evaluation, we conduct a thorough analysis of our proposed method against state-of-the-art approaches. This critical analysis serves as a benchmark, allowing us to explicate the advancements and contributions of our approach in relation to existing methodologies. Through a comprehensive review of relevant literature, we aim to identify key challenges and limitations encountered by previous methods. Moreover, this comparative analysis offers insights into the strengths and advantages of our methodology, specifically in addressing key challenges within the video recommendation domain. Additionally, our evaluation highlights the robustness of our method across diverse datasets and scenarios, showcasing its adaptability and applicability in real-world settings.
Moreover, Table 11 offers an extensive evaluation of the proposed DCM-UP technique, considering a range of hyperparameter configurations and assessing multiple performance metrics. Remarkably, the analysis highlights the superiority of series data over movie data and video streaming data, achieving the highest level of accuracy. This notable difference in performance can be attributed to the significantly larger dataset size and the richer content available in the series data, which allows for more robust model training and better generalization.
In this study, when focusing on the analysis of series data, we observed that default hyperparameters (HPs) achieved a remarkable accuracy of 89%, while the tuned HPs achieved a lower accuracy of 76%. Conversely, when examining both movie and series data together (as depicted in Fig 7), the utilization of tuned HPs resulted in an accuracy rate of 72%. In contrast, default HPs yielded an accuracy of 87% for movie data and 73% for video streaming data. These findings underscore the critical role of hyperparameter tuning in optimizing the model's performance.
Fig 7. (a) Movies Dataset. (b) Series Dataset. (c) Video Streaming Dataset.
In this study, we applied the innovative DCM-US technique to identify users sharing similar behaviors and preferences, thereby facilitating personalized recommendations. Table 12 shows the DCM-UP model's performance using default hyperparameters. The evaluation, as showcased in Table 13, was conducted across various datasets using finely tuned hyperparameters (HPs), which led to exceptionally high accuracy outcomes. Specifically, our analysis revealed that the movies data, series data, and video streaming data each achieved a remarkable accuracy rate of 98%, as illustrated in Fig 8.
Fig 8. (a) Movies Dataset. (b) Series Dataset. (c) Video Streaming Dataset.
Delving into the intricacies of the model’s decision-making process, we directed our attention toward a vital hyperparameter known as “max depth.” Through thorough analysis and experimentation, we explored the impact of different “max depth” values on the model’s performance.
In our analysis of the movies dataset (Fig 8a), setting the “max depth” hyperparameter to 4 resulted in optimal performance. This balance between complexity and simplicity effectively captured inherent patterns, leading to highly accurate recommendations.
In analyzing series data (Fig 8b), we found that a “max depth” of 6 resulted in exceptional accuracy, allowing the model to capture intricate relationships effectively. Conversely, for the video streaming dataset (Fig 8c), including movies, series, and additional factors, a “max depth” of 5 achieved outstanding accuracy, successfully navigating complexities while avoiding overfitting.
After assessing the accuracy rates, the evaluation of model performance extends to error-based measures [63]. In this section, we examine key error-based metrics, including Gini impurity, entropy, and misclassification error, to provide a comprehensive understanding of our model's performance.
In further detail, we analyze Gini impurity [75], a crucial metric that measures the impurity of a dataset by calculating the probability of misclassifying a randomly chosen instance. When assessing the DCM-UP, default hyperparameters produce an impurity score of 0.3, contrasting with the 0.1 impurity achieved with tuned hyperparameters. Similarly, leveraging the DCM-US technique results in a notable reduction in impurity, yielding a score of 0.08.
\[ \mathrm{Gini} = 1 - \sum_{i=1}^{C} p_i^{2} \tag{40} \]
where \(p_i\) is the proportion of instances belonging to class \(i\) and \(C\) is the number of classes.
Furthermore, entropy [76] quantifies the uncertainty or disorder present in a dataset. It is computed as the negative sum, over all classes, of each class probability multiplied by the logarithm of that probability. In our evaluation of the DCM-UP, both default and tuned hyperparameters resulted in an entropy score of 0.3. By contrast, the DCM-US technique achieves a notable decrease, with a score of 0.03.
\[ \mathrm{Entropy} = - \sum_{i=1}^{C} p_i \log_2 p_i \tag{41} \]
Moreover, misclassification error [77] evaluates the ratio of misclassified instances to the total number of instances. When evaluating the DCM-UP, we achieve a score of 0.1 with default HPs and 0.2 with tuned HPs. In contrast, we achieve an error score of 0.01 when employing the DCM-US technique.
\[ \mathrm{Misclassification\ error} = 1 - \max_{i}\, p_i \tag{42} \]
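All three impurity measures follow directly from a node's class proportions; a minimal sketch:

```python
import math

def gini(p):
    """Gini impurity of a node: 1 - sum(p_i^2)."""
    return 1.0 - sum(pi * pi for pi in p)

def entropy(p):
    """Entropy of a node: -sum(p_i * log2(p_i)); zero-probability classes are skipped."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def misclassification(p):
    """Misclassification error of a node: 1 - max(p_i)."""
    return 1.0 - max(p)

# A node where 90% of instances fall in one class
p = [0.9, 0.1]
g, h, e = gini(p), entropy(p), misclassification(p)
# g ~ 0.18, h ~ 0.469, e = 0.1
```

A pure node (all instances in one class) scores 0 on every measure, which is why lower values indicate cleaner splits.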
Hyperparameter sensitivity analysis
In certain cases, default hyperparameters in decision tree models outperform fine-tuned ones [78]. Several factors contribute to this. First, default hyperparameters are designed to balance model complexity and generalizability: they are chosen based on heuristics and prior knowledge to work reasonably well across a range of datasets and scenarios, which makes them robust and reliable. Second, default hyperparameters are less prone to overfitting, the phenomenon where a model becomes overly specialized to the training data and performs poorly on unseen data; their conservative nature encourages better generalization. Additionally, when the dataset is small or lacks diversity, fine-tuning may not yield significant improvements, and the predefined defaults offer a sensible starting point that avoids over-optimization while ensuring reasonable performance. However, the performance of default hyperparameters still varies with the specific dataset and problem domain, so it remains advisable to experiment with fine-tuning to uncover configurations suited to the unique characteristics of the data. Ultimately, the choice between default and fine-tuned hyperparameters depends on the trade-off between performance and the available resources and constraints.
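The overfitting argument above can be illustrated with a toy experiment (synthetic data and hypothetical models, not the paper's setup): under label noise, a single conservative split generalizes better than a model that memorizes every training point.

```python
import random

random.seed(0)

def sample(n):
    """Binary labels determined by x > 0.5, flipped with 20% label noise."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

train, test = sample(200), sample(1000)

def memorizer(x):
    """'Over-tuned' model: memorizes training points (1-nearest neighbor)."""
    return min(train, key=lambda t: abs(t[0] - x))[1]

def stump(x):
    """'Default-like' model: one conservative split at the midpoint."""
    return int(x > 0.5)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

stump_acc = accuracy(stump, test)
mem_acc = accuracy(memorizer, test)
# The conservative stump approaches the 80% noise ceiling, while the
# memorizer, having fit the noise in the training labels, scores lower.
```

The memorizer reproduces its training labels perfectly yet pays for that on fresh data, mirroring why conservative defaults can beat aggressively tuned trees on noisy or small datasets.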
Conclusion
In our research, we present a novel method for enhancing recommender systems, specifically tailored to the precise analysis of users' long-term behavioral patterns. By utilizing watch-time duration data, our approach, referred to as the Duration Count Matrix (DCM) technique, furnishes a holistic insight into user preferences. This, in turn, empowers the generation of personalized recommendations that dynamically adapt to evolving user tastes over time. Moreover, incorporating watch-time duration offers significant advantages over conventional approaches, as it accounts for the actual temporal investment users make when engaging with content.
Within the framework of the DCM technique, we delineate two pivotal constituents: User Profiling (DCM-UP) and User Similarity (DCM-US). DCM-UP is instrumental in capturing user behavior and generating user profiles. It does so by employing matrix-based representations of users and items, rendering dynamic updates to user behavioral patterns, and accommodating the evolution of user preferences over time. This functionality ensures the delivery of tailored recommendations that align with individual user inclinations. Additionally, DCM-US harnesses the power of collaborative filtering to prognosticate user-to-user interactions. This predictive mechanism is essential for ascertaining the degree of similarity between users, thereby enhancing the precision of our predictions pertaining to user preferences.
Furthermore, our empirical findings substantiate the efficacy of the DCM techniques in comprehending user long-term behavioral patterns, surpassing the performance of contemporary methodologies. This substantiates our ability to furnish more personalized and captivating content recommendations, underpinned by an acute understanding of individual interests and preferences. In forthcoming research endeavors, we propose a thorough exploration of the synergy between the DCM approach and alternative recommendation techniques, such as content-based filtering or collaborative filtering. The integration of these approaches holds promise for advancing the development of hybrid recommender systems, augmenting both recommendation precision and diversity.
References
- 1. Gong X, Feng Q, Zhang Y, Qin J, Ding W, Li B, et al. Real-time Short Video Recommendation on Mobile Devices. In: Proceedings of the 31st ACM International Conference on Information & Knowledge Management; 2022. p. 3103–3112.
- 2. Lin Z, Wang H, Mao J, Zhao WX, Wang C, Jiang P, et al. Feature-aware Diversified Re-ranking with Disentangled Representations for Relevant Recommendation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2022. p. 3327–3335.
- 3. Wang J, Ma W, Li J, Lu H, Zhang M, Li B, et al. Make Fairness More Fair: Fair Item Utility Estimation and Exposure Re-Distribution. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2022. p. 1868–1877.
- 4. Zhan R, Pei C, Su Q, Wen J, Wang X, Mu G, et al. Deconfounding Duration Bias in Watch-time Prediction for Video Recommendation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2022. p. 4472–4481.
- 5. Liu S, Chen Z, Liu H, Hu X. User-video co-attention network for personalized micro-video recommendation. In: The World Wide Web Conference; 2019. p. 3020–3026.
- 6. Meehan K, Lunney T, Curran K, McCaughey A. Context-aware intelligent recommendation system for tourism. In: 2013 IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOM Workshops). IEEE; 2013. p. 328–331.
- 7. Huang Y, Cui B, Jiang J, Hong K, Zhang W, Xie Y. Real-time video recommendation exploration. In: Proceedings of the 2016 International Conference on Management of Data; 2016. p. 35–46.
- 8. Wu Q, Wang H, Hong L, Shi Y. Returning is believing: Optimizing long-term user engagement in recommender systems. In: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management; 2017. p. 1927–1936.
- 9. Xue W, Cai Q, Zhan R, Zheng D, Jiang P, An B. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor. arXiv preprint arXiv:2206.02620. 2022.
- 10. Zou L, Xia L, Ding Z, Song J, Liu W, Yin D. Reinforcement learning to optimize long-term user engagement in recommender systems. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; 2019. p. 2810–2818.
- 11. Cui P, Wang Z, Su Z. What videos are similar with you? Learning a common attributed representation for video recommendation. In: Proceedings of the 22nd ACM International Conference on Multimedia; 2014. p. 597–606.
- 12. Zhou X, Chen L, Zhang Y, Cao L, Huang G, Wang C. Online video recommendation in sharing community. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data; 2015. p. 1645–1656.
- 13. Zheng Y, Gao C, Ding J, Yi L, Jin D, Li Y, et al. DVR: Micro-video recommendation optimizing watch-time-gain under duration bias. In: Proceedings of the 30th ACM International Conference on Multimedia; 2022. p. 334–345.
- 14. Deng Z, Yan M, Sang J, Xu C. Twitter is faster: Personalized time-aware video recommendation from Twitter to YouTube. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM). 2015;11(2):1–23.
- 15. Mei T, Yang B, Hua XS, Li S. Contextual video recommendation by multimodal relevance and user feedback. ACM Transactions on Information Systems (TOIS). 2011;29(2):1–24.
- 16. Merrill K Jr, Rubenking B. Go long or go often: Influences on binge watching frequency and duration among college students. Social Sciences. 2019;8(1):10.
- 17. Chaney AJ, Stewart BM, Engelhardt BE. How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. In: Proceedings of the 12th ACM Conference on Recommender Systems; 2018. p. 224–232.
- 18. Zhao X, Zhu Z, Caverlee J. Rabbit holes and taste distortion: Distribution-aware recommendation with evolving interests. In: Proceedings of the Web Conference 2021; 2021. p. 888–899.
- 19. Zhang X, Jia H, Su H, Wang W, Xu J, Wen JR. Counterfactual reward modification for streaming recommendation with delayed feedback. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2021. p. 41–50.
- 20. Huang N, Hu R, Xiong M, Peng X, Ding H, Jia X, et al. Multi-scale interest dynamic hierarchical transformer for sequential recommendation. Neural Computing and Applications. 2022;34(19):16643–16654.
- 21. Ma J, Li G, Zhong M, Zhao X, Zhu L, Li X. LGA: Latent genre aware micro-video recommendation on social media. Multimedia Tools and Applications. 2018;77:2991–3008.
- 22. Tang L, Huang Q, Puntambekar A, Vigfusson Y, Lloyd W, Li K. Popularity prediction of Facebook videos for higher quality streaming. In: USENIX Annual Technical Conference; 2017.
- 23. Wu S, Rizoiu MA, Xie L. Beyond views: Measuring and predicting engagement in online videos. In: Proceedings of the International AAAI Conference on Web and Social Media. vol. 12; 2018.
- 24. Covington P, Adams J, Sargin E. Deep neural networks for YouTube recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems; 2016. p. 191–198.
- 25. Chen Q, Pei C, Lv S, Li C, Ge J, Ou W. End-to-end user behavior retrieval in click-through rate prediction model. arXiv preprint arXiv:2108.04468. 2021.
- 26. Pi Q, Zhou G, Zhang Y, Wang Z, Ren L, Fan Y, et al. Search-based user interest modeling with lifelong sequential behavior data for click-through rate prediction. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management; 2020. p. 2685–2692.
- 27. Qu Y, Cai H, Ren K, Zhang W, Yu Y, Wen Y, et al. Product-based neural networks for user response prediction. In: 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE; 2016. p. 1149–1154.
- 28. Richardson M, Dominowska E, Ragno R. Predicting clicks: Estimating the click-through rate for new ads. In: Proceedings of the 16th International Conference on World Wide Web; 2007. p. 521–530.
- 29. Wang P, Jiang Y, Xu C, Xie X. Overview of content-based click-through rate prediction challenge for video recommendation. In: Proceedings of the 27th ACM International Conference on Multimedia; 2019. p. 2593–2596.
- 30. Zhu Q, Shyu ML, Wang H. VideoTopic: Content-based video recommendation using a topic model. In: 2013 IEEE International Symposium on Multimedia. IEEE; 2013. p. 219–222.
- 31. Saket S, Velugoti VSBR, Mehrotra R. Formulating video watch success signals for recommendations on short video platforms. In: Proceedings of the Workshop on Learning and Evaluating Recommendations with Impressions co-located with the 17th ACM Conference on Recommender Systems (RecSys 2023), Singapore. vol. 3590; 2023. p. 41–48.
- 32. Lu Y, Huang Y, Zhang S, Han W, Chen H, Zhao Z, et al. Multi-trends Enhanced Dynamic Micro-video Recommendation. arXiv preprint arXiv:2110.03902. 2021.
- 33. Symeonidis P, Janes A, Chaltsev D, Giuliani P, Morandini D, Unterhuber A, et al. Recommending the video to watch next: An offline and online evaluation at youtv.de. In: Proceedings of the 14th ACM Conference on Recommender Systems; 2020. p. 299–308.
- 34. Jiang H, Wang W, Wei Y, Gao Z, Wang Y, Nie L. What aspect do you like: Multi-scale time-aware user interest modeling for micro-video recommendation. In: Proceedings of the 28th ACM International Conference on Multimedia; 2020. p. 3487–3495.
- 35. Liu Y, Liu Q, Tian Y, Wang C, Niu Y, Song Y, et al. Concept-aware denoising graph neural network for micro-video recommendation. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management; 2021. p. 1099–1108.
- 36. Xiao X, Dai H, Dong Q, Niu S, Liu Y, Liu P. Social4Rec: Distilling User Preference from Social Graph for Video Recommendation in Tencent. arXiv preprint arXiv:2302.09971. 2023.
- 37. Lei C, Liu Y, Zhang L, Wang G, Tang H, Li H, et al. SEMI: A sequential multi-modal information transfer network for e-commerce micro-video recommendations. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining; 2021. p. 3161–3171.
- 38. Quadrana M, Cremonesi P, Jannach D. Sequence-aware recommender systems. ACM Computing Surveys (CSUR). 2018;51(4):1–36.
- 39. Zhang Q, Liu J, Dai Y, Qi Y, Yuan Y, Zheng K, et al. Multi-Task Fusion via Reinforcement Learning for Long-Term User Satisfaction in Recommender Systems. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2022. p. 4510–4520.
- 40. Cai Q, Liu S, Wang X, Zuo T, Xie W, Yang B, et al. Reinforcing User Retention in a Billion Scale Short Video Recommender System. arXiv preprint arXiv:2302.01724. 2023.
- 41. Konstan JA, Riedl J. Recommender systems: From algorithms to user experience. User Modeling and User-Adapted Interaction. 2012;22:101–123.
- 42. Lex E, Kowald D, Seitlinger P, Tran TNT, Felfernig A, Schedl M, et al. Psychology-informed recommender systems. Foundations and Trends® in Information Retrieval. 2021;15(2):134–242.
- 43. Wang Y, Sharma M, Xu C, Badam S, Sun Q, Richardson L, et al. Surrogate for long-term user experience in recommender systems. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining; 2022. p. 4100–4109.
- 44. Xue W, Cai Q, Xue Z, Sun S, Liu S, Zheng D, et al. PrefRec: Preference-based Recommender Systems for Reinforcing Long-term User Engagement. arXiv preprint arXiv:2212.02779. 2022.
- 45. Xie R, Wang Y, Wang R, Lu Y, Zou Y, Xia F, et al. Long short-term temporal meta-learning in online recommendation. In: Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining; 2022. p. 1168–1176.
- 46. Yu F, Liu Q, Wu S, Wang L, Tan T. A dynamic recurrent model for next basket recommendation. In: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2016. p. 729–732.
- 47. Kang WC, McAuley J. Self-attentive sequential recommendation. In: 2018 IEEE International Conference on Data Mining (ICDM). IEEE; 2018. p. 197–206.
- 48. Harer J, Reale C, Chin P. Tree-Transformer: A transformer-based method for correction of tree-structured data. arXiv preprint arXiv:1908.00449. 2019.
- 49. Saudi Telecom Company (STC). https://lab.stc.com.sa/dataset/en/. Accessed 2024-04-02.
- 50. Rahm E, Do HH, et al. Data cleaning: Problems and current approaches. IEEE Data Engineering Bulletin. 2000;23(4):3–13.
- 51. Chu X, Ilyas IF, Krishnan S, Wang J. Data cleaning: Overview and emerging challenges. In: Proceedings of the 2016 International Conference on Management of Data; 2016. p. 2201–2206.
- 52. Dasu T, Johnson T. Exploratory Data Mining and Data Cleaning. John Wiley & Sons; 2003.
- 53. Wan X, Wang W, Liu J, Tong T. Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC Medical Research Methodology. 2014;14:1–13. pmid:25524443
- 54. Vinutha H, Poornima B, Sagar B. Detection of outliers using interquartile range technique from intrusion dataset. In: Information and Decision Sciences: Proceedings of the 6th International Conference on FICTA. Springer; 2018. p. 511–518.
- 55. Luo Y, Tao D, Ramamohanarao K, Xu C, Wen Y. Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Transactions on Knowledge and Data Engineering. 2015;27(11):3111–3124.
- 56. Manikandan S. Data transformation. Journal of Pharmacology and Pharmacotherapeutics. 2010;1(2):126. pmid:21350629
- 57. Lopez-Arevalo I, Aldana-Bobadilla E, Molina-Villegas A, Galeana-Zapién H, Muñiz-Sanchez V, Gausin-Valle S. A memory-efficient encoding method for processing mixed-type data on machine learning. Entropy. 2020;22(12):1391. pmid:33316972
- 58. Ul Haq I, Gondal I, Vamplew P, Brown S. Categorical features transformation with compact one-hot encoder for fraud detection in distributed environment. In: Data Mining: 16th Australasian Conference, AusDM 2018, Bathurst, NSW, Australia, November 28–30, 2018, Revised Selected Papers 16. Springer; 2019. p. 69–80.
- 59. Pospelov B, Rybka E, Togobytska V, Meleshchenko R, Danchenko Y, Volkov I, et al. Construction of the method for semi-adaptive threshold scaling transformation when computing recurrent plots. Eastern-European Journal of Enterprise Technologies. 2019;4(10):22–29.
- 60. Wang G, Zhang X, Tang S, Zheng H, Zhao BY. Unsupervised clickstream clustering for user behavior analysis. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems; 2016. p. 225–236.
- 61. Peng Y, Kondo N, Fujiura T, Suzuki T, Yoshioka H, Itoyama E, et al. Classification of multiple cattle behavior patterns using a recurrent neural network with long short-term memory and inertial measurement units. Computers and Electronics in Agriculture. 2019;157:247–253.
- 62. Charbuty B, Abdulazeez A. Classification based on decision tree algorithm for machine learning. Journal of Applied Science and Technology Trends. 2021;2(01):20–28.
- 63. Ahmed AM, Rizaner A, Ulusoy AH. A novel decision tree classification based on post-pruning with Bayes minimum risk. PLoS ONE. 2018;13(4):e0194168. pmid:29617369
- 64. Gan G, Ma C, Wu J. Data Clustering: Theory, Algorithms, and Applications. SIAM; 2020.
- 65. Sinaga KP, Yang MS. Unsupervised K-means clustering algorithm. IEEE Access. 2020;8:80716–80727.
- 66. Singh A, Yadav A, Rana A. K-means with three different distance metrics. International Journal of Computer Applications. 2013;67(10).
- 67. Arora P, Varshney S, et al. Analysis of k-means and k-medoids algorithm for big data. Procedia Computer Science. 2016;78:507–512.
- 68. Na S, Xumin L, Yong G. Research on k-means clustering algorithm: An improved k-means clustering algorithm. In: 2010 Third International Symposium on Intelligent Information Technology and Security Informatics. IEEE; 2010. p. 63–67.
- 69. Ghosh S, Dubey SK. Comparative analysis of k-means and fuzzy c-means algorithms. International Journal of Advanced Computer Science and Applications. 2013;4(4).
- 70. Kodinariya TM, Makwana PR, et al. Review on determining number of cluster in K-means clustering. International Journal. 2013;1(6):90–95.
- 71. Aït-Sahalia Y, Xiu D. Principal component analysis of high-frequency data. Journal of the American Statistical Association. 2019;114(525):287–303.
- 72. Tharwat A. Classification assessment methods. Applied Computing and Informatics. 2020;17(1):168–192.
- 73. He M, Pan W, Ming Z. BAR: Behavior-aware recommendation for sequential heterogeneous one-class collaborative filtering. Information Sciences. 2022;608:881–899.
- 74. Fan Z, Liu Z, Zhang J, Xiong Y, Zheng L, Yu PS. Continuous-time sequential recommendation with temporal graph collaborative transformer. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management; 2021. p. 433–442.
- 75. Yuan Y, Wu L, Zhang X. Gini-impurity index analysis. IEEE Transactions on Information Forensics and Security. 2021;16:3154–3169.
- 76. Hu Q, Che X, Zhang L, Zhang D, Guo M, Yu D. Rank entropy-based decision trees for monotonic classification. IEEE Transactions on Knowledge and Data Engineering. 2011;24(11):2052–2064.
- 77. Mohamed WNHW, Salleh MNM, Omar AH. A comparative study of reduced error pruning method in decision tree algorithms. In: 2012 IEEE International Conference on Control System, Computing and Engineering. IEEE; 2012. p. 392–397.
- 78. Mantovani RG, Horváth T, Rossi AL, Cerri R, Junior SB, Vanschoren J, et al. Better Trees: An empirical study on hyperparameter tuning of classification decision tree induction algorithms. arXiv preprint arXiv:1812.02207. 2018.