Abstract
Reranking is crucial in recommendation systems, refining candidate lists to significantly enhance the matching of user preferences and encourage engagement. While existing algorithms often focus solely on pairwise item interactions, they overlook local connections within item subsets. To address this limitation, we introduce the concept of “scenes” to explicitly mine local relationships among multiple items within a list, representing inter-scene correlations through undirected graphs. To effectively integrate these scenes and address the challenge of scoring items that cannot be definitively categorized into a single scene, we propose a scene-weighted reranking algorithm. This novel approach computes a final item score by leveraging scene-user preference matching scores, weighted by item-scene similarities. Experimental results demonstrate that compared to existing methods, our algorithm achieves more accurate item rankings that better reflect users’ true preferences, ultimately providing higher-quality recommendation sequences. This research contributes to the field by offering a more nuanced approach to capturing both local and global item relationships, specifically enhancing preference matching in personalized recommendation.
Citation: Tong K, Tan G (2025) Towards personalized recommendation with enhancing preference matching through scene-weighted reranking. PLoS One 20(11): e0333097. https://doi.org/10.1371/journal.pone.0333097
Editor: O-Joun Lee, The Catholic University of Korea, KOREA, REPUBLIC OF
Received: August 6, 2024; Accepted: September 9, 2025; Published: November 18, 2025
Copyright: © 2025 Tong, Tan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: No data, models, and code generated or used during the study appear in the submitted article.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
Contemporary personalized recommendation systems typically encompass two stages: retrieval and personalized reranking, with the latter being crucial as it directly generates the results presented to users [1]. Users expect these final recommendations to align with their interests and preferences, reflecting their user profiles. Thus, the reranking phase serves to reorganize initial retrieval results based on individual user inclinations. The reranking algorithm constructs a model considering both the relevance between user profiles and each item in the candidate list, as well as inter-item contextual relationships [2,3,4]. It optimizes a loss function incorporating pairwise influences between sample points and list-wise influences among sample clusters in the feature space, aiming to produce an optimal output sequence matching user preferences. Notably, as the original candidate list may contain items misaligned with user preferences, the optimal output sequence is typically shorter than the initial list [1,4].
Employing generative models as a foundational framework for reranking tasks presents a viable approach [5]. This methodology bifurcates the reranking process into two distinct phases: sequence generation and sequence evaluation. In the initial phase, the generative model produces a vast array of enumerated sequences based on the original candidate ranking, forming an enumeration set. Subsequently, in the evaluation phase, a pre-designed evaluator assesses each enumerated sequence, with the highest-scoring sequence selected as the optimal output for user presentation. This dual-phase framework is termed the evaluator-generator paradigm [6]. Within this paradigm, the generator’s capacity to produce diverse enumeration sets is paramount. Traditional generative models, such as GANs [7] and VAEs [8], have demonstrated limitations in this regard, impeding their efficacy in reranking tasks and hindering their application in recommendation models. Recently, diffusion models have achieved remarkable success across various computer vision domains. These models iteratively introduce noise into original samples during forward diffusion, subsequently learning to denoise and generate samples approximating the original distribution during reverse diffusion. Distinctively, diffusion models utilize Markov chains to learn noise distributions and generate data through denoising sampled data, theoretically offering significantly higher model capacity than traditional generative models [9]. Consequently, researchers have pioneered the integration of diffusion models into reranking algorithms with promising results [9]. However, the evaluator-generator paradigm necessitates the generation of sequences for evaluation in its initial phase, potentially incurring substantial computational costs when dealing with complex sequence items, such as high-resolution image data. 
This computational burden may significantly impact the overall efficiency of the reranking model, potentially limiting the practical applicability of the recommendation system [10].
To address the aforementioned issues, some researchers have proposed a scene-based reranking algorithm framework [11]. A scene refers to a local cluster of items in the feature space with semantically similar characteristics. For instance, when a female user is browsing for women’s jackets, a user-friendly recommendation sequence would position items like women’s dresses in close proximity to the jacket, while products such as over-ear headphones, which lack the “women’s clothing” semantic, would be placed further away. In this context, women’s jackets, dresses, and similar items form a scene in the feature space, with the local semantic representation of “clothing suitable for women.”
Generally, it is assumed that when users show interest in a specific item, such as an intention to purchase or click, they are likely to have similar intentions towards related or similar items within the same scene [12]. This implies that users tend to exhibit consistent preferences across all items within a particular scene. Researchers posit that the reranking task can be transformed into a process of identifying highly correlated scenes within the list using certain methods [13]. The semantic description of a scene can be represented by the feature vector of its key item. Subsequently, the algorithm calculates the degree of alignment between the user profile and each scene, then arranges the scenes in the list based on this alignment. The resulting sequence is then presented to the user, completing the reranking task. Notably, this approach does not compute the preference similarity between each item and the user profile. Instead, it calculates the similarity between key items in each scene and the user profile, significantly reducing computational costs and enhancing operational efficiency.
The construction of item groups within scenes is a crucial research topic, as models require key-item representations to stand in for entire scenes in computations. Some researchers [1] proposed a clustering-like method to obtain scene representation vectors: first, constructing an inter-item relationship factor matrix through contrastive learning, then using the MDCA algorithm to partition the item set into multiple clusters, and finally using the average of items in the feature space within each cluster as the representation vector. Other researchers [14] proposed using graph models, representing scenes as weighted undirected graphs, with nodes representing items and edge weights indicating item similarities. Graph neural networks are then applied to integrate scene structure and node information, yielding scene representation vectors [15].
After obtaining scene representations, the reranking process should be user-profile-oriented. Existing research typically merges user profile vectors with scene vectors and inputs them into multilayer perceptrons (MLPs) to generate results [16,17]. However, MLP weights heavily depend on the input order of scene vectors, with minor perturbations potentially causing significant differences. More importantly, real-world scenarios often contain items that are difficult to categorize into specific scenes, such as items in travel recommendations that simultaneously include natural and cultural attractions. This phenomenon is commonly referred to as scene ambiguity [18]. Traditional scene-based reranking models merely consider the similarity relationship between items and their affiliated scenes, failing to explore the association strength of items with other scenes. In such cases, when an item exhibits scene ambiguity (i.e., demonstrates high similarity with multiple scenes), reranking results can be significantly disrupted, thereby compromising the overall effectiveness of the model.
To address these issues, we propose a scene-weighted reranking approach to enhance preference matching. The model processes user profiles and items through separate transformer modules to generate contextual tokens, strengthening the learning of semantic correspondences between sequences and user profiles while improving training stability and robustness. Subsequently, we construct an inter-item preference similarity function to derive scene distributions from the candidate list and compute scene-user preference matching scores via dot product similarity. For each candidate item, the model calculates preference similarities with multiple scenes, performs polynomial multiplication with the scene-user matching scores to obtain weighted sums, and maps these aggregates to final scores through a mapping function. The reranking mechanism then optimizes the output sequence based on these scores.
The introduction of scene-weighted sums allows the model to perform mixed calculations between items and global scenes, rather than being limited to single-scene ranking. This effectively mitigates the efficiency degradation caused by scene ambiguity, enhancing the model’s generalization ability, adaptability and preference matching in personalized recommendation.
2. Related works
Item-level Reranking
This method reorders items based on their mutual influences. [19] and [20] proposed using RNNs to encode item-level sequences into high-dimensional feature spaces and calculate influence factors between feature vectors. However, these methods are constrained by network architecture limitations and high computational costs for long sequence encoding. [21] introduced a self-attention mechanism to compute inter-item correlations for reranking. Similarly, [14] employed graph models to calculate item correlations, yielding favorable results. [22] further advanced this concept by utilizing Transformer modules to capture inter-item relationships. [23] addressed the information loss problem in long-sequence reranking by proposing a feature augmentation solution.
List-level Reranking
In contrast to item-level methods, list-level methods [19,4,10,24] treat the entire list as a unit, evaluating scores between different lists and selecting the highest-scoring list order as the final reranking result. List-level reranking is generally categorized into two types. The first type [25,26,27] models the overall distribution relationships among list items and employs learnable scoring functions to evaluate lists. The second type [28,29,30] views reranking as a two-stage process: sequence generation and sequence scoring. It uses a generator to produce various sequences and incorporates an evaluation module at the model backend to assess each generated sequence, ultimately selecting the optimal sequence as the final model output.
Scene-based Reranking
The scene-based reranking algorithm was initially proposed in [31]. This research aims to address the challenge of mining local similarity distributions of semantic features among elements within recommendation lists. By pre-partitioning the recommendation lists, the approach enhances the performance and efficiency of recommendation algorithms. Building upon this algorithm, [11] employs multi-head attention to model interactions between scenes and derives high-quality embeddings of users and items through a mask pretraining mechanism, thereby achieving more stable recommendation system performance.
3. Methods
This section is organized as follows: the Conceptualization of Scene section provides a formal conceptualization of the Scene. The Acquisition of User Representations and Item Embeddings section elucidates the pre-training methodology for deriving user representations and item embeddings. The Extraction of Scene Feature Representations section presents a schematic representation of the proposed model and delineates the process of extracting Scene feature representations. The Extraction of Scene Embeddings section explicates the utilization of a multi-head attention mechanism for generating Scene embeddings. The Items Reranking section articulates the computational framework for final user embeddings and item scores, which serve as the basis for reranking the candidate list to yield an optimal output sequence. The section concludes with the Optimization section, which specifies the loss function employed during the model's training phase.
3.1. Conceptualization of scene
This section elucidates the concept of "Scene" as employed in our research framework. We define a Scene as a collection of items forming a localized adjacency structure within the feature space, characterized by high local relative influence factors among its constituents. To operationalize and leverage these factors, we establish local relationship metrics, primarily focusing on similarity and diversity. Each Scene is anchored by a key item that encapsulates its overall characteristics, while the constituent items maintain a delicate balance of high intra-Scene similarity and inter-item distinctiveness. This formulation adheres to the dual criteria of similarity and diversity. By restructuring the original candidate list into multiple Scene-based arrangements, we facilitate a more nuanced and comprehensive analysis of inter-item associations, thereby enhancing the model's capacity for contextual understanding and personalized reranking.

Quantifying inter-item local relative influence factors requires a judicious selection of a metric function $s(x_i, x_j)$, wherein $x_i$ and $x_j$ represent arbitrary, non-identical items from the candidate set. The function $s(\cdot, \cdot)$ can be any similarity metric capable of measuring the resemblance between the vectors $x_i$ and $x_j$, such as cosine similarity or other distance-based measures. Consider a Scene $S$ encompassing a pivotal (key) item $x_k$. To ensure that the constituents of $S$ concurrently manifest both similarity and diversity in their local relationships, the following constraints must be satisfied:

(1) $c(x_i) \neq c(x_j)$ for all distinct $x_i, x_j \in S$;

(2) $s(x_i, x_j) > \epsilon$ for all distinct $x_i, x_j \in S$.

Herein, $c(x_i)$ and $c(x_j)$ denote the categorical classifications of $x_i$ and $x_j$, respectively, corresponding to their assigned labels, and $\epsilon$ represents a threshold constant, implemented to ensure a cohesive preference relationship among all constituents within a given scene. This parameter additionally regulates scene granularity, thereby ultimately governing diversity within the final recommendation sequence. As the constraints show, condition (1) mandates that all items within a single Scene belong to distinct categories, thus preserving diversity in local relationships, while condition (2) stipulates that the similarity metric between any arbitrary pair of items must surpass the predetermined threshold $\epsilon$, thereby maintaining the local relationship of similarity. Given a predefined upper bound $n$ on Scene cardinality, the initial candidate set $X = \{x_1, x_2, \ldots, x_q\}$ can be decomposed into a sequence of Scenes $\{S_1, S_2, \ldots, S_m\}$, where each Scene adheres to the aforementioned constraints.
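As an illustration of the constraints above, the decomposition can be realized by a simple greedy pass over the candidate list. The sketch below is a minimal NumPy illustration, not the paper's exact partitioning procedure; the similarity function, threshold `eps`, and size bound `max_size` are placeholders for $s(\cdot,\cdot)$, $\epsilon$, and $n$.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def partition_scenes(items, categories, sim=cosine, eps=0.5, max_size=3):
    """Greedily split a candidate list into Scenes.

    Each Scene starts from a key item (the first unassigned item) and
    absorbs later items only if they (1) belong to a different category
    than every item already in the Scene and (2) exceed the similarity
    threshold eps with every item already in the Scene.
    """
    scenes, assigned = [], [False] * len(items)
    for k in range(len(items)):
        if assigned[k]:
            continue
        scene = [k]            # k is the key item of this Scene
        assigned[k] = True
        for j in range(k + 1, len(items)):
            if assigned[j] or len(scene) >= max_size:
                continue
            ok = all(categories[j] != categories[m]
                     and sim(items[j], items[m]) > eps
                     for m in scene)
            if ok:
                scene.append(j)
                assigned[j] = True
        scenes.append(scene)
    return scenes
```

On a toy list of four items in two categories, the greedy pass groups each pair of similar, differently-labeled items into one Scene while leaving dissimilar items apart.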
3.2. Acquisition of user representations and item embeddings
The acquisition of precise user representations and initial item embeddings is paramount for optimizing the efficiency of reranking models. Conventional direct initialization techniques frequently result in performance instability and diminished robustness. To mitigate these issues, we propose a pre-training methodology utilizing a Transformer encoder, drawing inspiration from advancements in Computer Vision (CV) and Natural Language Processing (NLP). This approach, demonstrating superiority over traditional architectures such as recurrent neural networks, efficiently processes variable-length sequences, thereby capturing global semantic information for both users and items. The Transformer's inherent capability to discern contextual relationships in arbitrary-length sequences ensures the generation of robust embeddings that encapsulate holistic semantic knowledge. Consequently, this methodology significantly enhances the overall performance and stability of the reranking process. The architectural framework of this pre-training model is delineated in Fig 1, providing a visual representation of our proposed approach.
Given an initial candidate sequence $X = \{x_1, x_2, \ldots, x_q\}$, a distinctive token $x_{cls}$ is affixed to the anterior of the sequence, serving as a categorical identifier for the entire sequence. For an arbitrary element $x_i$, a transformation is required to convert its item feature vector into the corresponding embedding $e_i$:

$e_i = \mathrm{Embedding}(x_i)$

In this context, the function $\mathrm{Embedding}(\cdot)$ denotes the token embedding layer within the transformer architecture. Consequently, the candidate sequence $X$ can be represented as a set of embeddings $E = \{e_{cls}, e_1, e_2, \ldots, e_q\}$. These embeddings are then propagated through the Transformer encoder layers to derive the final embedding for each constituent item:

$\{h_{cls}, h_1, h_2, \ldots, h_q\} = \mathrm{Encoder}(E)$

Leveraging the delineated methodology, we derive the definitive item embedding vectors. With respect to user embeddings, exploiting the transformer's capability to process sequences of arbitrary length, we conceptualize the user profile as a singular sequence in its own right, processed in the same manner as the candidate sequence. Applying the aforementioned algorithmic process to this construct yields the initial user embedding $e_u$ alongside the final item embeddings $h_i$.
3.3. Extraction of scene feature representations
Fig 2 provides a schematic representation of the holistic architectural framework of our proposed model. Upon acquiring the embedding vectors for all items within the initial candidate list, the model's principal objective is to perform scene-based segmentation of the candidate list and derive the corresponding scene embeddings. Given a sequence of scene features that adheres to constraints (1) and (2), we utilize a Graph Convolutional Network (GCN) [11] to learn the feature representation for each scene. Because GCNs are specifically designed to process graph-structured input data, it is imperative to transform the original list structure of our sequence into an equivalent graph representation to facilitate their application to our task. First, following the formal definition of scenes established in the Conceptualization of Scene section, the entire item set undergoes scene partitioning to generate a structured sequence of scenes $\{S_1, S_2, \ldots, S_m\}$. For an arbitrary Scene $S_i$, in accordance with the mathematical formulation proposed in the Conceptualization of Scene section, we construct its corresponding graph structure $G$ using the following formalism:

$G = (V, E_G), \quad V = \{x_j \mid x_j \in S_i\}, \quad E_G = \{(x_j, x_l) \mid s(x_j, x_l) > \epsilon, \; x_j, x_l \in S_i\}$

The parameter $\epsilon$ herein is congruent with the threshold in constraint (2). This methodology guarantees that exclusively highly correlated items are incorporated into an identical graph structure $G$. Moreover, given that scenes must preserve inter-item similarity and be characterized by key-item representations to delineate the scene's distribution in feature space, our approach mandates the construction of an adjacency matrix. This matrix serves to quantify the influence between adjacent nodes within the graph structure:

$\hat{A} = \mathrm{Norm}(A + I)$

where $A$ is the edge-weight matrix of $G$ (with $A_{jl} = s(x_j, x_l)$ for connected nodes and 0 otherwise), $I$ denotes the identity matrix, and $\mathrm{Norm}(\cdot)$ represents a normalization function that applies L1 regularization to each row vector of the input matrix. Subsequently, given a scene $S_i$ containing a key item $x_k$, we can derive the scene's representation $r_i$ according to the following formula:

$r_i = \hat{A}_k^{\top} E_{S_i} W$

where $\hat{A}_k$ denotes the column vector in the adjacency matrix $\hat{A}$ corresponding to the key item $x_k$, $E_{S_i} \in \mathbb{R}^{|S_i| \times d}$ represents the concatenation matrix of embedding vectors for all items in scene $S_i$ (the item embeddings having been obtained during the pre-training phase), $d$ denotes the dimension of the embedding, and $W \in \mathbb{R}^{d \times d}$ represents a learnable parameter matrix. By applying this method to process the candidate list sequence, we can automatically learn the representation for each scene during the training phase.
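A minimal NumPy sketch of this scene-representation step, assuming cosine similarity as the metric $s(\cdot,\cdot)$; the random projection `W` is an illustrative stand-in for the learnable parameter matrix.

```python
import numpy as np

def scene_representation(E, key_idx, eps=0.5, seed=0):
    """Sketch: build the thresholded similarity graph of one Scene,
    L1-normalize (A + I) per row, and use the key item's row of the
    normalized adjacency to weight and project the item embeddings.

    E       : (n, d) matrix of item embeddings for one Scene
    key_idx : index of the Scene's key item
    """
    n, d = E.shape
    # cosine similarity matrix, zeroed below the threshold eps
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    S = (E @ E.T) / (norms @ norms.T)
    A = np.where(S > eps, S, 0.0)
    np.fill_diagonal(A, 0.0)
    # adjacency with self-loops, L1-normalized row by row
    A_hat = A + np.eye(n)
    A_hat = A_hat / np.abs(A_hat).sum(axis=1, keepdims=True)
    # stand-in for the learnable projection matrix W
    W = np.random.default_rng(seed).standard_normal((d, d)) * 0.1
    return A_hat[key_idx] @ E @ W
```

The key item's normalized adjacency row acts as an attention-like weighting over its neighbors, so the resulting vector summarizes the Scene around its anchor.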
3.4. Extraction of scene embeddings
To mitigate the challenge of output volatility stemming from varied input orderings of the scene representations, we implement a multi-head attention mechanism. This approach facilitates the learning of global inter-sample influence factors, consequently yielding more cohesive scene embeddings. Consider a scene representation matrix $R \in \mathbb{R}^{m \times d_s}$, which comprises the concatenation of all scene representations $r_i$ from the initial candidate list, where $d_s$ denotes the dimensionality of each scene representation. The attention values for this scene representation matrix can be computed using the following formulation:

$\mathrm{Att}(R) = \mathrm{softmax}\!\left(\frac{(R W^Q)(R W^K)^{\top}}{\sqrt{d_s}}\right) R W^V$

where $W^Q$, $W^K$, and $W^V$ represent learnable parameter matrices, which are integral to the generation of queries, keys, and values, respectively, within the transformer architecture. The $\mathrm{softmax}(\cdot)$ function serves to map the computational output to the unit interval. This methodology enables the automatic derivation of inter-scene influence factors, manifested as attention values.

We can now derive a novel scene embedding matrix via the multi-head attention mechanism, transforming the initial scene representation matrix and incorporating $h$ distinct mutual influence factors:

$\mathrm{MultiHead}(R) = [\mathrm{head}_1, \ldots, \mathrm{head}_h]\, W^O, \quad \mathrm{head}_i = \mathrm{Att}_i(R)$

where $[\cdot, \cdot]$ denotes the concatenation operator, representing the operation of concatenating all elements within the brackets along the row dimension, $d_{in}$ represents the number of input-layer nodes in the multi-head attention mechanism, $W^O \in \mathbb{R}^{d_{in} \times d_s}$ is the output projection matrix, and $\mathrm{Att}_i(\cdot)$ signifies the $i$-th attention module with its own learned parameter matrices. Finally, to enhance the quality of the scene embedding representations, the computational results are processed through a feed-forward neural network after the attention module:

$\mathrm{FFN}(z) = W_2\, \mathrm{ReLU}(W_1 z + b_1) + b_2$

Let $\{W_1, W_2, b_1, b_2\}$ denote the set of learnable parameters employed in the feed-forward neural network, where $W_1 \in \mathbb{R}^{d_{ff} \times d_s}$ and $W_2 \in \mathbb{R}^{d_s \times d_{ff}}$ are weight matrices, $b_1 \in \mathbb{R}^{d_{ff}}$ and $b_2 \in \mathbb{R}^{d_s}$ are bias vectors, and $d_{ff}$ represents the dimensionality of the feed-forward network's hidden layer. Through iterative application of multi-head attention modules and feed-forward networks, we derive high-fidelity scene embeddings $\hat{S} = \{\hat{s}_1, \ldots, \hat{s}_m\}$. These refined scene embeddings are then integrated with user representation embeddings to compute the terminal scores for individual items.
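A self-contained NumPy sketch of a single multi-head self-attention pass over the scene representation matrix; the randomly initialized projections stand in for the learnable matrices $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, numerically stabilized."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(R, heads=2, seed=1):
    """One multi-head self-attention pass over the (m, d) scene matrix R.

    Each head projects R to queries, keys, and values, computes scaled
    dot-product attention, and the concatenated heads are mapped back
    to the model dimension by an output projection.
    """
    m, d = R.shape
    dh = d // heads
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(heads):
        Wq, Wk, Wv = (rng.standard_normal((d, dh)) * 0.1 for _ in range(3))
        Q, K, V = R @ Wq, R @ Wk, R @ Wv
        outs.append(softmax(Q @ K.T / np.sqrt(dh)) @ V)  # attention values
    Wo = rng.standard_normal((heads * dh, d)) * 0.1      # output projection
    return np.concatenate(outs, axis=1) @ Wo
```

Because every scene attends to every other scene, the output of this pass is far less sensitive to the order in which scene representations are fed in than an MLP over the concatenated vectors would be.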
3.5. Items reranking
To maximize the congruence between the final ranking and user preferences, we compute a preference score for each item within the candidate list. Subsequently, these items are arranged in descending order of their computed scores, yielding a partially ordered sequence that is then presented to the user. For an arbitrary user $u$, we utilize a multilayer perceptron to transform the user's initial representation $e_u$ into a deep embedding vector $z_u$:

$z_u = \mathrm{MLP}(e_u; \Theta)$

Let $\Theta = \{W_i, b_i\}$ denote the set of learnable parameters of the multilayer perceptron, where $W_i$ and $b_i$ represent the weight matrix and bias vector for the $i$-th layer, respectively. This multilayer perceptron facilitates the mapping of the user's initial embedding vector to a feature space that is isomorphic to that of the scene embedding vectors. Consequently, we can compute preference scores independently between the user features and any scene within the candidate list:

$p_i = \langle z_u, \hat{s}_i \rangle$

where $\langle \cdot, \cdot \rangle$ denotes the vector inner product operation, and $\hat{s}_i$ represents the $i$-th scene representation in the scene embedding $\hat{S}$. Subsequently, for any item $x_j$ in the candidate list, we compute its similarity matching score with all scenes:

$m_{j,i} = s(x_j, k_i)$

Let $k_i$ denote the key item in the $i$-th scene. We then derive the final score for item $x_j$ by calculating a polynomial weighted sum of the user-scene preference scores $p_i$ and the item-scene matching scores $m_{j,i}$:

$w_j = \sum_{i=1}^{m} p_i \cdot m_{j,i}$

The unbounded nature of $w_j$'s range presents challenges for optimization using conventional loss functions. Consequently, rather than directly utilizing $w_j$ as the terminal score, we implement a refinement to augment its scalability:

$\hat{y}_j = \sigma(w_j) \tag{16}$

where $\sigma(\cdot)$ represents a function that maps the input parameters to the interval $[0,1]$. For all items in the original candidate list, we compute their corresponding final scores according to equation (16). Once we have obtained the final scores for all items, we can rank the original candidate list based on these scores, thereby deriving the optimal output sequence and completing the reranking of the original candidate list.
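A minimal NumPy sketch of the scoring step under these definitions; cosine similarity stands in for the matching function $s(\cdot,\cdot)$, the sigmoid stands in for $\sigma(\cdot)$, and all inputs are illustrative.

```python
import numpy as np

def sigmoid(x):
    """Maps real-valued scores into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def item_scores(z_u, scene_emb, items, key_items):
    """Sketch of the scene-weighted scoring step.

    z_u       : (d,) deep user embedding (output of the user MLP)
    scene_emb : (m, d) scene embeddings
    items     : (q, d) candidate item feature vectors
    key_items : (m, d) feature vectors of each scene's key item

    Each item's similarity to every scene's key item weights the
    user-scene preference scores; a sigmoid maps the sum into (0, 1).
    """
    p = scene_emb @ z_u                          # user-scene preference scores

    def cos(A, B):
        A = A / np.linalg.norm(A, axis=1, keepdims=True)
        B = B / np.linalg.norm(B, axis=1, keepdims=True)
        return A @ B.T

    M = cos(items, key_items)                    # item-scene matching scores (q, m)
    return sigmoid(M @ p)                        # final scores, one per item
```

Because every item is scored against all scenes rather than its affiliated scene alone, an ambiguous item that resembles several scenes contributes to the ranking through all of them.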
3.6. Optimization
The optimization objectives for reranking models primarily encompass two frameworks: list-wise and point-wise approaches. List-wise functions, predominantly employed in generative reranking models, evaluate the quality of stochastically generated reranking sequences to compute model loss. However, this method necessitates exhaustive enumeration of all possible permutations, presenting significant computational challenges. Conversely, point-wise functions directly assess individual items within the original candidate list, circumventing the need for comprehensive permutation evaluation. This approach substantially reduces computational overhead and enhances model efficiency. Given these considerations, this study adopts a point-wise optimization function as the model’s objective function, leveraging its computational efficacy and scalability in the context of our reranking task.
For a given user $u$, we employ the mean squared error loss function to quantify the discrepancy between the model's predicted scores and the item label values. Consider an arbitrary item $x_j$ from the original candidate list, a given scene list $\{S_1, \ldots, S_m\}$, and the item's final score $\hat{y}_j$. The item label value is computed as follows:

$y_j = \sigma\!\left(\sum_{i=1}^{m} s(x_j, k_i)\, \hat{y}_{k_i}\right) \tag{18}$

Let $k_i$ denote the key item in the $i$-th scene, $\hat{y}_{k_i}$ represent the final score of the key item in the $i$-th scene, and $\sigma(\cdot)$ be a function mapping input parameters to the unit interval. The model's overall loss function is then defined as:

$\mathcal{L} = \frac{1}{q} \sum_{j=1}^{q} \left(y_j - \hat{y}_j\right)^2$

Let $q$ denote the cardinality of the original candidate set, while $y_j$ and $\hat{y}_j$ represent the label value and final score for item $x_j$, computed using equations (18) and (16), respectively.
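As a minimal sketch, the point-wise objective reduces to an average of squared residuals over the $q$ candidate items; the label and score vectors here are illustrative inputs.

```python
import numpy as np

def pointwise_mse_loss(labels, scores):
    """Point-wise objective: mean squared error between item label
    values and predicted final scores over the q candidate items."""
    labels, scores = np.asarray(labels, float), np.asarray(scores, float)
    return float(np.mean((labels - scores) ** 2))
```

Unlike a list-wise objective, this quantity is computed once per item, so no enumeration over permutations of the candidate list is required.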
4. Experiments
4.1. Dataset and evaluation metrics
This study utilizes three public datasets: MovieLens-1M, MovieLens-20M [32], and the TripAdvisor dataset [33] for comparative experiments between our proposed model and other benchmark models. Following the methodology in [31], samples with user ratings exceeding 3 are designated as positive, while others are considered negative. For each dataset, we limit positive samples per user to approximately 15, subsequently ordering them chronologically by timestamp. The initial 80% of positive samples constitute the training set, with the remaining 20% forming the test set. During training, all positive samples are selected as key items, with highly correlated but non-positive samples randomly chosen to construct scenes. This rigorous data preparation ensures a balanced and temporally consistent evaluation framework, facilitating robust model comparison and performance assessment across diverse recommendation scenarios. To evaluate our proposed model performance, we employ four metrics for comparative analysis between our proposed model and benchmark models: NDCG@N, Precision@N, Recall@N, and SR@N. The first three metrics—NDCG@N, Precision@N, and Recall@N—are widely adopted for assessing the quality of output sequences in recommender systems, providing comprehensive insights into ranking accuracy and relevance. SR@N, in contrast, serves as a measure of local inter-item relationships within the list, offering a unique perspective on the model’s ability to capture item-item interactions. This diverse set of metrics ensures a multifaceted evaluation of model performance, encompassing both traditional recommendation quality indicators and more nuanced measures of item relationship modeling.
Following the definition proposed in [11], given a recommendation sequence $\{x_1, x_2, \ldots, x_N\}$, we construct multiple scenes from it. Let $x_i$ be the key item, with $i$ initially set to 1. We then find the maximum value $K$ such that $\{x_i, x_{i+1}, \ldots, x_{i+K}\}$ is a scene with $x_i$ as the key item, where $K$ can be 0. Then, we set $i$ to $i + K + 1$ and repeat the above steps until the last item $x_N$ in the recommendation sequence is included in the last scene. Let $PS$ represent the number of scenes containing more than one item, and $NS$ denote the number of scenes containing only one item. The SR@N is then defined as follows:

$\mathrm{SR@}N = \dfrac{PS}{PS + NS}$
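The greedy segmentation and the resulting ratio can be sketched as follows; `is_scene` is a hypothetical predicate standing in for the scene test defined by constraints (1) and (2).

```python
def sr_at_n(seq, is_scene):
    """Compute SR@N for a recommendation sequence.

    seq      : the top-N recommendation sequence (a list of items)
    is_scene : predicate is_scene(key, members) -> True iff `members`
               (which starts with `key`) forms a valid scene keyed on `key`

    Greedily grows each scene from position i: find the largest K such
    that seq[i : i + K + 1] is a scene keyed on seq[i] (K may be 0),
    then continue from i + K + 1.
    """
    i, ps, ns = 0, 0, 0
    while i < len(seq):
        k = 0
        while i + k + 1 < len(seq) and is_scene(seq[i], seq[i:i + k + 2]):
            k += 1
        if k > 0:
            ps += 1   # scene with more than one item
        else:
            ns += 1   # singleton scene
        i += k + 1
    return ps / (ps + ns)
```

With a toy predicate that accepts only runs of identical items, the sequence `[1, 1, 2, 3, 3]` splits into scenes `{1, 1}`, `{2}`, `{3, 3}`, giving SR@5 = 2/3.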
4.2. Implementation details
In accordance with the parameter optimization framework delineated in [11], we conduct an extensive hyperparameter search to determine the optimal configuration for our model; the resulting parameter settings are derived through rigorous experimentation and validation. In our experimental framework, the parameter N is uniformly set to 15 across all evaluation metrics. To ensure statistical robustness and representativeness, we compute the mean performance of the model on the test set for each metric.
4.3. Benchmark model
This section introduces the benchmark models used for comparison with our proposed model in the experiments. These benchmark models are widely employed in various recommender systems’ reranking modules:
- PRM [21]: This model utilizes a Transformer module to capture global semantic relationships between any pair of items in the list. It encodes related items into embedding vectors containing these semantic relationships. Subsequently, it concatenates user and item embeddings, feeding them into a neural network to obtain final scores. The items in the list are then ranked based on these scores to form the optimal output sequence.
- SRR [31]: This model pioneered the application of the scene concept in reranking. It divides the candidate list into multiple scenes using a clustering-like method. It then obtains feature representations for each scene through a graph neural network. These representations are fed into a multi-head attention module to compute embedding vectors for each scene. The model then calculates the matching degree between each scene embedding vector and the user, deriving scores for each scene. Finally, all scenes are ranked according to their scores, with items within each scene undergoing a secondary ranking based on their order of inclusion, resulting in the optimal output sequence.
- SRR-MP [11]: This model's primary operational process is identical to SRR. However, before dividing the candidate list into scenes, it obtains embedding vectors for all items in the list through masked pre-training. It then divides scenes based on item embedding vectors rather than item feature representations. Compared to SRR, SRR-MP demonstrates more robust performance and stable operation across various recommendation tasks.
- PEAR [22]: This model divides the reranking task into two parts: 1) Learning intrinsic feature-level relationships between users and items. 2) Learning intrinsic instance-level relationships between items, considering user historical behavior as a reference. In the feature-level intrinsic relationship learning, user profile representations, initial list item representations, and user historical behavior items are directly merged and fed into a feature association module to mine intrinsic connections between list items, forming a set of vectors containing contextual semantic representations of the list. In the instance-level learning part, the model uses user historical behavior as guidance for the reranking process, employing multi-layer self-attention modules and cross-attention merge modules to calculate each item's potential attractiveness to the user, finally ranking based on attractiveness.
- DCDR [9]: This model employs a generator-evaluator paradigm, applying diffusion models to reranking tasks. The overall model consists of two parts: 1) Forward separation process, 2) Backward inference process. In the forward separation process, noise is added to all samples in the initial sequence through a separation operator with a tractable backend. In the backward inference process, a denoising model learns the original sequence's feature space to generate relevant candidate sequences. Finally, a backward feedback mechanism is introduced to search for the highest-scoring sequence to recommend to the user.
- PODM-MI [34]: This model simultaneously learns from the initial item sequence and user profile. For the user profile processing branch, the model uses a variational inference-based multidimensional Gaussian distribution to capture users’ diverse uncertain preferences. In the sequence processing branch, it employs a backbone network to obtain users’ deterministic preferences and item feature embeddings. Finally, the model maximizes the mutual information between users’ diverse preferences and deterministic preferences, constructing a relationship matrix to guide item reranking.
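The scene partitioning shared by SRR, SRR-MP, and our method can be sketched minimally. The following is a hypothetical illustration, assuming cosine similarity and a hand-picked threshold; it is not any paper's actual implementation. Items whose embeddings are sufficiently similar are linked by an edge, and each connected component of the resulting undirected graph is treated as one scene.

```python
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def partition_into_scenes(embeddings, threshold=0.8):
    """Group items into scenes: connected components of the undirected
    graph whose edges link items with cosine similarity >= threshold."""
    n = len(embeddings)
    parent = list(range(n))           # union-find forest over item indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            parent[find(i)] = find(j)       # merge the two components

    scenes = {}
    for i in range(n):
        scenes.setdefault(find(i), []).append(i)
    return list(scenes.values())
```

A clustering-like division of this kind yields scenes that can contain several highly correlated items, which is exactly the property the SR@N metric rewards.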
4.4. Comparisons with state-of-the-art
The evaluation results of our proposed model in comparison with other benchmark models on the MovieLens-1M, MovieLens-20M, and TripAdvisor datasets are presented in Tables 1, 2, and 3, respectively. These results consistently demonstrate the superior performance of our proposed method in reranking tasks across all datasets.
In the experiments on the MovieLens-1M dataset, our model achieves an NDCG@15 of 0.1877, outperforming the best benchmark model, PODM-MI, by 1.9%. The P@15 metric reaches 0.1316, a 3.2% improvement over PODM-MI, while R@15 attains 0.2306, a 1.5% increase.
For the MovieLens-20M dataset, PODM-MI again emerges as the top-performing benchmark. Our model surpasses PODM-MI by 3.02%, 3.5%, and 4% in NDCG@15, P@15, and R@15, respectively, demonstrating consistent improvement across all evaluation criteria.
In the TripAdvisor experiments, PODM-MI remains the best-performing benchmark. Our proposed model exhibits improvements of 1.5% in NDCG@15, 2.6% in P@15, and 1.1% in R@15 over PODM-MI, further validating its effectiveness across diverse datasets.
Regarding the SR@N metric, PRM, PEAR, DCDR, and PODM-MI models show relatively poor performance across all three datasets. The highest SR@15 value among these models is achieved by PODM-MI on MovieLens-20M, reaching only 0.0854. In contrast, SRR, SRR-MP, and our proposed model demonstrate robust performance across all datasets, with most values falling within the [0.5, 0.6] range. Specifically for the SR@15 metric, SRR-MP emerges as the best-performing benchmark model. Our model surpasses SRR-MP by 2.65%, 3.31%, and 3.17% on the three datasets respectively, indicating substantial improvements in capturing local item relationships.
- According to the definition of the SR@N metric, a higher number of scenes with item cardinality greater than one yields a higher SR@N value. The poor performance of PRM, PEAR, DCDR, and PODM-MI on this metric can be attributed to how they process item interactions and relationships. The superior performance of SRR, SRR-MP, and our proposed method on SR@N stems from their focus on sample-point clustering phenomena in the feature space during the embedding stage. These approaches delineate scenes comprising multiple highly correlated items and derive embedding vectors for subsequent processing. In contrast, PRM, PEAR, DCDR, and PODM-MI, despite capturing inter-item interactions, primarily process individual items during feature embedding, overlooking localized clustering structures. This results in scenes predominantly composed of single items and a more dispersed distribution. The ability of SRR, SRR-MP, and our method to explicitly model localized structures leads to more scenes with item cardinality exceeding one, and thus higher SR@N scores. This observation underscores the importance of incorporating spatial awareness and cluster-based processing in recommendation systems, particularly for tasks requiring a nuanced understanding of intricate item relationships.
- Comparative analysis reveals our proposed method’s most significant improvement over the PRM model across all metrics. This enhancement is primarily attributed to our comprehensive modeling approach. While PRM exclusively focuses on global semantic relationships within the list, neglecting localized inter-item influences, our method incorporates both global semantics and local item correlations through the concept of scenes. By integrating this dual-perspective approach, our model demonstrates a more nuanced understanding of item relationships, leading to more effective reranking decisions. The incorporation of scene-based modeling enables the capture of subtle, localized patterns often overlooked in purely global approaches. This marked superiority underscores the importance of considering multi-scale relationships in recommendation systems, suggesting that future advancements in reranking algorithms could benefit from balancing global semantic understanding with granular item relationships for more accurate and context-aware recommendations.
- Among PEAR, DCDR, and PODM-MI models, PEAR exhibits the largest performance gap compared to our proposed method, while still significantly outperforming the PRM model. This performance hierarchy can be attributed to the varying approaches in modeling local item relationships. PEAR explicitly models local item interactions but overlooks the localized clustering structures (scenes) in the feature space, potentially leading to suboptimal item placements in the final reranking. In contrast, DCDR and PODM-MI, while also not explicitly focusing on localized structures, compensate for this limitation through distinct methodologies. PODM-MI employs variational inference to adaptively optimize each item’s optimal recommendation position, while DCDR leverages a noise-addition and denoising process to directly learn user-preference-aligned item feature distributions, guiding subsequent reranking. Our method’s superior performance underscores the importance of comprehensively modeling both global semantics and localized item clusters, suggesting that future advancements in reranking algorithms could benefit from integrating these multi-faceted approaches to capture complex item relationships more effectively.
- Among models that incorporate scenes, our model demonstrates superior performance. While SRR and SRR-MP achieve noteworthy results by modeling the local clustering structure of items, they consider only an item's similarity to its own scene in the final scoring phase, neglecting relevance to other scenes. Moreover, their ranking approach, which determines item position directly from the scene ranking, places items that are scene-ambiguous yet strongly preferred by the user suboptimally, degrading overall effectiveness. In contrast, our model handles scene-ambiguous items by calculating item-scene relevance across all scenes and combining it, via a weighted sum, with the scene-user matching scores. This enables the model to rank items from a global perspective spanning all scenes in the list, rather than focusing only on an item's primary scene. Consequently, scene-ambiguous items are positioned more accurately within the ranking, enhancing the model's overall precision.
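The scene-weighted scoring discussed above can be sketched as follows. This is a minimal illustration of the weighted-sum form described in the text; the function names are ours, and the similarity and matching values are assumed to be produced upstream by the model.

```python
def scene_weighted_scores(item_scene_sim, scene_user_match):
    """Score each item as the sum, over all scenes, of its item-scene
    similarity weighted by the scene-user preference matching score.

    item_scene_sim[i][s]  -- similarity of item i to scene s
    scene_user_match[s]   -- preference matching score of scene s for the user
    """
    return [sum(sim * match for sim, match in zip(sims, scene_user_match))
            for sims in item_scene_sim]

def rerank(items, scores):
    """Order items by descending score (stable for ties)."""
    order = sorted(range(len(items)), key=lambda i: -scores[i])
    return [items[i] for i in order]
```

Under this form, an item that is ambiguous between scenes still accumulates credit from every scene the user likes, instead of being pinned to the position of a single primary scene.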
4.5. Ablation experiments
This section evaluates the impact of the newly proposed components – the feature embedding layer and the item-scene similarity function – on model performance. Specifically, the following variant models are constructed:
- W/O FE: The feature embedding layer proposed in the Acquisition of User Representations and Item Embeddings section is removed from the original model and replaced with an identity mapping.
- W/O SSE: The weighted scene scoring function introduced in the Items Reranking section is eliminated, with all similarity matching scores set to a uniform weight of 1.
The experimental parameters align with those described in the Implementation Details section. Ablation studies are conducted on the MovieLens-1M, MovieLens-20M, and TripAdvisor datasets. Evaluation metrics include NDCG@N, Precision@N, Recall@N, and SR@N (where SR@N follows the definition in the Dataset and Evaluation Metric section, with N = {5, 10, 15}). Results are presented in Tables 4–6.
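The W/O SSE variant can be illustrated with a small sketch. The function name and the weighted-sum scoring form are illustrative assumptions on our part, not the model's actual interface.

```python
def score_item(scene_sims, scene_user_match, use_sse=True):
    """Scene-weighted item score. With use_sse=False (the W/O SSE
    variant) every item-scene similarity weight is fixed to 1."""
    weights = scene_sims if use_sse else [1.0] * len(scene_sims)
    return sum(w * m for w, m in zip(weights, scene_user_match))
```

Note that with `use_sse=False` the score collapses to the plain sum of the scene-user matching scores, which under this scoring form is identical for every item in the list: the scene term no longer discriminates between items at all, which makes a substantial performance drop unsurprising.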
The ablation results show that removing either module degrades every metric on all three datasets, to varying degrees. According to Table 4, on MovieLens-1M the W/O FE model's NDCG@15 drops from 0.1877 to 0.1698, a 9.5% decrease, and its P@15 drops from 0.1316 to 0.1143, a 13.1% decrease. According to Table 5, on MovieLens-20M, NDCG@15 falls from 0.2588 to 0.2337 (9.7%) and P@15 from 0.1801 to 0.1664 (7.6%). According to Table 6, on TripAdvisor, NDCG@15 falls from 0.3036 to 0.2857 (5.8%) and P@15 from 0.1088 to 0.0993 (8.7%). These results indicate that removing the feature embedding layer harms all of the model's performance metrics to a measurable degree. One possible reason is that the feature embedding layer captures the contextual connections between tokens in the input sequence and better evaluates each token's influence on the deep semantics of the others, thereby producing embeddings that preserve the original feature distribution more completely and are more robust.
For the W/O SSE model, the decline across evaluation metrics is even larger: on MovieLens-1M, NDCG@15 drops from 0.1877 to 0.1646 (12.3%) and P@15 from 0.1316 to 0.1056 (19.7%); on MovieLens-20M, NDCG@15 drops from 0.2588 to 0.2281 (11.9%) and P@15 from 0.1801 to 0.1579 (12.3%); on TripAdvisor, NDCG@15 drops from 0.3036 to 0.2791 (8.1%) and P@15 from 0.1088 to 0.0946 (13.0%). This analysis indicates that the weighted scene scoring formula has the greatest impact on model performance. A likely explanation is that weighted scene scoring finely measures the compatibility between each item in the candidate list and the user's interests and preferences. Without it, items belonging to scenes unrelated to the user's preferences receive inflated scores, so items that should not match the user's interests are ranked too high, and recommendation performance declines.
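The relative decreases quoted above follow the usual convention of expressing the drop as a percentage of the full model's score; a one-line helper makes the arithmetic explicit:

```python
def relative_drop(full, ablated):
    """Relative performance drop of an ablated variant,
    as a percentage of the full model's score."""
    return round(100 * (full - ablated) / full, 1)
```

For example, the W/O FE drop in NDCG@15 on MovieLens-1M is computed from the pair (0.1877, 0.1698).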
5. Conclusions
We propose a scene-weighted reranking framework to enhance preference matching in personalized recommendations by addressing local item relevance. It defines "scenes" as clusters of semantically related items, establishing them as the fundamental reranking units. The model partitions candidate lists into scenes, employs graph neural networks for scene representation learning, and utilizes multi-head attention to generate high-fidelity scene embeddings. To resolve the performance degradation caused by item-scene ambiguity, we design a scene-weighted scoring mechanism that computes item scores by: (1) measuring similarities between target items and key items across scenes, and (2) weighting these against scene-user preference matching scores. This dual-perspective approach ensures accurate item positioning in final sequences while comprehensively reflecting user preferences. By integrating advanced neural architectures, our framework captures both local scene relationships and global preference alignment. Experiments show significant improvements over benchmarks across multiple metrics, validating its capability to deliver precise, preference-aware recommendations.
References
- 1. Han P, Shang S, Sun A, Zhao P, Zheng K, Zhang X. Point-of-Interest Recommendation With Global and Local Context. IEEE Trans Knowl Data Eng. 2022;34(11):5484–95.
- 2. Han P, et al. Contextualized point-of-interest recommendation. In: International Joint Conference on Artificial Intelligence, 2020.
- 3. Xi Y, et al. Context-aware reranking with utility maximization for recommendation. arXiv preprint, 2021.
- 4. Xi Y, Liu W, Zhu J, Zhao X, Dai X, Tang R, et al. Multi-Level Interaction Reranking with User Behavior History. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022. 1336–46.
- 5. Nogueira R, Cho K. Passage re-ranking with BERT. arXiv preprint, 2019.
- 6. Cheng H-T, Koc L, Harmsen J, Shaked T, Chandra T, Aradhye H, et al. Wide & Deep Learning for Recommender Systems. In: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016. 7–10.
- 7. Goodfellow I, et al. Generative adversarial nets. In: Advances in Neural Information Processing Systems, 2014.
- 8. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv preprint, 2013.
- 9. Lin X, Chen X, Wang C, Shu H, Song L, Li B, et al. Discrete Conditional Diffusion for Reranking in Recommendation. In: Companion Proceedings of the ACM Web Conference 2024, 2024. 161–9.
- 10. Xu S, Pang L, Xu J, Shen H, Cheng X. List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation. In: Proceedings of the ACM Web Conference 2024, 2024. 1330–40.
- 11. Han P, Zhou S, Yu J, Xu Z, Chen L, Shang S. Personalized Re-ranking for Recommendation with Mask Pretraining. Data Sci Eng. 2023;8(4):357–67.
- 12. Chen X, Liu D, Zha Z-J, Zhou W, Xiong Z, Li Y. Temporal Hierarchical Attention at Category- and Item-Level for Micro-Video Click-Through Prediction. In: Proceedings of the 26th ACM international conference on Multimedia, 2018. 1146–53.
- 13. Cao Z, et al. Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine Learning, 2007. 129–36.
- 14. Liu W, Liu Q, Tang R, Chen J, He X, Heng PA. Personalized Re-ranking with Item Relationships for E-commerce. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020. 925–34.
- 15. Pang X, Wang Y, Fan S, Chen L, Shang S, Han P. EmpMFF: A Multi-factor Sequence Fusion Framework for Empathetic Response Generation. In: Proceedings of the ACM Web Conference 2023, 2023. 1754–64.
- 16. Yin D, Hu Y, Tang J, Daly T, Zhou M, Ouyang H, et al. Ranking Relevance in Yahoo Search. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016. 323–32.
- 17. Zhu S, Yang L, Chen C, Shah M, Shen X, Wang H. R2Former: Unified Retrieval and Reranking Transformer for Place Recognition. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 19370–80.
- 18. Cheng HT, Koc L, Harmsen J. Wide & deep learning for recommender systems. In: Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 2016. 7–10.
- 19. Ai Q, Bi K, Guo J, Croft WB. Learning a Deep Listwise Context Model for Ranking Refinement. In: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018. 135–44.
- 20. Zhuang T, Ou W, Wang Z. Globally optimized mutual influence aware ranking in e-commerce search. arXiv preprint, 2018.
- 21. Pei C, Zhang Y, Zhang Y, Sun F, Lin X, Sun H, et al. Personalized re-ranking for recommendation. In: Proceedings of the 13th ACM Conference on Recommender Systems, 2019. 3–11.
- 22. Li Y, Zhu J, Liu W, Su L, Cai G, Zhang Q, et al. PEAR: Personalized Re-ranking with Contextualized Transformer for Recommendation. In: Companion Proceedings of the Web Conference 2022, 2022. 62–6.
- 23. Qiu Z-H, Jian Y-C, Chen Q-G, Zhang L. Learning to Augment Imbalanced Data for Re-ranking Models. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021. 1478–87.
- 24. Yang C, Chen L, Wang H, Shang S, Mao R, Zhang X. Dynamic Set Similarity Join: An Update Log Based Approach. IEEE Trans Knowl Data Eng. 2023;35(4):3727–41.
- 25. Li F, Qu H, Fu M, Zhang L, Zhang F, Chen W, et al. Neural Reranking-Based Collaborative Filtering by Leveraging Listwise Relative Ranking Information. IEEE Trans Syst Man Cybern, Syst. 2023;53(2):882–96.
- 26. Liu Q, et al. Leveraging passage embeddings for efficient listwise reranking with large language models. arXiv preprint, 2024.
- 27. Yoon S, et al. ListT5: Listwise reranking with fusion-in-decoder improves zero-shot retrieval. arXiv preprint, 2024.
- 28. Gangi Reddy R, et al. FIRST: Faster improved listwise reranking with single token decoding. arXiv preprint, 2024.
- 29. Xu S, Pang L, Xu J, Shen H, Cheng X. List-aware Reranking-Truncation Joint Model for Search and Retrieval-augmented Generation. In: Proceedings of the ACM Web Conference 2024, 2024. 1330–40.
- 30. Zhang Y, et al. HLATR: Enhance multi-stage text retrieval with hybrid list aware transformer reranking. arXiv preprint, 2022.
- 31. Han P, Shang S. Scene Re-ranking for Recommendation. In: 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), 2022. 1–6.
- 32. Harper FM, Konstan JA. The MovieLens datasets: history and context. ACM Transactions on Interactive Intelligent Systems. 2015;5(4):1–19.
- 33. Alenezi T, Hirtle S. Normalized Attraction Travel Personality Representation for Improving Travel Recommender Systems. IEEE Access. 2022;10:56493–503.
- 34. Wang H, Li M, Miao D, Wang S, Tang G, Liu L, et al. A Preference-oriented Diversity Model Based on Mutual-information in Re-ranking for E-commerce Search. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024. 2895–9.