Exchanging registered users’ submitting reviews towards trajectory privacy preservation for review services in Location-Based Social Networks

In Location-Based Social Networks (LBSNs), registered users submit reviews of the points of interest (POIs) they visit to the system providers (SPs). The SPs publish the submitted reviews anonymously to build reputations for POIs. Unfortunately, the user profile and trajectory contained in reviews can easily be obtained by adversaries who collude with the SPs. Even worse, existing techniques such as cryptography and generalization are infeasible, because reviews must be published publicly and must remain factual. Inspired by pseudonym techniques, we propose an approach in which users exchange reviews before submitting them to the SPs. We introduce two attacks against this setting, namely the review-based location correlation attack (RLCA) and the semantic-based long-term statistical attack (SLSA). RLCA links a trajectory to the real user by reconstructing the trajectory, and SLSA establishes a connection between locations and users through differences in semantic frequency. To resist RLCA, we design a method named User Selection to Resist RLCA (USR-RLCA). We propose a metric to measure the correlation between a user and a trajectory. Based on this metric, USR-RLCA selects the reviews to exchange so that the number of locations on each reconstructed trajectory stays below a threshold. However, USR-RLCA fails to resist SLSA because it ignores semantics. Hence, we design an enhanced method named User Selection to Resist SLSA (USR-SLSA). We first propose a metric to measure the indistinguishability of locations with respect to long-term differences in semantic frequency. USR-SLSA then allows two reviews to be exchanged only if the probability difference of their semantics after the exchange is below a threshold. Evaluation results verify the effectiveness of our approach in terms of privacy and utility.


Introduction
Recently, Location-Based Social Networks (LBSNs) [1] have become a dominant way for people to share information in daily life, owing to the rapid development of online social networks and location-based services. As an important component of an LBSN, Local Business Service Systems (LBSSs) such as Yelp, Tripadvisor and Dianping provide a review service [2]. In these systems, a registered user publishes a review each time she visits a point-of-interest (POI) and enjoys the service it provides. Here, a POI is a business or shop registered in an LBSS. Note that the term 'user' in this paper refers to a registered user. Users hope to build reputations for POIs by publishing reviews. For example, a user who has enjoyed good service at a restaurant altruistically publishes a good review so that more people can enjoy the service. In general, reviews have the following features:
• One-to-one correspondence between a POI and its location (geographic coordinates). Although a POI is an area containing many geographic coordinates, the LBSS selects a single fixed coordinate as the location of the POI. Therefore, a review for a POI is equivalent to a review for the location.
• Facticity. Users are altruistic. They hope to build objective reputations for POIs by publishing reviews and do not mind who publishes them. Thus, no review is fabricated, and every review is written by a registered user who has enjoyed the service provided by the POI.
• Real time. Most users publish reviews shortly after visiting POIs and enjoying their services. That is, the visit and the review can be regarded as occurring at the same time.
• Historicity. The POIs reviewed by a user during a period of time form a time-dependent trajectory. Generally, a user repeatedly and sequentially reviews some POIs, since she has a consistent lifestyle [3]. For example, she goes to a fixed restaurant for lunch at 11:00 am and a fixed cinema at 17:00 every day.
After enjoying the service of a POI, a user publishes her review through a two-step process. In the first step, the user submits her review to the LBSS server. In the second step, the LBSS server anonymizes and publishes the review while also storing it. We name the two steps submit reviews (SR) and publish reviews (PR), respectively. For example, Alice's name is replaced by an anonymous label in the published review. The typical LBSS architecture is shown in Fig 1. On the LBSS server, the user registration information, such as cell phone number and ID card, and the review content are neither encrypted nor anonymized. That is, although reviews are published anonymously, users' identities are not anonymous to the SPs. By colluding with the SPs, the adversary can obtain user registration information and reviews, and then easily correlate them with the user's identity.
Currently, privacy leakage, especially of trajectory privacy, has become one of the major challenges users face when using review services in LBSSs. On the one hand, because of […] frequently than others in the long term. The frequency difference of semantics can be utilized to establish a connection between locations and users. In this paper, we call this attack the semantic-based long-term statistical attack (SLSA). As we show, SLSA can identify a user's real location among the exchanged locations with high probability.
Based on the above analysis, we present an approach to resist these attacks. The approach contains two methods named User Selection to Resist RLCA (USR-RLCA) and User Selection to Resist SLSA (USR-SLSA). The adversary links a user to her trajectories by analyzing their correlation, so we use USR-RLCA to resist RLCA. In USR-RLCA, we propose a metric to measure the correlation between a user and a trajectory. Based on the metric, we suppress the number of locations on each reconstructed trajectory below a threshold. Compared with existing methods, USR-RLCA makes it significantly harder to link a user to her trajectories. Yet, because it ignores semantics, USR-RLCA fails to keep locations indistinguishable under SLSA. Hence, we propose USR-SLSA to solve this problem. In USR-SLSA, we first propose a metric that measures how indistinguishable different locations are with respect to the long-term frequency difference of semantics. Then, for each review that will be sent to the SPs, we select some reviews to form a group. In the group, users exchange their reviews based on the above metric. Two reviews may be exchanged if the probability difference of their semantics after the exchange is below a threshold, which ensures that the method resists SLSA. We conduct experiments to evaluate the effectiveness of our approach in terms of privacy and utility. Results show that our approach preserves users' privacy against RLCA and SLSA and outperforms existing methods.
Besides user privacy, another issue that needs to be considered is whether our approach excessively reduces user reviews that can be published publicly, i.e., the user utility. The existing technique [2] to protect user privacy mainly limits the reviews that are publicly released. Different from it, our approach submits only reviews that do not reveal the trajectory privacy to the SPs, and does not focus on how the SPs publish the reviews. To evaluate the user utility, we use (�,δ)-public principle [2] to publish reviews and use the ratio of the public reviews to measure it. We show that even though our approach submits fewer reviews to the SPs than [2], it does not reduce the user utility. The reason is that (�,δ)-public principle would allow a higher ratio of reviews to be published if the SPs receive fewer reviews.
In summary, the major contributions of this paper are as follows:
• We propose a mechanism to preserve users' trajectory privacy in the PR scenario while retaining user reviews. To the best of our knowledge, this is the first paper to investigate how to protect privacy in a scenario in which users' identities and reviews are unencrypted and not anonymous to the adversary.
• According to the consistent lifestyle of a human, we introduce SLSA, which can exploit the frequency difference of semantics to establish a connection between locations and users. We also propose two methods to resist RLCA and SLSA.
• We propose two metrics that measure, respectively, the correlation between a user and a trajectory and the indistinguishability of locations with respect to the long-term difference of semantic frequency. Using them, we design USR-RLCA to suppress the number of locations on each reconstructed trajectory below a threshold, and USR-SLSA to ensure that the probability difference of semantics is below a threshold after users exchange reviews.
• The effectiveness of our methods in terms of the privacy and utility is verified on a real dataset. Results show that our methods can preserve users' privacy against RLCA and SLSA and outperform existing methods in terms of the utility.

Related work
With growing concerns about privacy in prevalent LBSNs, many approaches have been proposed to protect user trajectory privacy. According to how they protect trajectory privacy, these approaches can be divided into four categories: cryptography, generalization, suppression and pseudonyms. Cryptographic methods encrypt users' private information to make it invisible to the adversary [10,11]. However, in LBSSs, users' trajectories are visible to the adversary, since the reviews are public. To work around this, cryptography can instead encrypt other private information of users. In [12], a user's pseudo-ID is encrypted with a symmetric encryption algorithm. In [7], the communication between the user and the LBS server is encrypted. Nevertheless, the adversary can still infer users' real identities by analyzing the spatial-temporal correlation between locations in trajectories [9,13].
Generalization protects user trajectory privacy by hiding the user's actual trajectory (identity) among other users' trajectories (identities). k-anonymity is one of the most widely used generalization methods and includes two primary approaches, Dummy Trajectory and Historical Trajectory. In Dummy Trajectory, for each location, methods [14][15][16][17][18] fabricate k − 1 locations to send to the LBS server. For example, method [17] randomly selects k − 1 dummy locations near the real location; method [18] rotates the real trajectory by an angle to produce a fake trajectory. The fabricated locations in [18] may not have been visited by anyone. Hence, with Dummy Trajectory, reviews could be attached to unvisited locations, which violates the facticity of reviews. In Historical Trajectory [19][20][21][22], for each location, methods do not fabricate locations but sample k − 1 visited locations from historical data. For example, methods [19,20] sample k − 1 complete trajectories to achieve k-anonymity; method [21] samples k − 1 trajectories and extends k locations sampled from k different trajectories into the same cloaking area; method [22] samples trajectory segments and combines them into k − 1 trajectories. However, like cryptography, Historical Trajectory cannot solve our problem, since the sampled trajectories still contain the spatial-temporal correlation of locations.
In suppression, methods [19,21,23,24] make users' trajectories indistinguishable to the adversary by suppressing each user's personalized locations, i.e., those that differ from other users' locations. These methods are used to limit the published trajectory data. For example, method [24] suppresses the sensitive or frequently visited locations in the trajectories; method [21] extends k personalized locations in k different trajectories into a cloaking area. A few works have studied users' privacy under conditions of non-personalized locations. The authors of [19] limit the locations sent to the LBS server that could be used to reconstruct user trajectories. In particular, paper [2] sets a threshold on the number of a user's public reviews, so that a user's reviews for a POI cannot all be published. However, suppressing too many locations or reviews reduces user utility.
The existing works [7,9,12] are based on pseudonyms. In [7], a user stores several user names and selects one of them as the current user name requesting LBS service at each query. In [9], each user has a pseudo name. If necessary, after consultation, all users replace their old pseudonyms with new pseudonyms while simultaneously restarting to use the LBS. Method [12] protects user identities from the attacker's recognition through exchanging their identities. Although pseudonym is infeasible for our problem due to the facticity of review, inspired by [12], we exchange reviews of different users to break the spatial-temporal correlation of locations.

System model and basic concepts
Assume an LBSS has M POIs denoted as $\{POI_1, POI_2, \cdots, POI_M\}$ and N users denoted as $\{u_1, u_2, \cdots, u_N\}$ in a city area. Although in theory each POI $POI_i$ maps one-to-one to a unique geographic coordinate, in practice one geographic coordinate can locate multiple POIs because of low precision. For example, (lat: 39.959679, lon: 116.362065) is the FengLan International Shopping Center in Beijing, while (lat: 39.958624, lon: 116.363542) and (lat: 39.958744, lon: 116.363428) represent Watsons and Mothercare in the Shopping Center, respectively. In addition, some businesses or shops share the same name, such as chain shops. So, we use a unique five-tuple to define the POI:
Definition 1: A POI is a five-tuple POI = < stra, name, lon, lat, type >. Here, stra is the structured address, name is the name, lon and lat are the geographic coordinates (longitude and latitude), and type is the semantic of the POI. The semantic refers to the type of service that a POI can provide, such as food, shopping, education, etc.
According to Definition 1, the POI in an LBSS can uniquely represent a business or shop with a geographical region in terms of physical location.
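As an illustration of Definition 1, the five-tuple can be written as a small immutable record. This is only a sketch: the class name `POI`, the field types, and the address and semantic strings used in the example are assumptions made here for illustration, while the coordinates are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class POI:
    """Five-tuple <stra, name, lon, lat, type> from Definition 1."""
    stra: str   # structured address
    name: str   # business or shop name
    lon: float  # longitude
    lat: float  # latitude
    type: str   # semantic, i.e. the kind of service offered


# Two shops inside the same shopping center (address string is illustrative).
watsons = POI("FengLan International Shopping Center, Beijing",
              "Watsons", 116.363542, 39.958624, "shopping")
mothercare = POI("FengLan International Shopping Center, Beijing",
                 "Mothercare", 116.363428, 39.958744, "shopping")

# A hypothetical second branch of the same chain: same name, different tuple.
watsons_branch = POI("hypothetical other address, Beijing",
                     "Watsons", 116.400000, 39.900000, "shopping")
```

Because all five fields take part in equality, two branches of a chain with the same name remain distinct POIs, which is the purpose of the five-tuple.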
In the system, each user $u_j$ visits different POIs at different times. Each time, $u_j$ reviews $POI_i$ immediately after she visits it. In this paper, we use $POI_{ij}^{t_l}$ to denote $POI_i$ as reviewed by $u_j$ at time $t_l$. Considering the spatial and temporal correlation among POIs, we give a formal definition of the trajectory as follows.
Definition 2: For a user $u_j$, her trajectory $T_j$ is a set of time-dependent discrete POIs reviewed in a cycle, which can be expressed as:
$$T_j = \{ r_{ij} \mid i = 1, 2, \cdots, n \}$$
where $r_{ij}$ is the i-th location on $T_j$ and is denoted as a three-tuple $< POI(r_{ij}), t(r_{ij}), \tau(r_{ij}) >$, which means $u_j$ has visited and reviewed $POI(r_{ij})$ at time $t(r_{ij})$ in period $\tau(r_{ij})$ of a cycle. In essence, a trajectory is a sequence of locations sorted in the chronological order in which they are reviewed, i.e., $t(r_{1j}) \le t(r_{2j}) \le \cdots \le t(r_{nj})$.
In an LBSS, each POI is uniquely represented by a five-tuple. For each POI, users fill out and submit reviews to the SPs after logging in to their accounts. The SPs can obtain each user's real identity and all reviews because of real-name registration. In reality, people tend to have a consistent daily life: a user periodically visits and reviews her fixed places, such as home and workplace, or different places where she engages in the same activity at the same period in different cycles (e.g., a user plays table tennis or badminton every night at 20:00). In this paper, we refer to locations with the same semantic as the same semantic locations. Also, different users often engage in their activities at the same period in a cycle. We refer to the locations where different users engage in activities at the same time period as the same period locations. These facts are the reason why the adversary can launch SLSA and why we define the two basic concepts. Note that Definition 2 can reflect activities in which a user visits some places periodically. Assume $r_{ij}$ and $r_{i'j}$ are POIs that $u_j$ visits at the same period in different cycles and that provide the same services. Then we know that she visited the same location or participated in the same activity at the same period in two different cycles.
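The trajectory of Definition 2 and the two basic concepts above can be sketched as follows. The names `Location`, `build_trajectory`, `same_semantic` and `same_period` are hypothetical, and representing time as hours within a cycle and the period as an integer slot are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Location:
    """One reviewed location r = <POI(r), t(r), tau(r)> plus its semantic."""
    poi: str       # identifier of the reviewed POI
    semantic: str  # type of service of the POI (assumed to be looked up from it)
    time: float    # review time t(r), here in hours within the cycle
    period: int    # period tau(r), here the hour slot of the day


def build_trajectory(locations: List[Location]) -> List[Location]:
    """Return the locations sorted chronologically, as Definition 2 requires."""
    return sorted(locations, key=lambda r: r.time)


def same_semantic(a: Location, b: Location) -> bool:
    """Same semantic locations: both POIs offer the same type of service."""
    return a.semantic == b.semantic


def same_period(a: Location, b: Location) -> bool:
    """Same period locations: both reviews fall in the same period of a cycle."""
    return a.period == b.period


cinema = Location("cinema", "entertainment", 17.0, 17)
lunch = Location("restaurant A", "food", 11.0, 11)
lunch_next_day = Location("restaurant B", "food", 11.0, 11)
trajectory = build_trajectory([cinema, lunch])
```

Here `lunch` and `lunch_next_day` are different POIs but count as the same semantic and the same period, which is the regularity that SLSA relies on.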

Adversary model
The principal goal of the adversary is to collect private information about a particular user by associating her real identity with the corresponding trajectories. In this work, we consider two types of adversaries. One is an unauthorized third party, which could illegally obtain users' information by eavesdropping, purchasing it from the LBS, or collecting it from released data. The other is the SPs, which can obtain the current reviews sent by users and all original historical reviews. Additionally, the SPs know the identity information of all users. The reviews and identity information are stored on the server and can be seen by the SPs. Both types of adversaries are selfish and curious, and use the gained data to infer the visited and sensitive locations of each user. In particular, the SPs may collude with the unauthorized third party and sell users' reviews and identity information to it out of self-interest. Hence, we treat the two together as the adversary in our paper. In our adversary model, we assume the adversary attempts to infer the following two types of trajectory privacy by using users' information.
• A particular user and her trajectories. If a user's reviews are protected through pseudonym exchange, the adversary analyzes the spatial and temporal correlation of locations and reconstructs trajectories. If the reconstruction succeeds, he knows to whom the reconstructed trajectories belong.
• The most frequent semantic in the historical reviews. Based on the consistent lifestyle of a particular user, the adversary can count the most frequent semantic from historical data whose corresponding locations are most likely to be her real location.

Motivation and basic idea
In existing LBSSs, a user submits her reviews to the LBSS server. The reviews and the user identities are stored on the LBSS server as historical data and are visible to the SPs. To protect user privacy, a workable method is the pseudonym. However, we cannot directly assign pseudonyms to users, since the adversary knows users' real identities. So, one effective approach is to exchange reviews. Its weakness is that distorted trajectories always contain some sub-trajectories of original trajectories. In this paper, we refer to the trajectories before and after exchanging reviews as the original trajectory and the distorted trajectory, respectively. In the example in Table 1, $T_1$, $T_2$ and $T_3$ are original trajectories and the corresponding distorted trajectories are $T'_1$, $T'_2$ and $T'_3$, respectively; $a_2 \to a_3$ is a sub-trajectory of $T_1$. Adversaries can exploit such sub-trajectories to infer users' real identities. To illustrate this problem, we first give the following definition.
Definition 3 (sub-trajectory): For a trajectory $T_j$, assume there exists a trajectory $T' = \{ r_l \mid l = 1, 2, \cdots, m \text{ and } m \le n \}$, where $r_l$ is the l-th location on $T'$ and $\forall r_l \in T'$, $r_l \in T_j$. Let $r_l$ and $r_{l+1}$ be equivalent to $r_{ij}$ and $r_{i'j}$ on $T_j$, respectively. If $i < i'$ holds for all consecutive $r_l$, $r_{l+1}$, then $T'$ is a sub-trajectory of $T_j$. Definition 3 ensures the consistency of the spatial-temporal sequence of locations between a trajectory and its sub-trajectories, e.g., a user visits $r_l$ before $r_{l+1}$ in $T'$. A distorted trajectory can contain several sub-trajectories from different original trajectories.
https://doi.org/10.1371/journal.pone.0256892.t001
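Definition 3 amounts to an order-preserving subsequence test, which can be sketched in a few lines. The function name `is_subtrajectory` and the use of plain strings as location labels are assumptions for illustration; the labels follow the $a_2 \to a_3$ example in the text.

```python
from typing import Sequence


def is_subtrajectory(candidate: Sequence[str], trajectory: Sequence[str]) -> bool:
    """Check that every location of `candidate` occurs on `trajectory`
    and that the locations keep their relative order (i < i')."""
    position = 0  # next index of `trajectory` that may still be matched
    for location in candidate:
        # Scan forward until the location is found; earlier positions are used up.
        while position < len(trajectory) and trajectory[position] != location:
            position += 1
        if position == len(trajectory):
            return False  # location missing, or only present out of order
        position += 1
    return True


T1 = ["a1", "a2", "a3", "a4"]
in_order = is_subtrajectory(["a2", "a3"], T1)
out_of_order = is_subtrajectory(["a3", "a2"], T1)
foreign = is_subtrajectory(["a2", "b1"], T1)
```

The single forward scan is what enforces the chronological consistency that Definition 3 asks for: a location that only appears before the previous match cannot be matched again.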

PLOS ONE
As stated in [9,13], the adversary can still infer users' real identities if existing methods only encrypt user identities without distorting the original trajectory. Here, distorting refers to replacing some locations on an original trajectory with different locations. So, in the PR scenario, we need to ensure that the original trajectories are distorted after exchanging reviews. However, existing methods have not yet proposed a metric to measure how distorted the original trajectory is.
In our scenario, for a particular user, the adversary knows her trajectory stored on the LBSS server is a distorted trajectory. Yet, he wants to get her original trajectory by reconstructing the distorted trajectory. In many cases, an adversary can obtain a particular location or a sub-trajectory of a user. For example, the adversary may accidentally know Alice's home or path due to an encounter or walking together. Once obtaining these locations, he can exploit them to recover the original trajectory in a variety of ways, such as correlation attack [19], aggregated model [13]. Intuitively, for a distorted trajectory, the more locations the adversary knows, the more likely he is to recover the original trajectory. Note that the more unreplaced locations on an original trajectory, the more likely it is to be recovered. So, we uniformly use the maximal common sub-trajectory of a distorted trajectory and its corresponding original trajectory to represent locations that the adversary has already known.
According to the above analysis, the distortion metric is proposed to capture the correlation between the maximal common sub-trajectory and the distorted trajectory. It reflects how difficult it is for the adversary to recover the original trajectory. The larger the value of distortion is, the less likely the adversary is to recover the original trajectory. Let $T_j^o$ and $T_j^d$ be the original and distorted trajectories of $u_j$, and let $T'(T_j^o, T_j^d)$ be their maximal common sub-trajectory. The distortion $dis(T_j^o, T_j^d)$ between $T_j^o$ and $T_j^d$ is defined from $|T'(T_j^o, T_j^d)|$ and $|T_j^o|$, the numbers of locations of $T'(T_j^o, T_j^d)$ and $T_j^o$, respectively. $dis(T_j^o, T_j^d)$ denotes the probability that the adversary can recover the complete $T_j^o$. Its physical meaning is that, for a $T_j^o$ with a fixed number of locations, the more locations are replaced, the less likely the adversary is to recover the complete $T_j^o$. Additionally, there is a threshold for the maximal common sub-trajectory. That is, the adversary can completely recover an original trajectory as long as he knows enough, but not necessarily all, of the locations on it. So, for $T_j^o$ and $T_j^d$, we must ensure that the distortion between them is bounded by $\delta_j \in (0, 1]$. Note that we mainly consider users who have exchanged reviews with a particular user; $\delta_j = 0$ means two users did not exchange any reviews.
In our scenario, the distorted trajectories of users who have exchanged reviews with $u_j$ also need to be bounded by $\delta_j$. Let $D_j$ be the set of these distorted trajectories. Then, for every distorted trajectory in $D_j$, we must ensure that the distortion between it and $T_j^o$ is bounded by $\delta_j$.
The above analysis formalizes the conditions for satisfying trajectory privacy protection while exchanging reviews. If a distorted trajectory contains too few exchanged locations, the adversary can exploit it to obtain the original trajectory. In this paper, we call this kind of attack the review-based location correlation attack (RLCA).
Note that our paper mainly focuses on how the adversary exploits the sub-trajectory to obtain the original trajectory, rather than on how he infers which user the original trajectory belongs to. That is, we assume the adversary can obtain a user's real identity once he determines the original trajectory.
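The maximal common sub-trajectory used in this analysis can be computed as a longest common subsequence of the original and distorted trajectories. The sketch below does that and then reports the share of original locations that survive the exchange. The exact distortion formula is not reproduced in this text, so the ratio of the common sub-trajectory length to the original length is an assumption, as are the function names and the threshold value in the example.

```python
from typing import List, Sequence


def max_common_subtrajectory(original: Sequence[str],
                             distorted: Sequence[str]) -> List[str]:
    """Longest order-preserving common subsequence (dynamic programming)."""
    n, m = len(original), len(distorted)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if original[i] == distorted[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover one maximal common sub-trajectory.
    common: List[str] = []
    i, j = n, m
    while i > 0 and j > 0:
        if original[i - 1] == distorted[j - 1]:
            common.append(original[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return common[::-1]


def retained_fraction(original: Sequence[str], distorted: Sequence[str]) -> float:
    """Assumed ratio: |maximal common sub-trajectory| / |original trajectory|."""
    if not original:
        raise ValueError("the original trajectory must not be empty")
    return len(max_common_subtrajectory(original, distorted)) / len(original)


original = ["a1", "a2", "a3", "a4"]
distorted = ["a1", "b2", "a3", "c4"]  # a2 and a4 were exchanged away
common = max_common_subtrajectory(original, distorted)
share = retained_fraction(original, distorted)
threshold = 0.6  # hypothetical bound on the unreplaced share
exposed = share > threshold
```

Under this reading, a trajectory with a large unreplaced share is the kind of target RLCA exploits, so a selection method would reject exchanges that leave the share above the chosen bound.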
As far as privacy protection is concerned, RLCA ignores the fact that a user always engages in the same activities periodically in the long term. Consider Alice who goes to some restaurants (perhaps not the same restaurants) near her workplace for lunch at 12:30 every day. The POIs that Alice visits will have the same semantics (called Food & Beverages Service). Though Alice exchanged reviews with others, using the historical data the adversary can still infer that Alice visited a place with the semantic 'Food & Beverages Service' since Alice appears more frequently than others and the POIs with the semantic 'Food & Beverages Service' appear more frequently than other POIs.
To clarify the above problem, we assume the adversary has obtained Alice's historical data over a period of time. In the historical data, Alice has submitted n reviews to the LBSS server. For simplicity, suppose that the n POIs related to these reviews have the same semantic. For each review, we select k − 1 other users to form an anonymous group with Alice, in which the k users exchange their reviews and send them to the LBSS server. Among these k × n POIs, there are m different semantics denoted as $\{ s_i \mid i = 1, 2, \ldots, m,\ 1 \le m \le (k-1) \times n + 1 \}$, and $n_i$ is the number of times $s_i$ appears in these POIs. We assume $s_1$ is the semantic of the reviews Alice has submitted.
Then, the numbers of appearances of $s_1$ and of the other semantics in these POIs are $k \times n - \sum_{i=2}^{m} n_i$ and $\sum_{i=2}^{m} n_i$, respectively. Among these k × n POIs, the frequency $p_1$ of $s_1$ is:
$$p_1 = \frac{k \times n - \sum_{i=2}^{m} n_i}{k \times n}$$
For $\forall s_i$ ($2 \le i \le m$), the frequency $p_i$ of $s_i$ is:
$$p_i = \frac{n_i}{k \times n}$$
Consider that Alice sends her reviews over the long term, i.e., Alice sends an unlimited number of reviews to the LBSS server. Then, we can get:
$$\lim_{n \to \infty} p_1 = 1, \qquad \lim_{n \to \infty} p_i = 0 \quad (2 \le i \le m)$$
Furthermore, we denote the users appearing in these groups as $\{ u_l \mid l = 1, 2, \cdots, h,\ k \le h \le (k-1) \times n + 1 \}$, and $n_l$ is the number of times $u_l$ appears among them. Assuming $u_1$ is Alice, Alice appears n times, while every other user $u_l$ appears $n_l$ times ($1 \le n_l \le n$). Then, we can get the frequency $q_l$ of each $u_l$ from these counts. When the number of reviews sent to the LBSS server is unlimited, Alice appears in every group of exchanged reviews, whereas no other user does.
According to the above formulas, we draw the following conclusions:
(1) When n tends to infinity, the frequencies of $s_1$ and $s_i$ ($i \ne 1$) approach 1 and 0, respectively. That is, as long as Alice submits sufficiently many reviews with the same semantic to the LBSS server over a long term, the semantic of Alice's reviews must be much more frequent than every other semantic.
(2) Under the same conditions as (1), the frequency with which Alice appears among all users must be far higher than that of the others. By analyzing the historical data, the adversary can conclude that Alice and $s_1$ appear in every review with extremely high probability. Once the adversary obtains some reviews including Alice and a location with $s_1$, he determines that Alice visited the location. In this paper, we call this kind of attack the semantic-based long-term statistical attack (SLSA).
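The statistical effect behind SLSA can be illustrated with a small simulation. It is only a sketch under stated assumptions: the other k − 1 users of each group are drawn uniformly from a fixed pool, their semantics are drawn uniformly from a fixed list, and all names (`simulate_history`, `alice`, the semantic labels) are hypothetical. It checks the qualitative conclusion that Alice and her semantic dominate the history, not the limiting values.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple


def simulate_history(n: int, k: int, semantics: List[str], other_users: List[str],
                     alice: str = "alice", alice_semantic: str = "food",
                     seed: int = 0) -> Tuple[Dict[str, float], Counter]:
    """Simulate n anonymous groups of size k that all contain Alice."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    semantic_counts: Counter = Counter()
    user_counts: Counter = Counter()
    for _ in range(n):
        # Alice's own review always carries the same semantic s_1.
        semantic_counts[alice_semantic] += 1
        user_counts[alice] += 1
        # k - 1 distinct other users join the group (assumed uniform choice).
        for user in rng.sample(other_users, k - 1):
            user_counts[user] += 1
            semantic_counts[rng.choice(semantics)] += 1
    total = k * n  # number of POIs in the history
    frequencies = {s: c / total for s, c in semantic_counts.items()}
    return frequencies, user_counts


semantics = ["food", "shopping", "education", "sport", "cinema"]
pool = [f"u{i}" for i in range(50)]
p, users = simulate_history(n=1000, k=3, semantics=semantics, other_users=pool)
most_frequent_semantic = max(p, key=p.get)
most_frequent_user = users.most_common(1)[0][0]
```

In this setting Alice's semantic accounts for at least 1/k of all POIs and Alice appears in every group, so both stand out clearly from the rest of the history.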
The above analysis identifies two mechanisms through which the adversary launches RLCA and SLSA to obtain the trajectory when we protect user trajectory privacy by exchanging reviews in our scenario. (1) The adversary can recover a trajectory that has enough unreplaced locations and learn to whom it belongs. (2) If a particular user periodically visits POIs with the same semantic in the same time period for a long time, the frequencies of that user and of that semantic in the historical data will be much higher than those of other users and semantics. Hence, our basic idea is that a user exchanges reviews with as many different users as possible, and that the frequency difference between different semantics in the historical data is kept as small as possible.
To implement the above basic idea, our solution selects the users who exchange reviews from two aspects. First, each time before a user sends a review to the LBSS server, we try to select some other users to form an anonymous group. In the anonymous group, each user has at least one partner such that the distortion between their trajectories does not exceed the threshold after they exchange reviews. This ensures that the adversary cannot recover the trajectory of any user in the anonymous group by launching RLCA. Second, we select users to form anonymous groups and exchange reviews based on historical data. For a particular user, we select users to form an anonymous group in which the frequencies of each user and each semantic in the historical data are as equal as possible. This guarantees that the adversary cannot infer a user's location by launching SLSA.

System architecture
To select suitable users who exchange reviews to resist RLCA and SLSA, our system architecture should consider two facts: (1) the SPs are adversaries, so we cannot store non-anonymized historical user data on the LBSS server; (2) the storage and computation overheads are huge, so we cannot implement them on mobile terminals. Therefore, we employ a centralized architecture as our system architecture. It contains three roles, as shown in Fig 2.
Users: In our system, users can use microcomputers, mobile terminals, etc., to register with the trusted third party (TTP) by sending a registration request. Users can also send query requests, reviews, etc., to the TTP so that they can query and review the services provided by POIs.
TTP: The TTP is an independent and trusted third-party server, which receives query requests and registration requests from users and forwards them to the LBSS server. It receives query results from the LBSS server and returns them to users. During the registration process, the TTP stores users' real identity information. Additionally, the TTP server stores the user reviews in its database, selects users to exchange reviews to protect user privacy, and stores some data related to privacy protection functions, such as the POIs within some cities and their services.
LBSS server: It provides users with services such as query, registration, and review. Specifically, the LBSS server receives query requests and registration requests from the TTP server and returns the query results to it. The LBSS server stores users' real identity information and the anonymized reviews in its database. The LBSS server also publishes reviews on the Internet.

The algorithm framework
In an LBSS, users not only wish to enjoy the business services, but also hope to publish objective reviews for the service so that others can also enjoy them. If user u j wants to publish reviews, she needs to register with the system by sending her real identity. However, her trajectory privacy is inevitably leaked, since the SPs are untrustworthy and her identity information and reviews are stored on the server. Hence, we propose a method to focus on how to select users who exchange reviews. To select proper users, the TTP server first select u j and other k − 1 users to form an anonymous group and each selected user exchange reviews with another user by running algorithm 1. If the trajectories of k users cannot resist RLCA after exchanging reviews, the TTP server needs to reselect users to exchange reviews by running algorithm 2; if the trajectories of k users can resist RLCA but not SLSA, the TTP server needs to reselect users to exchange reviews by running algorithm 3; then, the TTP server send the trajectories that can resist RLCA and SLSA. The framework of our algorithms is shown in Fig 3. The algorithm design RUS algorithm. The main purpose of RUS algorithm is to randomly select users to exchange reviews without considering RLCA and SLSA. To better protect the trajectory privacy, every time a user submits a review, we select k − 1 users whose reviews have not been exchanged for her to form an anonymous group, in which each user selects another one to exchange review.
denote the set of original trajectories and the set of distorted trajectories already stored on the TTP server, respectively. For u j , each location on T o j corresponds to a review and has been used to exchange reviews with other users. For each location on T d j , it corresponds to the review that has been used to exchange with one review of u j . Suppose u j submits a review for a POI that needs to be exchanged to the TTP server and the location corresponding to the POI is denoted as r In an LBSS, every time a user submits a review, RUS algorithm needs to search for all unexchanged reviews received by the TTP server. According to the given security parameter k, we select k users to form an anonymous group and exchange reviews. The process to solve this problem is shown below.
First, when user $u_j$ submits a review, the RUS algorithm takes all unexchanged reviews as input and obtains the set of locations $R = \{r'_j, r'_{p_1}, r'_{p_2}, \dots, r'_{p_L}\}$ that correspond to these reviews, where $r'_{p_l} \in R$ is the location of $u_{p_l}$ and $t(r'_{p_l})$ is the time when $u_{p_l}$ reviewed $r'_{p_l}$. The system also needs to determine the security parameter $k$ (in our paper, $k$ is a constant not less than 3). Given $k$, the RUS algorithm selects $u_j$ and $k-1$ other users to form an anonymous group. A larger $k$ means more users who can exchange reviews with $u_j$ and therefore better trajectory privacy protection.
Second, […] $p_{l+1}$ meets both conditions. Then, $u_j$ and $u_{p_{l+1}}$ exchange locations. For $u_{p_{l+1}}$ and the location $r'_j$ she received in the exchange, we select a location to exchange with $r'_j$ in the same way as we did for $u_j$. Finally, we repeat the above steps until $k$ users, including $u_j$, have been selected to form an anonymous group $G$. Each element of $G$ is a two-tuple composed of a user and her exchanged location.
Third, the RUS algorithm searches the database stored on the TTP server to find $O'$ and $A'$, which are subsets of $O$ and $A$, respectively. $O'$ and $A'$ contain the original trajectories and the distorted trajectories of the $k$ users in $G$. Each location in $G$ is added to the trajectories of its corresponding users. For example, $r'_{p_l}$ is added to $T^o_{p_l}$, and it is also added to $T^d_j$ if $u_{p_l}$ and $u_j$ exchange their reviews. Finally, we output the anonymous group containing the $k$ users, together with their original and distorted trajectories. The anonymous group is sent to the LBSS server, and the original and distorted trajectories are stored in the database on the TTP server. The pseudocode is given in Algorithm 1.
Algorithm 1 describes how the RUS algorithm selects users and exchanges reviews. It ensures that the SPs can obtain a user's real identity and a real review, but do not know who actually submitted the review. Therefore, it can effectively protect users' trajectory privacy. However, when selecting users to exchange reviews, the RUS algorithm considers neither RLCA nor how the choice of $k$ may leak trajectory privacy after the exchange. As a result, the RUS algorithm must be enhanced to address these problems.
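The random selection and the trajectory bookkeeping described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `rus_select` and `update_trajectories`, the dictionary layout, and the cyclic hand-over used to pair users are assumptions made for illustration, and the eligibility conditions that a candidate must meet are not modeled.

```python
import random
from typing import Dict, List, Optional, Tuple


def rus_select(
    submitter: str,
    pending: Dict[str, str],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Form an anonymous group of k users and exchange their review locations.

    `pending` maps every user with an unexchanged review (including the
    submitter) to the location of that review. The result plays the role of
    G: a list of (user, exchanged location) two-tuples.
    """
    rng = rng or random.Random()
    if k < 3:
        raise ValueError("k is assumed to be a constant not less than 3")
    others = [u for u in pending if u != submitter]
    if len(others) < k - 1:
        raise ValueError("not enough unexchanged reviews to form the group")
    members = [submitter] + rng.sample(others, k - 1)
    # Assumption: a cyclic hand-over, so member i receives member i+1's review.
    return [(members[i], pending[members[(i + 1) % k]]) for i in range(k)]


def update_trajectories(
    group: List[Tuple[str, str]],
    pending: Dict[str, str],
    original: Dict[str, List[str]],
    distorted: Dict[str, List[str]],
) -> None:
    """Append each user's own location to T^o and the received one to T^d."""
    for user, received in group:
        original.setdefault(user, []).append(pending[user])
        distorted.setdefault(user, []).append(received)
```

With a cyclic hand-over no user keeps her own review, which is the property the exchange relies on; any other derangement of the group would serve equally well.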

Algorithm 1: Random-User Selection Algorithm
Input: all reviews that have not been exchanged with others, security parameter $k$
Output: the anonymous group $G$, which contains $k$ users and their exchanged reviews; the set of original trajectories $O'$; the set of distorted trajectories $A'$
1 Get the set of locations $R$ […]
[…] $R' = R' \setminus \{r'_{p_l}\}$, $temp_u = u_{p_l}$;
10 Search the database stored on the TTP server and get the original trajectories and the distorted trajectories of the $k$ users in $G$;
11 $O' \leftarrow$ add each location in $G$ to the corresponding original trajectory;
12 $A' \leftarrow$ add each location in $G$ to the corresponding distorted trajectory;
13 Return $G$, $O'$, $A'$

USR-RLCA algorithm. To protect trajectory privacy more effectively, the RUS algorithm must be enhanced by considering RLCA and $k$. According to Eqs (2) and (3), to resist RLCA, the distortion between an original trajectory and any distorted trajectory that has exchanged reviews with it must be less than $\delta_j$. This means that a user should avoid exchanging reviews with the same user multiple times as far as possible. In other words, a user should exchange reviews with as many users as possible. However, selecting more users always incurs higher overhead.
Another problem is that, no matter how many reviews of an original trajectory are exchanged, the adversary may still be able to recover it. To illustrate, assume an original trajectory $T^o_j$ contains 10 locations and $\delta_j = 0.4$. Then any single user can exchange reviews with $u_j$ no more than 4 times. However, if only two trajectories are involved, the adversary can still infer $T^o_j$ no matter how many times they exchange reviews. To address the problem, each trajectory (including $T^d_j$) should contain at most 4 locations identical to those on $T^o_j$ after the exchange. Over the exchange of all her reviews, $u_j$ should therefore exchange reviews with at least $\lfloor 10/(0.4 \times 10) \rfloor + 1 = 3$ users. That is, $k$ is determined by $\delta_j$ and the number of locations on $T^o_j$. Accordingly, every time a user exchanges reviews with others, we set the number of users in the anonymous group to be at least $\lfloor 1/\delta_j \rfloor + 1$. So, $k$ is represented as

$$k \ge \left\lfloor \frac{1}{\delta_j} \right\rfloor + 1 \qquad (9)$$

Based on the aforementioned analysis, we propose the USR-RLCA algorithm to select users and exchange reviews. USR-RLCA is an enhancement of the RUS algorithm, since it considers the threshold $\delta_j$ and $k$. With USR-RLCA, each time $u_j$ submits a review, at least $\lfloor 1/\delta_j \rfloor + 1$ users are selected to form an anonymous group, and the distortion between $T^o_j$ and any distorted trajectory is guaranteed to be less than $\delta_j$. Algorithm 2 gives the pseudocode describing how the USR-RLCA algorithm selects users and exchanges reviews. When receiving a review submitted by $u_j$, the TTP server first passes the parameter set and the unexchanged reviews to the USR-RLCA algorithm. USR-RLCA gets the location set $R$ […]

On the one hand, to prevent the adversary from launching SLSA, we consider the case in which $u_j$ sends $n$ reviews over a period of time and $k-1$ users are selected to exchange reviews each time.
Obviously, $u_j$ participates in the exchange every time while the others do not, because they may not visit the POIs at the same time as $u_j$ or may not be selected by the USR-RLCA algorithm. There are two solutions to this problem. The optimal solution is to select the same users to form the anonymous group every time $u_j$ sends a review. However, since it is impossible to ensure that each of these users submits a review at the same time as $u_j$, the optimal solution is not feasible, especially when $u_j$ submits a large number of reviews over a long term. The other solution is to select different users but ensure that, in the long term, the difference between the probability of any user (denote the set of all users as $D_u = \{u_{a_1}, u_{a_2}, \dots, u_{a_D}\}$) and that of $u_j$ is bounded by the threshold $\delta_u$. Then, for each user $u_{a_d} \in D_u$, the solution can be formalized as Eq (10).
On the other hand, $u_j$ will submit reviews with the same semantic (denoted $s_j$) during the same time period in each cycle, while the other users selected to exchange reviews will not. As a result, in the long term, $s_j$ occurs far more often than other semantics during that time period of each cycle. In other words, $p(s_j)$ (the probability of $s_j$) is much larger than the probability of any other semantic. Since $u_j$ has the highest probability among all users and $s_j$ has the highest probability among all semantics, the adversary can infer that $u_j$ is the user who visits a POI with $s_j$. So, for $s_j$ and all semantics $S = \{s_1, s_2, \dots, s_S\}$, we ensure that the difference between the probability of $s_j$ and that of every $s_i \in S$ is bounded by the threshold $\delta_s$.

[…] Compute $p(s_{p_l})$ in $R'' \cup \{s_{p_l}\}$ and $p(u_{p_l})$ in $U' \cup \{u_{p_l}\}$; 13 if there exists at least one $s_i \in S \setminus \{s_{p_l}\}$ for which $d_s(s_i, s_{p_l})$ […] $\delta_s$ or […] $u'$ in $R'''$. For a location $r'_{p_l} \in R'''$, if its corresponding user $u_{p_l}$ and all users in $U'$ satisfy Eq (10) or its corresponding semantic $s_{p_l}$ and all semantics in $S$ satisfy Eq (11) […]

Feasibility discussion
In this section, we discuss the feasibility of the proposed scheme in terms of both implementation and security. Specifically, following the aforementioned goals, we discuss whether our scheme can be implemented and achieve the desired privacy protection requirements.

Implementation analysis
Users and System Providers (SPs). The core of our scheme is that users exchange reviews with each other. This means that, in a public review list, Alice's review will be published by Bob. Therefore, the first question we consider in our implementation analysis is whether users are willing to exchange reviews with others.
For LBSSs, users' identities are anonymized in various ways, such as pseudonyms, hiding key characters, etc. It indicates that the user is more concerned about the impact of the review on the business than about who published it. In fact, by storing the original trajectory in a database, TTP servers can still maintain authentic review lists for each user and display the review list in a way that is personally visible to each user. Therefore, it is feasible to assume that users are willing to exchange reviews to protect the trajectory privacy.
For the SPs, we mainly consider whether they are willing to let users exchange reviews with each other when this is legally regulated. In general, the SPs want users to submit as many authentic reviews as possible so that they can build an objective reputation for each business. In our scenario, although users exchange reviews, they do not submit dummy reviews as k-anonymity approaches do. Therefore, the exchange does not affect the objectivity of a business's reputation. At the same time, because their trajectory privacy is protected, users will be willing to submit many more reviews. So, it is feasible to assume that the SPs are willing to let users exchange reviews.
The existence of the solution. In our scenario, the ideal solution is one in which, for each review of user $u_j$, we can select $k-1$ users to form an anonymous group to exchange reviews, and the trajectories of all users in the group cannot be identified by an adversary exploiting RLCA and SLSA. However, as analyzed in Section USR-RLCA algorithm, such an ideal solution does not always exist. Thus, in this section we show that our scheme is feasible by demonstrating when such a solution exists. Let $R$ […] be the sets of original trajectories and distorted trajectories corresponding to the locations in $R$. $U = \{u_j, u_{p_1}, u_{p_2}, \dots, u_{p_L}\}$ is the set of users corresponding to the locations in $R$. We first give the following definition.
Definition 5: For any user in $U$, e.g., $u_j$, the solution of our scheme exists if we can select $k-1$ users from $U$ to achieve the goal of our scheme by exchanging reviews with each other.
In this paper, our scheme achieves three goals of trajectory privacy protection, i.e., randomly selecting users to exchange reviews, resisting RLCA, and resisting SLSA, which are achieved by running the RUS, USR-RLCA and USR-SLSA algorithms, respectively.
Theorem 1: For our scheme, the solution exists.
Proof: We consider the solutions of the three algorithms of our scheme in turn. RUS algorithm: A solution to the RUS algorithm exists if it can select $k-1$ users from $U$ to exchange reviews with $u_j$. It can easily achieve this goal, since the number of users in $U$ is greater than $k-1$.
USR-RLCA algorithm: Given parameters $k$ and $\delta_j$, a solution to the USR-RLCA algorithm exists if it can select $k-1$ users from $U$ and ensure that each of these $k$ trajectories (including $T^d_j$) contains at most $(|T^d_j| + 1) \cdot \delta_j$ […]. Using Eq (9) we have […] Therefore, the USR-RLCA algorithm can select $k-1$ users from $U$ that achieve its goal when exchanging reviews.
USR-SLSA algorithm: Given parameters $k$, $\delta_j$, $\delta_u$ and $\delta_s$, a solution to the USR-SLSA algorithm exists if it can select $k-1$ users from $U$ to exchange reviews and ensure that the difference in probability between semantics or between the $k$ users satisfies Eqs (10) and (11). For every review of $u_j$, if the selected $k$ users are the same or the locations of the selected $k$ users have the same semantics, Eq (11) is satisfied. The USR-SLSA algorithm can easily select such $k$ users whose locations have the same semantics for every review of $u_j$, since $R$ contains enough locations. Besides, the USR-SLSA algorithm can also select the same users for every review of $u_j$. Hence, the USR-SLSA algorithm can select $k-1$ users from $U$ that achieve the goal of resisting SLSA.
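The group-size bound that the USR-RLCA part of the proof relies on can be checked numerically. The following sketch assumes only the bound $\lfloor 1/\delta_j \rfloor + 1$ stated in Section USR-RLCA algorithm; the function names are hypothetical, and exact fractions are used to avoid floating-point rounding in the floor operation.

```python
import math
from fractions import Fraction
from typing import Union

Number = Union[int, float, str]


def min_group_size(delta_j: Number) -> int:
    """Smallest k allowed by the bound floor(1 / delta_j) + 1."""
    delta = Fraction(str(delta_j))
    if not 0 < delta <= 1:
        raise ValueError("delta_j is assumed to lie in (0, 1]")
    return math.floor(1 / delta) + 1


def max_identical_locations(n_locations: int, delta_j: Number) -> int:
    """Largest number of locations a trajectory may share with T^o_j."""
    return math.floor(n_locations * Fraction(str(delta_j)))


# Worked example from the text: 10 locations and delta_j = 0.4.
print(max_identical_locations(10, 0.4))  # at most 4 identical locations
print(min_group_size(0.4))               # at least 3 users in the group
```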
Time complexity. Our scheme consists of three algorithms: RUS, USR-RLCA, and USR-SLSA. The RUS algorithm consists of two processes: sorting the set $R$ by time and selecting $k$ users to form an anonymous group. Assume $R$ contains $L$ locations (excluding that of $u_j$). In the worst case, the time complexity of sorting $R$ is $O(L^2)$. In the process of selecting $k-1$ users for $u_j$ from $R'$, the worst-case time complexity is $O\left(\sum_{i=1}^{k}(L-(k-1))\right)$. Therefore, the worst-case time complexity of the RUS algorithm is $O\left(L^2 + \sum_{i=1}^{k}(L-(k-1))\right)$. The USR-RLCA algorithm contains the same processes as the RUS algorithm. The difference is that, in the process of selecting $k-1$ users, USR-RLCA needs to calculate the distortion between two trajectories. Assume […] For these $k-k'$ locations, assume there are $k_u$ locations whose corresponding users belong to $U'$ but whose semantics do not belong to $S$, and $k_s$ locations whose corresponding semantics belong to $S$ but whose users do not belong to $U'$. Then, the time complexity […] $\sum_{i=1}^{k_s}(L-k'-(k_s-1)) \times (U-k'-(k_s-1)) + (|T^o_j|+1)$ […]. Hence, the worst-case time complexity of the USR-SLSA algorithm is O( […]

Security analysis
In our scenario, compromising trajectory privacy means that the adversary infers an original trajectory and the user to whom it belongs. The adversary can compromise trajectory privacy in three ways: (1) there is a direct correspondence between a user and her trajectory; (2) the adversary infers the original trajectory by launching RLCA; (3) the adversary infers the POI that a user periodically visits by launching SLSA. For (1), it is clear that no correspondence between a user and her trajectory remains under our scheme. Hence, in this section, we only prove that our scheme can resist both RLCA and SLSA.
1. Resisting RLCA. In this part of the analysis, the adversary knows some locations that a user has visited. Once he finds that a distorted trajectory of her original trajectory contains some of these locations, he will likely infer her original trajectory.

2. Resisting SLSA. As analyzed for the USR-SLSA algorithm, when the LBSS server accepts from the TTP server the anonymous group $G$ formed for $u_j$ to exchange reviews with others, the adversary can learn the probabilities of all users $U' = \{u_j, u_1, u_2, \dots, u_{k-1}\}$ and all semantics $S = \{s_j, s_1, s_2, \dots, s'_k\}$. He also knows the difference in probability between $s_j$ and the other semantics and the difference in probability between $u_j$ and the other users. For all $s_i \in S$ and all $u_l \in U'$, once $d_s(s_i, s_j)$ is more than $\delta_s$ and $d_u(u_l, u_j)$ is more than $\delta_u$, he will infer that the review with the semantic $s_j$ is the one most likely to belong to a POI that $u_j$ has visited.

Definition 7: For $u_j$ and the semantic $s_j$, our scheme can resist SLSA if each $u_l \in U'$ and the semantic $s_i \in S$ corresponding to the location of $u_l$ satisfy one of the following two conditions: i) $d_s(s_i, s_j)$ is less than the threshold $\delta_s$ and $d_u(u_l, u_j)$ is less than the threshold $\delta_u$; ii) $d_s(s_i, s_j)$ is less than the threshold $\delta_s$ or $d_u(u_l, u_j)$ is less than the threshold $\delta_u$.
Theorem 3: Our scheme is resistant to SLSA.
Proof: For $u_j$ and $s_j$, every user in $U'$ and her semantic in $S$ meet one of the above two conditions. When we run the USR-SLSA algorithm to form the anonymous group, users with $d_s(s_i, s_j) \le \delta_s$ and $d_u(u_l, u_j) \le \delta_u$ are selected first. Then, if fewer than $k$ users have been selected, the USR-SLSA algorithm continues to select users with $d_s(s_i, s_j) \le \delta_s$ or $d_u(u_l, u_j) \le \delta_u$ until $k$ users have been selected. This ensures that the adversary cannot infer which semantic belongs to the location most likely visited by $u_j$. Thus, our scheme can resist SLSA.
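The selection order used in the proof can be sketched as follows. This is an illustrative sketch rather than the authors' Algorithm 3: the `Candidate` type and the function name are hypothetical, the differences $d_s$ and $d_u$ are assumed to have been computed beforehand according to Eqs (10) and (11), and the comparison is assumed to be "not greater than".

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    user: str
    d_s: float  # semantic probability difference to s_j (precomputed)
    d_u: float  # user probability difference to u_j (precomputed)


def select_for_slsa(
    candidates: List[Candidate],
    needed: int,
    delta_s: float,
    delta_u: float,
) -> List[str]:
    """Pick `needed` users, preferring those within both thresholds."""
    # First priority: both differences are within their thresholds.
    both = [c for c in candidates
            if c.d_s <= delta_s and c.d_u <= delta_u]
    # Second priority: exactly one difference is within its threshold.
    either = [c for c in candidates
              if (c.d_s <= delta_s) != (c.d_u <= delta_u)]
    chosen = (both + either)[:needed]
    if len(chosen) < needed:
        raise ValueError("not enough candidates satisfy either condition")
    return [c.user for c in chosen]
```

Candidates that violate both thresholds are never selected, which corresponds to the case in which the adversary could single out $u_j$ and $s_j$.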

Evaluation setup
Generally, privacy and utility [31,32] are two significant metrics for measuring privacy protection technology. In this section, we conduct experiments on a real-world dataset to evaluate the performance of our scheme in terms of privacy and utility.

Dataset
The dataset we use for the evaluation is the Yelp dataset [33], which is collected from Yelp, the largest review site in the United States. It contains three types of information (businesses, reviews and user profiles) and has been used in many academic studies, such as recommendation systems [34], privacy protection [2], and sentiment analysis and opinion mining [35]. By pre-processing the Yelp dataset, we obtain a new dataset (called the Combination Dataset) containing 264,562 valid reviews in 510 cities for our experiments. We also add a semantic field to the Combination Dataset. In general, a semantic describes the functionality of a business. For example, the semantic 'restaurant' indicates that the POI is a location providing food. In this sense, if a user visits a business, we can describe the user's activity by the semantic of the business. Thus, the semantic in this paper refers to the user's activity. Based on [33], we classify the semantics into 15 categories, as shown in Table 2.
In the Combination Dataset, the number of reviews varies significantly across cities. For example, the city with the fewest reviews has only one review, while the city with the most reviews has thousands. Our evaluation would therefore be easily skewed by such extreme cities. Hence, we use the median number of reviews over all cities to reduce their impact. Here, the median is the "middle" value when the numbers of reviews of all cities are listed in order from smallest to largest. However, no city has exactly the median number of reviews. So, we use the data from Las Vegas, whose number of reviews is closest to the median. The statistics for Las Vegas are shown in Table 3.
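The city selection described above can be expressed in a few lines of Python. This is a sketch under assumptions: the function name and the toy counts in the usage note are hypothetical and are not taken from the dataset.

```python
import statistics
from typing import Dict


def city_closest_to_median(review_counts: Dict[str, int]) -> str:
    """Return the city whose review count is closest to the median count."""
    if not review_counts:
        raise ValueError("review_counts must not be empty")
    med = statistics.median(review_counts.values())
    # With an even number of cities the two middle cities are equally close
    # to the median; this sketch returns the one that appears first.
    return min(review_counts, key=lambda c: abs(review_counts[c] - med))
```

For example, with hypothetical counts `{"A": 1, "B": 40, "C": 60, "D": 9000}` the median is 50 and the function returns `"B"`; the extreme value of city D has no influence on the choice.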

Experimental settings
In reality, people habitually engage in the same activities in the same areas on a periodic basis. For example, people eat lunch near their workplace every weekday. Thus, we partition Las Vegas into a $5 \times 5$ grid, in which each cell represents a region. Considering that people schedule activities around weekdays and weekends, we set the user's activity cycle to one week. Assume Alice engages in many activities (i.e., visits businesses) in a grid cell and the activity 'Shopping' on the 3rd day (Tuesday) of the week appears most frequently. Then we set Alice to engage in the activity 'Shopping' on the Tuesday of each week.
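A minimal sketch of this setup is given below, assuming the city is delimited by a rectangular bounding box and that each visit is recorded as a (day of week, activity) pair; the function names, the bounding-box layout and the integer day labels are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east)


def grid_cell(lat: float, lon: float, bbox: BBox, n: int = 5) -> Tuple[int, int]:
    """Map a coordinate inside `bbox` to a (row, col) cell of an n x n grid."""
    south, west, north, east = bbox
    # Points on the northern or eastern border fall into the last cell.
    row = min(int((lat - south) / (north - south) * n), n - 1)
    col = min(int((lon - west) / (east - west) * n), n - 1)
    return row, col


def dominant_weekly_activity(visits: Iterable[Tuple[int, str]]) -> Tuple[int, str]:
    """Return the most frequent (day of week, activity) pair in one cell."""
    return Counter(visits).most_common(1)[0][0]
```

Applied to Alice's visits within one cell, `dominant_weekly_activity` yields the (day, activity) pair that is then repeated every week in the experiment.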
Intuitively, people engage in different activities at different times of the day. It is customary to divide the day into morning, afternoon and evening. Based on this division, we assume, for example, that people work in the workplace in the morning and go to nightclubs in the evening. The division of time periods therefore influences the adversary's inference about a user's activities. This division also relies on the fact that the frequency of people's activities is stable at different times of the day. To set the time periods, we analyze the frequency distribution of users' activities at different time periods on each day of the week. Fig 4 shows […]

Evaluation metric
Privacy metric. The goal of the adversary is to obtain users' original trajectories. To do this, the adversary exploits a particular location, or a series of locations, that he knows to belong exclusively to a user in order to reconstruct her original trajectory. The more locations the adversary knows, the more likely he is to reconstruct it. Hence, the adversary will use different means to obtain as many exclusive locations of a user as possible, for example, launching SLSA to find that the location with the most frequently occurring semantic is the user's real location. Based on Section Motivation and Basic idea, the distortion metric can be used to quantify privacy. For a user and one of her trajectories, if its distortion is greater than $\delta_j$, the adversary can reconstruct it, i.e., the privacy is compromised.
For our scheme, the more original trajectories whose distortion is not greater than $\delta_j$, the better the privacy protection. Thus, we use the ratio of original trajectories whose corresponding distortion is not greater than $\delta_j$ to quantify the privacy-preserving efficiency of our scheme (called the effective distortion ratio).
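As a minimal illustration, the effective distortion ratio can be computed as below. The sketch assumes that the distortion of each original trajectory has already been obtained from the distortion metric and, for simplicity, that a single threshold applies to all trajectories; the function name is hypothetical.

```python
from typing import Sequence


def effective_distortion_ratio(distortions: Sequence[float], delta_j: float) -> float:
    """Share of original trajectories whose distortion does not exceed delta_j."""
    if not distortions:
        raise ValueError("at least one trajectory is required")
    protected = sum(1 for d in distortions if d <= delta_j)
    return protected / len(distortions)


# Four trajectories, two of which stay within the threshold 0.4.
print(effective_distortion_ratio([0.1, 0.4, 0.5, 0.9], 0.4))  # 0.5
```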
We consider four cases in which the adversary may reconstruct a user's original trajectory: 1. users do not exchange reviews with other users, which allows the adversary to obtain the original trajectory directly; 2. reviews are exchanged between users, but RLCA is not considered; 3. RLCA is considered, but SLSA is not; 4. both RLCA and SLSA are considered. To evaluate the impact of these cases on user privacy, we compare the USR-SLSA algorithm with RUS, USR-RLCA, the non-exchange solution (Non-exchange) and the theoretically optimal solution (Optimal). Non-exchange corresponds to case 1. Optimal yields the theoretically optimal result, in which the adversary cannot infer any original trajectory.
Utility metric. Because users submit reviews to SP primarily for publishing, we must consider the user utility of users in terms of publication. Paper [2] is the first study on review publishing considering system utility, personal profile, and privacy in multiple regions and can preserve user location privacy by suppressing some public reviews. Since our scheme does not focus on how to publish reviews, we use (�, δ)-public principle, which is a review publication mechanism used in the literature [2], to publish reviews. � and δ are thresholds and are used to balance the number of anonymous reviews and the number of public reviews for each business. In the mechanism, all reviews are public when the number of the reviews for L i (L i refers to a business.) is less than δ. At least � out of top-δ useful reviews are public when the number of the reviews for L i is no less than δ. The mechanism can preserve users' location privacy by suppressing some public reviews.
As mentioned in the Introduction, users hope to build reputations for POIs by publishing reviews. To ensure a more objective reputation, users want to publish as many reviews as possible. Therefore, we define the user utility as users' reviews that are published and measure the utility as the ratio of the public reviews. Public review is a metric used in the literature [2] and refers to the number of all users' published reviews. Public reviews increase as the global budget increases. Global budget refers to the maximum number of reviews that can be published by every user in all regions. To evaluate the user utility, we compare our scheme with the method of literature [2] (LRPM) and we set � = 2,δ = 3 and the global budget ranges from 20 to 70.

Results
1. Privacy-preserving efficiency. We first evaluate the privacy-preserving efficiency of USR-SLSA. Because different parameters affect the privacy-preserving efficiency, we separately evaluate the impact of $k$, $\delta_j$, $\delta_u$, and $\delta_s$. Figs 5-8 show the effective distortion ratio of the five compared algorithms under different parameters. Specifically, Fig 5 shows how the effective distortion ratio changes as $k$ increases. Note that only reviews with the same POI, time period, etc., can be exchanged. Therefore, in our dataset, no more than 8 users can form an anonymous group. The effective distortion ratio of RUS, USR-RLCA and USR-SLSA decreases slowly as $k$ grows when $k < 7$ and remains constant when $k \ge 7$. This is because the number of anonymous groups that can contain at least $k$ reviews decreases as $k$ increases. Fig 6 shows how the effective distortion ratio changes as $\delta_j$ increases. As shown in Fig 6, the effective distortion ratio of the three algorithms hardly changes with $\delta_j$ when $\delta_j > 0.7$ (the corresponding ranges of $\delta_j$ for RUS, USR-RLCA and USR-SLSA are 0.7 to 1.0, 0.8 to 0.9 and 0.8 to 1.0, respectively). $\delta_j$ determines the privacy-preserving efficiency of the three algorithms only when it is less than 0.7. From Figs 7 and 8, we observe that the effective distortion ratio of the three algorithms hardly changes when $\delta_s$ ($\delta_u$) is more than 0.7. When we set $\delta_u = 0.5$, their effective distortion ratio increases as $\delta_s$ grows beyond 0.5. When we set $\delta_s = 0.5$, their effective distortion ratio increases as $\delta_u$ grows beyond 0.5. This is because some users submit only 1 or 2 reviews. Given the occasional and random nature of such user behavior, adversaries cannot exploit these reviews to compromise the privacy of the corresponding users, which leads to an increase in the effective distortion ratio.
Figs 5-8 also show some common results. First, the effective distortion ratio of USR-RLCA is larger than that of RUS. The reason is that, compared with RUS, adversaries can identify fewer trajectories by launching RLCA when the reviews are exchanged using USR-RLCA. This confirms that USR-RLCA can resist RLCA. The effective distortion ratio of USR-SLSA is in turn larger than that of USR-RLCA, because USR-SLSA resists both RLCA and SLSA while USR-RLCA resists only RLCA. As a result, USR-SLSA allows fewer trajectories to be identified by the adversary than USR-RLCA. Second, the result of Non-exchange shows that even if users do not exchange reviews, adversaries cannot compromise the privacy of all users. As in the above analysis, adversaries cannot infer the privacy of users who submit only 1 or 2 reviews. However, such reviews can be exploited to some extent by adversaries to identify the trajectories of other users. As a consequence, the effective distortion ratio of USR-SLSA is always less than 1.0. Third, since some reviews are exchanged when RUS is performed, the effective distortion ratio of Non-exchange is lower than that of RUS.
2. User utility. In this part, we evaluate user utility in the case where the SPs receive reviews sent by USR-SLSA and publish them by the (�, δ)-public principle. Fig 9 shows the ratio of public reviews for USR-SLSA and LRPM under different global budgets. We observe that USR-SLSA has a larger ratio of public reviews than LRPM, because the SPs receive fewer reviews through USR-SLSA than through LRPM. Thus, for the same global budget, USR-SLSA can publish a larger percentage of public reviews. However, Fig 9 does not sufficiently show that USR-SLSA can publish a larger number of reviews than LRPM. Therefore, we further evaluate the ratio of the number of reviews published by USR-SLSA to all reviews. The results are shown in Fig 10. The ratio of public reviews is almost identical for the two methods. Because the reviews submitted to the SPs through USR-SLSA do not reveal users' privacy, users can publish more reviews.

Conclusion
In this paper, we study the exchange of reviews for trajectory privacy protection in LBSSs. Since an LBSS is a registration system, adversaries can easily obtain the user profiles and trajectories embedded in the reviews submitted to the SPs by colluding with the SPs. To protect trajectory privacy, we propose an approach in which users exchange reviews before submitting them to the SPs. However, our analysis shows that the exchange can be easily broken by RLCA and SLSA if users' reviews are exchanged randomly. To resist the two attacks, we design two schemes named USR-RLCA and USR-SLSA to exchange reviews. For USR-RLCA, we propose a metric to measure the correlation between a user and a trajectory. Based on this metric, USR-RLCA selects reviews that resist RLCA by keeping the number of locations on each reconstructed trajectory below a threshold. For USR-SLSA, we propose a metric to measure the indistinguishability of locations with respect to the difference in semantic frequency over the long term. Based on this metric, USR-SLSA selects reviews that resist SLSA by allowing two reviews to be exchanged only if the probability difference of their semantics after the exchange is below a threshold. The evaluation results demonstrate that our approach can effectively protect trajectory privacy when real-name users submit their reviews to SPs and does not degrade users' utility in terms of review publication. Our study is, however, based on two assumptions: (1) users are registered under real names on LBSSs; (2) a user is only allowed to review businesses whose services he has used. There are still some LBSSs that do not require users to register with real names or that allow users to review businesses without restrictions. This increases the complexity of exchanging reviews, and privacy protection in such settings is a more interesting and challenging topic. Our future work will focus on how to exchange reviews in such scenarios.