Improved privacy-preserving method for periodical SRS publishing

Spontaneous reporting systems (SRSs) collect adverse drug events (ADEs) for evaluation and analysis. Periodical SRS data publication gives rise to a problem: sensitive, private data can be discovered through various attacks. The existing SRS data publishing methods are vulnerable to the Medication Discontinuation-attack (MD-attack) and the Substantial Symptoms-attack (SS-attack). To remedy this problem, an improved periodical SRS data publishing method, PPMS(k, θ, α)-bounding, is proposed. The new method resists the MD-attack by ensuring that each equivalence group contains at least k new medication-discontinuation records; the SS-attack is thwarted using a heuristic algorithm. Theoretical analysis indicates that PPMS(k, θ, α)-bounding can thwart the above-mentioned attacks. The experimental results also demonstrate that PPMS(k, θ, α)-bounding provides much better privacy protection than the existing method without increasing information loss. In short, PPMS(k, θ, α)-bounding improves privacy while guaranteeing the information usability of the released tables.


Introduction
Many developed countries have established spontaneous reporting systems (SRSs) for the collection of adverse drug events (ADEs). These datasets allow researchers to analyze possible correlations between drugs and adverse reactions. Typical spontaneous reporting systems include FAERS of the US Food and Drug Administration [1] and the UK Yellow Card scheme [2].
However, these datasets usually involve information related to an individual's privacy; sensitive attributes (SAs), e.g., adverse drug reaction and disease type, are also included. Publishers usually remove attributes that can uniquely identify individuals before releasing their reports. However, Sweeney [3] pointed out that an adversary can use quasi-identification attributes (QIAs) to link the released table to other publicly available datasets in an effort to uniquely identify an individual. A QIA can be Age, Gender, etc.; a single quasi-identification attribute cannot uniquely identify an individual. To protect SAs, privacy preserving data publishing (PPDP) usually anonymizes original tables before releasing them. In recent years, PPDP has been widely studied and seeks to maintain the tradeoff between privacy/security and information usability in released tables. k-anonymity [3] and its variants [4][5][6][7][8] are only suitable for static tables. For dynamic data tables, incremental data publishing methods [9][10][11][12][13][14] have been presented, such as BCF-anonymity [9] and m-invariance [10]. Most of these techniques cannot preserve identity in released tables and are not well suited to SRS data publishing. Differential privacy [15][16][17][18] can make the presence or absence of a record in the dataset have little effect on the outcome; however, the utility of released tables is adversely affected by the added noise [19]. SRSs release updated datasets periodically; for example, the US Food and Drug Administration releases its adverse drug event datasets quarterly. Lin et al. [20][21] showed that SRS datasets usually exhibit special characteristics, e.g., multiple individual records and multivalued sensitive attributes. More importantly, the related ADE records of an individual may be contained in the tables released in each period. These records share a case identification (CaseID) to trace follow-ups to an event [22].
Thus, conventional data publishing methods cannot handle SRS datasets. To resolve this, Wang et al. [22] defined three types of attacks in SRS dataset publishing and presented a periodical SRS data publishing method, PPMS(k, θ*)-bounding. These attack types are defined as follows.

Definition 1 (Backward-attack)
Assume a target individual P whose record t is in the sanitized released table T_i. Use t.QIA and C to represent the QIA values of t and P's candidate CaseID set in T_i, respectively. Let U contain every record r such that r comes from the previously released tables {T_1, T_2, . . ., T_{i-1}} and r's CaseID is in C. The Backward-attack (B-attack) may happen if there is a record r (r ∈ U) whose QIA values r.QIA do not cover t.QIA. We denote the set of these excludable records as B. The QIA values of r (r.QIA) cover those of t (t.QIA) if the records r and t satisfy: for each quasi-identification attribute QI in QIA, r.QI is equal to or more generalized than t.QI.
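The cover relation can be made concrete with a small sketch. The following Python snippet is a minimal illustration; the set-based encoding of generalized values and the helper names are our own assumptions, not part of [22]. Each generalized QI value is modeled as the set of ground values it covers:

```python
# A generalized QI value is modeled as the set of ground values it covers;
# e.g. Age "30-39" -> {30, ..., 39}, Gender "*" -> {"M", "F"}.
def qi_covers(r_val, t_val):
    """r_val covers t_val iff r_val is equal to or more generalized."""
    return set(t_val) <= set(r_val)

def qia_covers(r_qia, t_qia):
    """r.QIA covers t.QIA iff every QI value of r covers that of t."""
    return all(qi_covers(r_qia[qi], t_qia[qi]) for qi in t_qia)

# Example: a record generalized to Age 30-39, Gender * covers (35, F).
r = {"Age": range(30, 40), "Gender": {"M", "F"}}
t = {"Age": {35}, "Gender": {"F"}}
```

Here qia_covers(r, t) is True while qia_covers(t, r) is False: covering is a one-way generalization relation.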

Definition 2 (Forward-attack)
Assume a target individual P whose record t is in the sanitized released table T_i. We use t.QIA and C to represent the QIA values of t and P's candidate CaseID set in T_i, respectively. Let U contain every record r such that r comes from the subsequently released tables {T_{i+1}, T_{i+2}, . . .} and r's CaseID is in C. The Forward-attack (F-attack) may happen if there is a record r (r ∈ U) whose QIA values r.QIA do not cover t.QIA. Denote the set of these excludable records as F.

Definition 3 (Latest-attack)
Assume a target individual P whose record t is in the sanitized released table T_i, and none of the previously released tables {T_1, T_2, . . ., T_{i-1}} contain the CaseID of P. We use C to represent P's candidate CaseID set in T_i. The Latest-attack (L-attack) may happen if there is any CaseID cid (cid ∈ C) that appears in some previously released table. Denote the set of these excludable records as L.
Based on the above three attacks, the definition of an anonymity model can be introduced [22]: 1. For each individual P, assume the candidate CaseID set of P in T_i is C; then |C − B − F − L| ≥ k. 2. For every individual P, an adversary can conclude that P has any sensitive attribute value s_j with a probability of at most θ_j.
PPMS(k, θ*)-bounding uses NC-bounding and QID-bounding to defend against the above three attacks. NC-bounding makes each group contain at least k new CaseIDs, which defends against the B-attack and the L-attack. Let t be a record in the released table T_i, and let t_1, t_2, . . ., t_j be the records that share the same CaseID with t in the previous tables T_x, T_y, . . ., T_z (x < y < . . . < z). To thwart the F-attack, QID-bounding requires that the QIA values of t cover all records that share the same CaseID with t in previous tables. Wang et al. [22] also found that the F-attack can be avoided when the QIA values of t cover only the records that share the same CaseID with t in T_x; they call this variant PPMS_EAR. Their experiments show that PPMS_EAR can thwart the above three attacks while maintaining the usability of the released tables.

Example 1
Bob knows that Alice (F, 39) is in Table 2(B), and he can relate Alice to the records {t_13, t_15, t_18}. He also knows that Alice will stop taking her medicine in quarter 3, because her illness is shown as cured in quarter 2. Therefore, Bob can exclude t_13 and t_15, and conclude that Alice's record is t_18 with a probability of 100%. The privacy of Alice is disclosed.

Example 2
Bob knows that Clare (M, 47) is in Table 2(B), and can relate her to the records {t_16, t_17, t_21}. Bob cannot relate Clare to a unique record, but t_16, t_17 and t_21 all contain many more adverse drug reactions than the other records. Thus, Bob can conclude that Clare experiences more adverse drug reactions than other people. The privacy of Clare is disclosed.
PPMS(k, θ*)-bounding does not consider medication discontinuation or records with massive symptoms, hence privacy may be disclosed by an adversary. As extensions of PPMS(k, θ*)-bounding, the other existing SRS data publishing methods [23][24][25] have not considered the attacks described in Example 1 and Example 2, either. It is necessary to find a way to defend against these attacks. However, increasing security usually makes information usability decline; it is challenging to balance privacy against information utility. To alleviate these problems, this paper proposes a new SRS data publishing method that improves privacy while guaranteeing information usability. The main contributions of this paper are summarized as follows:
1. Identifying two new attacks aimed at ADE data publishing.
2. Based on PPMS(k, θ*)-bounding, proposing a new data publishing method, PPMS(k, θ, α)-bounding. The new method enhances privacy and preserves the quality of released tables. A corresponding algorithm is presented.
3. Using a real FAERS database from the US Food and Drug Administration to verify PPMS(k, θ, α)-bounding.

Related work
ADE reporting is a special style of incremental data publishing released periodically. Subsequent tables may add new records and delete/update records offered previously. For the purpose of tracing individuals, the same CaseID can appear in different released tables. Wang et al. [22] divided traditional incremental data publishing into two types: continuous data publishing and dynamic data publishing. Continuous data publishing [9,11] is periodic publishing that carries over records from previously released tables. The data holder needs to release all the data collected so far if he wants to publish the data collected recently. Suppose that the data holder has collected data D_i by timestamp t_i; in general, the data holder has to release R_i, which is the anonymized version of all the data collected up to t_i. Some matching records can be excluded by the adversary, because he/she can infer that the records are not related to the target's QIA values or timestamp [9]. Fung et al. [9] pointed out that the excluded records can help the adversary reach a smaller set of candidates. Thus, they presented a privacy model (called BCF-anonymity) to evaluate anonymity after excluding some matching records, and an efficient algorithm to achieve a suboptimal BCF-anonymization.
Pei et al. [11] pointed out that in the continuous data publishing scenario, k-anonymity [3] may be compromised due to possible inferences using multiple releases. They presented a privacy preserving approach, called Monotonic Incremental Anonymization, to guarantee k-anonymity on each release. Meanwhile, the approach can reduce information loss by using the growing body of accumulated data. Some continuous data publishing methods can preserve the identities of individuals among different tables [11], but this type of method cannot support deletion and updating operations. Therefore, dynamic data publishing methods were presented later.
Dynamic data publishing [10][12][13][14] is periodic publishing where records can be added, deleted or updated relative to previously released tables. This style cannot preserve the identities of individuals among different tables. Suppose that the data holder collected the initial set of tuples D_1 at time t_1 and published R_1 as the anonymized version of D_1. During the period [t_1, t_2], the data holder inserted newly arriving records into D_1; at the same time, some records of D_1 might be deleted or updated. Finally, D_2 was obtained at time t_2, and the data holder published R_2 as the anonymized version of D_2. In general, the data holder publishes R_i as the anonymized version of D_i at time t_i.
Xiao et al. [10] found out that when incremental data publishing supported deletions, the adversary could disclose the privacy of victims by comparing the series of released k-anonymous [3] and l-diverse [8] data. They presented a privacy model, called m-invariance, to guarantee certain "invariance" in all the QIA groups that a tuple is incorporated into at different publication timestamps.
Li and Zhou [12] defined updates on attribute values as internal updates. They pointed out that internal updates related to sensitive values are not arbitrary, so the requirement of m-invariance is unreachable in this scenario. A counterfeit generalization approach called m-Distinct was presented to guarantee the security of dynamic publication with internal updates, insertions and deletions.
Following the work of [12], Anjum and Raschia [13] further assumed that new values might not have any association with the old ones, and the adversary knew the "event list". An attack model based on their assumption, called τ-attacks, was defined. To prevent the new attack, Anjum and Raschia also presented a publication approach called τ-safety, which is based on m-invariance and individual-oriented protection.
Bewong et al. [14] illustrated that the transactional data had some special features, such as having many common private terms. Thus, they pointed out that the existing incremental publishing methods were inapplicable. A transactional data publication mechanism called Sanony was also presented, to prevent composition attacks by utilizing counterfeits.
Full data evolution is supported in dynamic data publishing. However, identity preservation cannot be supported in this type of multiple releases, which makes it inapplicable to ADE data publishing.
Differential privacy [15][16][17][18] has garnered a lot of attention in recent years; it can minimize the chances of identifying records. However, the noise added by differential-privacy-based methods is unbounded and random, which adversely affects the utility of released tables [19].
In 2016, Lin et al. [21] began to study ADE data publishing and presented the MS(k, θ*)-bounding method based on the characteristics of ADE data. Because this method did not consider the correlation among different released tables, an adversary could exploit this situation when seeking to disclose the privacy of individuals. To resolve this, in 2017 Wang et al. [22] proposed PPMS(k, θ*)-bounding for periodical ADE data publishing. This method can defend against the three attacks (B-attack, F-attack and L-attack) that are based on correlations among different released tables. After that, several ADE data publishing methods based on PPMS(k, θ*)-bounding were presented [23][24][25]. Hsiao et al. [23] presented a privacy model, called Closed l-diversity, to process missing values by guaranteeing that each partial QID-group includes at least l different sensitive values. They also proposed an algorithm, called Closed l-diversification, to achieve Closed l-diversity. Cui et al. [25] presented an SRS data publication approach, called EQZS, to improve the efficiency of PPMS(k, θ*)-bounding. In these methods, the new values and old values must cover each other, which limits the usability of the released data.
The existing SRS data publishing methods have not considered medication discontinuation or massive symptoms. An adversary can use the related background knowledge to disclose privacy. This paper presents a new ADE data publishing method to address this problem.

PPMS(k, θ, α)-bounding model
In PPMS(k, θ*)-bounding [22], the adversary learns the target individual P's QIA values and knows that P is in a released table. An initial adverse drug reaction can also be revealed. We assume the adversary may learn extra information: P stops medication in the next quarter. Patients stop their medication when the illness is cured or when other therapies (e.g., surgery, food therapy) are chosen; thus, the assumption is realistic. The adversary can use the information about medication discontinuation to disclose privacy, as in Example 1.

Definition 5 (Medication Discontinuation-attack)
Assume a target individual P whose record t is in the sanitized released table T_i, and T_{i+1} does not contain the CaseID of P. Use C to represent P's candidate CaseID set in T_i. The Medication Discontinuation-attack (MD-attack) may happen if there is any CaseID cid (cid ∈ C) that appears in T_{i+1}. Denote the set of these excludable records as MD. Example 1 is an instance of the MD-attack.
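The MD-attack exclusion step is simple set arithmetic. A minimal sketch (the CaseID values below are hypothetical and only illustrate Example 1 in miniature):

```python
def md_excludable(candidates, next_table_caseids):
    """CaseIDs an adversary can rule out for a target known to stop
    medication: any candidate that still appears in T_{i+1}."""
    return {cid for cid in candidates if cid in next_table_caseids}

# The target quits in the next quarter, so the two candidates that
# continue into T_{i+1} are excluded, leaving a single record.
C = {"c13", "c15", "c18"}
MD = md_excludable(C, next_table_caseids={"c13", "c15"})
remaining = C - MD
```

With only one record remaining, the adversary re-identifies the target with probability 1, which is exactly the disclosure of Example 1.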

Definition 6 (Substantial Symptoms-attack)
The adversary can conclude that the target individual experiences more symptoms/adverse drug reactions than other people. We have used Example 2 to illustrate this style of attack (SS-attack). We refer to a record with substantial symptoms/adverse drug reactions as an ss-record. Publishers can decide the specific method for defining an ss-record.
Based on PPMS(k, θ*)-bounding, PPMS(k, θ, α)-bounding needs to thwart these two new attacks (Definition 7):
1. k-bounding: For each individual P with candidate CaseID set C in T_i, at least k candidate records remain after excluding the sets B, F, L and MD, i.e., |C − B − F − L − MD| ≥ k.
2. θ-bounding: For every individual P, an adversary can conclude that P has any sensitive attribute value s_j with a probability of at most θ_j.
3. α-bounding: For every individual P, an adversary can conclude that P has many more symptoms/adverse drug reactions than others with a probability of at most α.
The privacy requirement of Definition 7(1) is used to avoid record disclosure. The MD-attack is considered under this requirement, which extends the corresponding requirement of PPMS(k, θ*)-bounding: the adversary cannot distinguish the target individual from at least k records, even after excluding some candidates through the MD-attack, F-attack, B-attack and L-attack. The privacy requirement of Definition 7(2) states that the probability of attribute disclosure will not exceed a threshold, even though the adversary can exclude some candidate records through the various attacks; this requirement guarantees the security of sensitive attribute values. Besides, the SS-attack can be thwarted by meeting the privacy requirement of Definition 7(3), which limits the frequency of ss-records in groups.
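The three requirements of Definition 7 can be checked per anonymous group. The following is a simplified sketch, not the paper's exact model: the record layout, the per-value threshold dictionary and the ss-flag are our own assumptions.

```python
from collections import Counter

def satisfies_ppms(group, k, theta, alpha, excluded):
    """Check the PPMS(k, θ, α) requirements on one anonymous group.
    Each record has 'caseid', 'sens' (set of sensitive values) and 'is_ss';
    `excluded` holds CaseIDs ruled out via the B/F/L/MD attacks."""
    survivors = [r for r in group if r["caseid"] not in excluded]
    if len(survivors) < k:                                      # k-bounding
        return False
    n = len(survivors)
    freq = Counter(v for r in survivors for v in r["sens"])
    if any(c / n > theta.get(v, 1.0) for v, c in freq.items()):  # θ-bounding
        return False
    return sum(r["is_ss"] for r in survivors) / n <= alpha       # α-bounding

group = [{"caseid": i, "sens": {"s%d" % i}, "is_ss": False} for i in range(4)]
```

Note that the checks run on the surviving candidates, mirroring the requirement that the bounds hold even after the adversary's exclusions.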
To satisfy PPMS(3, 1/3, 1/4)-bounding, Tables 3(A)-3(C) can be released for Tables 1(A)-1(C). Each sanitized group incorporates at least three individuals P with these properties: (a) P appears for the first time in the multiple releases; (b) P will not appear in the next release. At the same time, the QIA values of new records cover the corresponding old ones according to QID-bounding. Each group still contains at least three records, and the frequency of each sensitive value does not exceed the threshold 1/3, even after the sets B, F, L and MD have been excluded by the adversary. Thus, the attacks in [22] can be prevented. Especially, the exclusion of the set MD from the releases has no effect on meeting the privacy requirements, so the MD-attack of Example 1 can be resisted. Besides, the frequency of ss-records in groups does not exceed 1/4, so the SS-attack (Example 2) can also be thwarted.

Algorithm and analysis
In this section, we propose a heuristic algorithm to achieve PPMS(k, θ, α)-bounding. The related definitions and symbols are given in Section 4.1. We give the specific steps and an illustration of the algorithm in Section 4.2; the analysis and lemmas on which the algorithm depends are also included there.

Definitions and symbols
Before stating the algorithm, we introduce new symbols as follows: Substantial symptoms-record (ss-record): as mentioned earlier, for a record t (t ∈ T), if t has many more symptoms/adverse drug reactions than the others in T, then t is an ss-record in table T.
New-record (n-record): for a record t (t ∈ T), t's CaseID makes its initial appearance in a released table; t is an n-record in table T.

Old-record (o-record): for a record t (t ∈ T), t's CaseID is not making its initial appearance in a released table; t is an o-record in table T.
Medication discontinuation-record (md-record): for a record t (t ∈ T_i), t's CaseID will not appear in the next table T_{i+1}; t is an md-record in T_i.

nx-record (x ∈ {ss, n, o, md}): if t is not an x-kind record, then t is an nx-record. For instance, if t is not an md-record in table T, then t can be denoted as an nmd-record in T.
x&y-record (x, y ∈ {ss, n, o, md}): if t is both an x-kind record and a y-kind record, then t is an x&y-record. For instance, if t is an n-record and an md-record in table T, then t can be denoted as an n&md-record in T.
According to his background knowledge, the adversary can derive four views of an anonymous group G. Assume the adversary knows the target individual P's QIA values and learns that the target is in a released table.
View 1: the adversary knows that individual P is in a specific group G, that P appears for the first time in the released tables, and that P stops medication in the next quarter. We denote view 1 of group G as GV_1. GV_1 contains all the n&md-records in G.
View 2: the adversary knows that individual P is in a specific group G and that P appears for the first time in the released tables. We denote view 2 of group G as GV_2. GV_2 contains all the n-records in G.
View 3: the adversary knows that individual P is in a specific group G and that P stops medication in the next quarter. We denote view 3 of group G as GV_3. GV_3 contains all the md-records in G.
View 4: the adversary knows that P is in a specific group G. We denote view 4 of group G as GV_4; GV_4 contains all the records in G. If the adversary can disclose privacy in any one of these views, the anonymous group G is unsafe. Thus, we have to provide privacy protection in all four views.
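The four views follow directly from the record classification above. A small sketch (the boolean flags 'n' and 'md' are our own encoding of the n-record and md-record properties):

```python
def views(group):
    """Derive the adversary's four views of anonymous group G. Each record
    carries boolean flags: 'n' (first appearance of its CaseID) and
    'md' (CaseID absent from the next table)."""
    GV1 = [r for r in group if r["n"] and r["md"]]  # View 1: n&md-records
    GV2 = [r for r in group if r["n"]]              # View 2: n-records
    GV3 = [r for r in group if r["md"]]             # View 3: md-records
    GV4 = list(group)                               # View 4: all records
    return GV1, GV2, GV3, GV4

g = [{"n": True, "md": True}, {"n": True, "md": False}, {"n": False, "md": True}]
GV1, GV2, GV3, GV4 = views(g)
```

By construction GV_1 ⊆ GV_2 ⊆ GV_4 and GV_1 ⊆ GV_3 ⊆ GV_4; this containment is exactly what the proof of Lemma 1 relies on.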

Algorithm of PPMS(k, θ, α)-bounding
We present an algorithm called HA to achieve PPMS(k, θ, α)-bounding. An overview of the algorithm is shown in Algorithm 1. The algorithm first merges records with the same CaseID into super records (line 1). Next, it applies the QID-bounding strategy to prevent the F-attack (lines 2-6). Then, the algorithm groups the records of the current table with procedure Grouping (line 7). Last, the algorithm anonymizes the table and releases it (lines 8-9). Our algorithm makes three changes to PPMS_EAR [22]. We first redefine the privacy risk to satisfy θ-bounding when the MD-attack is considered (change 1). The introduction and analysis of change 1 are as follows.

Lemma 1. To resist the MD-attack, for any sensitive value v, the largest allowed number of occurrences of v in a group G is given by formula (1):

η_v(G) = ⌊θ_v · |GV_1|⌋.   (1)
Proof. It is easy to see that η_v(G)/|GV_1| ≤ θ_v. Meanwhile, η_v(G)/|GV_x| ≤ θ_v because |GV_x| ≥ |GV_1| (x ∈ {2, 3, 4}). Thus, the frequency of v will not exceed the threshold θ_v in any of the four views of group G. The proof is completed.
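To illustrate Lemma 1 numerically, assuming formula (1) is η_v(G) = ⌊θ_v · |GV_1|⌋ (reconstructed from the proof; the helper name below is ours):

```python
import math

def eta_cap(theta_v, gv1_size):
    # eta_v(G) = floor(theta_v * |GV1|): the largest count of v that keeps
    # the frequency of v at or below theta_v in the smallest view GV1
    return math.floor(theta_v * gv1_size)

# With theta_v = 1/3 and |GV1| = 7, at most 2 occurrences of v are allowed.
cap = eta_cap(1 / 3, 7)
```

Here 2/7 ≤ 1/3 holds in the smallest view GV_1, while one more occurrence (3/7) would break the bound; since the other three views are at least as large, the bound holds in all four views.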
The privacy risk is the same as that in PPMS_EAR [22] except for η_v(G); it is given by formula (2):

PR_v(G, t) = 1 if σ_v(G) + 1 > η_v(G), and PR_v(G, t) = (σ_v(G) + 1)/η_v(G) otherwise,   (2)

where σ_v(G) denotes the number of occurrences of v in G. PR_v(G, t) evaluates the privacy risk caused by record t's sensitive value v after G includes t. In fact, a record usually has multiple sensitive values in ADE data; thus, the privacy risk caused by record t combines PR_v(G, t) over all values v in S_t, as formula (3), where S_t denotes the set of all sensitive values contained by record t. The difference in information loss, ΔIL(G, t), between group G and group G ∪ {t} is the same as in PPMS_EAR. Thus, we can get ΔPRIL(G, t) [22], as formula (4).
A record t with a smaller ΔPRIL(G, t) [22] has a greater probability of being included in group G. This use of ΔPRIL(G, t) appears on line 10 of procedure Grouping, which will be introduced with change 2. ΔPRIL(G, t) = 1 represents that the inclusion of t would break θ-bounding; thus, t cannot be included in G when ΔPRIL(G, t) = 1. Therefore, change 1 is actually the redefinition of ΔPRIL(G, t) with the MD-attack taken into consideration. Now we illustrate change 2, which is contained in procedure Grouping. In procedure Grouping, to resist the MD-attack, the |GV_1| of each group G should be no less than k (change 2, lines 7-18). When |GV_1| = k, group G is completed; otherwise, the grouping of G continues. Thus, k-bounding can be guaranteed. Besides, the procedure processes the remaining records after the completion of the grouping step (lines 25-30).
The third change is also in procedure Grouping. We should verify whether α-bounding can be satisfied. However, before G is completed, |GV_x| (x ∈ {2, 3, 4}) cannot be known. Thus, we find a way to verify α-bounding during the grouping process.
Our algorithm does not predetermine the maximum number of ss-records, so it is more flexible. The experimental results also show that our heuristic algorithm can maintain the usability of released tables even under more stringent privacy requirements.

Experimental results and analysis
In this section, we compare our method (PPMS(k, θ, α)-bounding achieved by HA) with PPMS_EAR [22]. We implement both methods with Microsoft Visual C++ 2015. All experiments are conducted on a PC with an Intel Core 2.60 GHz CPU and 8 GB of main memory, running the Microsoft Windows 10 operating system.
We analyze the methods in terms of security and information loss. The 14 most recent quarterly datasets are chosen from FAERS of the FDA: 2014Q3-2017Q4. The quasi-identification attributes (QIAs) and sensitive attributes (SAs) are the same as in [22].

Security
The Dangerous Identity Ratio (DIR) [21][22] and Dangerous Sensitivity Ratio (DSR) [21][22] are used to evaluate the security of publishing methods. We call a group a dangerous identity group (DIG) if the number of records in the group is less than the threshold k. If a group contains at least one sensitive value v_i whose frequency is higher than its threshold θ_i, we call it a dangerous sensitivity group (DSG). DIR/DSR represents the ratio of DIGs/DSGs among all anonymous groups.
To measure the ability to resist the SS-attack, we define the substantial symptoms group ratio (SSGR). If the frequency of ss-records in a group is higher than the threshold α, we call the group a substantial symptoms group (SSG). Similar to DIR and DSR, SSGR represents the ratio of SSGs among all anonymous groups.
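The three metrics are straightforward frequency ratios over the published groups. A hypothetical sketch (the record layout and threshold dictionary are our assumptions):

```python
from collections import Counter

def security_ratios(groups, k, theta, alpha):
    """DIR, DSR and SSGR: the fraction of anonymous groups that are
    dangerous-identity, dangerous-sensitivity or substantial-symptoms groups."""
    def is_dig(g):                       # fewer than k records
        return len(g) < k
    def is_dsg(g):                       # some value's frequency above theta
        freq = Counter(v for r in g for v in r["sens"])
        return any(c / len(g) > theta.get(v, 1.0) for v, c in freq.items())
    def is_ssg(g):                       # ss-record frequency above alpha
        return sum(r["is_ss"] for r in g) / len(g) > alpha
    n = len(groups)
    return (sum(map(is_dig, groups)) / n,
            sum(map(is_dsg, groups)) / n,
            sum(map(is_ssg, groups)) / n)

safe = [{"sens": {"s%d" % i}, "is_ss": False} for i in range(3)]
risky = [{"sens": {"v"}, "is_ss": True} for _ in range(2)]
DIR, DSR, SSGR = security_ratios([safe, risky], 3, {"v": 1 / 3}, 1 / 4)
```

In this toy example the second group is simultaneously a DIG (2 < 3 records), a DSG (value "v" at frequency 1 > 1/3) and an SSG (ss-frequency 1 > 1/4), so all three ratios are 0.5.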
As shown in Figs 1 and 2, the DIR and DSR of PPMS_EAR are both greater than 0, because PPMS_EAR does not take the MD-attack into consideration. The DIR and DSR are even greater than 10% in some released tables of PPMS_EAR. An adversary can compromise the privacy requirements in the released tables of PPMS_EAR, which is vulnerable to the MD-attack. Meanwhile, PPMS(k, θ, α)-bounding considers that an adversary may disclose privacy through information about medication discontinuation. Therefore, PPMS(k, θ, α)-bounding can avoid the MD-attack, and its DIR and DSR are both 0.
The SSGR of the two methods is shown in Fig 3. We can see that the SSGR of PPMS(k, θ, α)-bounding is 0 because our method thwarts the SS-attack by limiting the frequency of ss-records. However, PPMS_EAR cannot resist the SS-attack, hence its SSGR in some released tables is greater than 10%. The other settings of θ (θ = 0.4, θ = bf) are omitted, because they generate similar results.

Information loss
Normalized Information Loss (NIL) [21][22] is used to evaluate information usability. As shown in Fig 4, the NIL of PPMS(k, θ, α)-bounding is very close to that of PPMS_EAR. Our analysis is that procedure Judge_α_bounding achieves α-bounding through a heuristic method, under which it is easier for records to be incorporated into groups. HA estimates the frequencies of ss-records to guarantee α-bounding while grouping. Compared with predetermining the maximum number of ss-records in groups, this heuristic method can "accommodate" more ss-records in each anonymous group while the privacy requirement is not compromised. Therefore, our method has information loss similar to PPMS_EAR even though our privacy requirements are more stringent.

Discussion
The tradeoff between privacy and utility is the focus of data publishing. The traditional periodical data publishing mechanisms [9][10][11][12][13][14] are not suitable for SRS data due to some of its special features. The existing SRS data publishing methods [22][23][24][25] considered the B-attack, F-attack and L-attack in the scenario of SRS data publishing, trying to guarantee the information usability of the released tables. However, SRS data has some special features, such as identity preservation and multivalued sensitive attributes, which make it more vulnerable to security threats than other types of data. We discover and define two new attacks in this paper, the MD-attack and the SS-attack, and find that the existing SRS data publishing methods cannot resist them. To solve these problems, we present a new SRS data publishing model and a corresponding heuristic algorithm. We consider the MD-attack, B-attack, F-attack and L-attack in the evaluations of DIR and DSR; the related experimental results (Figs 1 and 2) show that all these attacks can be resisted by the proposed method. The evaluation of SSGR is used to analyze the SS-attack, and the corresponding results (Fig 3) also demonstrate that the proposed method can thwart this type of attack. Thus, the security evaluations show that the proposed method can defend against the various known attacks. Regarding information usability, the related experimental results (Fig 4) demonstrate that the information loss of the proposed method is similar to that of the existing one. The results of the information usability evaluation suggest that the proposed heuristic algorithm is an effective way to limit information loss when the anonymity standard becomes more stringent.
According to the above experimental results on security and usability, the proposed method provides better protection than the existing SRS publishing methods while still guaranteeing information usability. Compared with the existing SRS publishing methods, our method achieves a better balance between privacy security and information usability.

Conclusion
In this paper, we consider medication discontinuation and substantial symptoms in periodical ADE data publishing, and propose a new periodical ADE data publishing method that can resist the Medication Discontinuation-attack and the Substantial Symptoms-attack. The experimental results show that our method can protect against various known attacks and enhances security relative to PPMS_EAR. Besides, compared with PPMS_EAR, our method does not obviously increase information loss.
Several directions for future work arise from this work. First, it would be interesting to study personalized anonymity [26] in SRS data publishing. This technique enables individuals to specify privacy levels for their own sensitive information, in order to yield a better tradeoff between privacy security and information utility. Second, it would be worthwhile to extend the proposed method to big data publication [27]. Research in this direction may discover effective parallel algorithms with guaranteed privacy security and information usability.