Determining Fuzzy Membership for Sentiment Classification: A Three-Layer Sentiment Propagation Model

Enormous quantities of review documents exist in forums, blogs, Twitter accounts, and shopping websites. Analyzing the sentiment information hidden in these review documents is very useful for both consumers and manufacturers. The sentiment orientation and sentiment intensity of a review can be described in more detail by a sentiment score than by bipolar sentiment polarity. Existing methods for calculating review sentiment scores frequently rely on a sentiment lexicon or on the locations of features in a sentence, paragraph, or document. To obtain more accurate sentiment scores for review documents, we propose a three-layer sentiment propagation model (TLSPM) that exploits three kinds of interrelations: those among documents, topics, and words. First, we construct nine pairwise relationship matrices among documents, topics, and words. In TLSPM, we suppose that sentiment neighbors tend to share the same sentiment polarity and similar sentiment intensity in the sentiment propagation network. We then run the sentiment propagation processes among documents, topics, and words in turn, and finally obtain steady sentiment scores for the documents through a continuous iteration process. Intuition suggests that documents with strong sentiment intensity contribute more to classification than those with weak sentiment intensity. Therefore, we use the fuzzy membership of documents obtained by TLSPM as the text weight to train a fuzzy support vector machine (FSVM) model. Compared with a support vector machine (SVM) and four other fuzzy membership determination methods, the results show that the FSVM trained with TLSPM can enhance the effectiveness of sentiment classification. In addition, the FSVM trained with TLSPM can reduce the mean square error (MSE) on seven sentiment rating prediction data sets.


Introduction
Following the popularization of forums, blogs, and online shopping websites, the volume of user-generated reviews is growing explosively [1]. Techniques for extracting, organizing, and drawing conclusions from these multitudinous reviews, and in particular for classifying them according to their sentiment orientation and sentiment intensity, are receiving increasing interest from researchers and manufacturers [2]. In general, customers frequently search the Internet for related comments about an item before purchasing it, while manufacturers want to gather customers' opinions in order to improve product design. Thus, classifying this information according to sentiment tendency is very convenient for both manufacturers and customers. Sentiment classification aims to automatically recognize the sentiment information hidden in texts, such as opinions, emotions, and standpoints [3]. Its applications are also extensive, including text filtering, e-business, and public opinion prediction [4].
Compared with traditional classification tasks, sentiment classification is relatively challenging, since judging the sentiment orientation requires a deep semantic analysis of the documents [5,6]. Supervised machine learning models, such as the support vector machine (SVM), decision trees, and Bayesian classifiers, have been applied to text sentiment classification, and among these models the SVM has achieved effective results [7]. However, the SVM assigns equal weight to all samples, while different samples affect or contribute to the classification surface very differently [8,9]. The fuzzy support vector machine (FSVM) introduces fuzzy membership into the SVM: each sample is assigned a fuzzy membership value, so that noisy samples or samples that make small contributions to the classification receive a lower weight, and samples that make greater contributions receive a higher weight [10,11]. Compared with the SVM, the FSVM can improve classification accuracy and reduce the adverse effects of noisy data.
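The weighting idea behind the FSVM can be sketched as a hinge loss in which each sample's penalty term is scaled by its fuzzy membership, so low-membership (noisy) samples pull the decision surface less. The following is a minimal illustrative sketch, not the paper's implementation; the subgradient training loop, learning rate, and function names are all assumptions:

```python
# Minimal sketch of a fuzzy-weighted linear SVM trained by subgradient
# descent on the weighted hinge loss
#   (1/2)||w||^2 + C * sum_i s_i * max(0, 1 - y_i (w.x_i + b)).
# The per-sample weight s_i (fuzzy membership) scales each sample's
# penalty. Everything here is an illustrative assumption.

def train_fuzzy_svm(X, y, s, C=1.0, lr=0.01, epochs=200):
    """X: list of feature vectors, y: labels in {-1, +1},
    s: fuzzy memberships in (0, 1]."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for xi, yi, si in zip(X, y, s):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                # Weighted hinge subgradient: membership s_i scales C.
                w = [wj - lr * (wj - C * si * yi * xj)
                     for wj, xj in zip(w, xi)]
                b += lr * C * si * yi
            else:
                # Only the regularizer contributes outside the margin.
                w = [wj - lr * wj for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Setting all memberships to 1.0 recovers a plain soft-margin SVM; lowering s_i toward 0 effectively removes sample i from training.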
Clearly, sentiment scores can describe the sentiment orientation and sentiment intensity of documents in great detail. It is hard for human beings to estimate an accurate sentiment score for a given document, and such manual estimates are also unreliable [12]. Therefore, techniques for capturing sentiment scores automatically are very important. To obtain the sentiment scores of review documents, researchers have adopted sentiment lexicons to count positive and negative words and their sentiment intensity. In addition to sentiment words and sentiment lexicons, researchers have also used the distance to the class centroid to measure fuzzy membership [13]. The FSVM has proved effective in both theory and applications for classification tasks. In sentiment classification, the membership function should be constructed according to the characteristics of the data set and its features; the key in the FSVM is to determine an appropriate fuzzy membership for each sample. Fuzzy sentiment membership should reflect the degree to which a document contributes to sentiment classification. Generally, we consider that documents with strong positive or negative sentiment intensity make large contributions to sentiment classification, while samples with weak sentiment intensity are less important. Therefore, the stronger the sentiment intensity of a document, the larger its degree of membership to its sentiment label. To obtain more accurate sentiment classification results, we use the absolute value of the sentiment score as the fuzzy membership to train the FSVM.
To determine the fuzzy sentiment membership of documents, we adopt a three-layer sentiment propagation model (TLSPM), where the three layers refer to documents, topics, and words. First, we construct nine pairwise relationship matrices among documents, topics, and words. The sentiment scores of documents, topics, and words are determined by their sentiment neighbors, and a steady sentiment score is obtained through continuous iteration. To achieve better sentiment classification results, we give higher weights to training samples with strong positive or negative sentiment intensity, and lower weights to those with weak sentiment intensity. Using these weighted training samples, an FSVM text sentiment classifier can be trained. Fifteen frequently used real-world sentiment data sets, including eight two-class data sets and seven multi-level data sets, were selected to evaluate the effectiveness of the proposed method. Compared with the SVM and four other fuzzy membership determination methods, the experimental results show that the FSVM trained with TLSPM increases the accuracy of sentiment classification. In addition, the FSVM trained with TLSPM reduces the mean square error (MSE) on seven sentiment rating prediction data sets.

Related work
In this section, we briefly review the existing methods for two-class sentiment classification, sentiment rating prediction methods, and application of the topic model to sentiment classification.

Two-class sentiment classification
Traditional text sentiment classification generally divides reviews into positive or negative categories according to their sentiment orientation [14]. Current methods for two-class sentiment classification can be roughly divided into three categories: lexicon-based, semi-supervised, and supervised machine learning approaches [15,16].
Lexicon-based approach. Sentiment lexicons are widely used in the fine-grained sentiment analysis of reviews. The lexicon-based approach calculates the orientation of a document from the sentiment orientation of the words or phrases it contains. Turney [17] proposed a simple unsupervised learning algorithm to predict sentiment orientation from the average sentiment orientation of the phrases in a review. Phrases containing adjectives or adverbs are first identified using a part-of-speech tagger. The sentiment orientation of a phrase is then calculated as the mutual information of the given phrase and the word "excellent" minus the mutual information of the given phrase and the word "poor". If the average sentiment orientation of all its phrases is positive, the review is considered positive; if negative, the review is considered negative. Ohana and Tierney [18] applied the SentiWordNet lexicon to the automatic sentiment classification of film reviews. They determined sentiment orientation by counting positive and negative term sentiment scores, then used machine learning methods to classify the reviews and found the relevant sentiment features using SentiWordNet. Through a comparative experiment, they found that the feature-set approach outperformed the sentiment term counting approach. Kanayama and Nasukawa [19] first detected polar clauses that conveyed positive or negative aspects, after which they built a sentiment lexicon comprising polar atoms through an unsupervised method. The polar atoms were defined as the minimum syntactic elements that express sentiment, and context coherency was used to obtain candidate polar atoms. Their method needed only untagged domain corpora and an initial lexicon to select the appropriate polar atoms from among the candidates.
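Turney's semantic orientation measure can be illustrated with a small sketch. In the original PMI-IR method the counts come from search-engine hit counts; the counts passed in below are hypothetical stand-ins:

```python
import math

# Toy sketch of Turney's PMI-based semantic orientation:
#   SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor")
# which reduces to the log-ratio of co-occurrence counts below.
# The hit counts are hypothetical, not real search results.

def semantic_orientation(phrase_and_excellent, phrase_and_poor,
                         hits_excellent, hits_poor, eps=0.01):
    # eps smooths zero counts, as in Turney (2002).
    return math.log2(((phrase_and_excellent + eps) * hits_poor) /
                     ((phrase_and_poor + eps) * hits_excellent))
```

A positive result suggests a positive phrase (it co-occurs more with "excellent"), a negative result a negative phrase.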
Semi-supervised approach. Labeled training data for sentiment classification are often precious and scarce, while abundant unlabeled reviews are easy to obtain. By designing appropriate strategies, semi-supervised methods combine a certain amount of unlabeled data with the labeled data in the learning process. Wan [20] focused on the problem of cross-lingual sentiment classification and leveraged an available English corpus for Chinese sentiment classification by using the English corpus as training data. Machine translation was first used to reduce the gap between Chinese and English; English features and Chinese features were then treated as two independent views of the classification problem, and a co-training approach was proposed to utilize unlabeled Chinese data. Li et al. [21] adopted two views, personal and impersonal, and employed them systematically in both supervised and semi-supervised sentiment classification. In this method, personal views consist of sentences that directly express a speaker's feelings and preference for a target object, while impersonal views focus on statements that evaluate a target object. Based on this, an ensemble method and a co-training algorithm were explored to employ the two views in supervised and semi-supervised sentiment classification, respectively. Yu et al. [16] proposed a semi-supervised approach to address the imbalance between the subjective and objective classes in the Twitter sentiment task: emotion sentiments were automatically extracted from the tweets, and the required training data set was selected automatically. With more and more social media users sharing their opinions with additional images and videos, You et al. [22] presented a cross-modality consistent regression (CCR) model, which was able to utilize both state-of-the-art visual and textual sentiment analysis techniques.
Supervised machine learning approach. Supervised sentiment classification mainly employs supervised machine learning methods, such as decision trees, naive Bayes, SVM, and neural networks [23]. Based on the words that convey sentiment, Liang et al. [14] proposed a new feature selection method based on matrix factorization to identify words with strong inter-sentiment distinguishability and intra-sentiment similarity. Ye et al. [24] compared three supervised machine learning methods, naive Bayes, SVM, and the character-based N-gram model, for sentiment classification of travel blog reviews of seven popular destinations. They found that the SVM and N-gram approaches performed better than the naive Bayes method, and that all three approaches achieved an accuracy of at least 0.8 when the training set was sufficiently large. To encode the intrinsic relations between sentences in the semantic meaning of a document, Tang et al. [1] presented a neural network approach to learn continuous document representations for sentiment classification, reporting that a gated recurrent neural network outperformed a traditional recurrent neural network. For the target-dependent Twitter sentiment classification task, Vo et al. [15] explored a rich set of neural pooling functions for automatic feature extraction and analyzed the theoretical connections behind these functions.

Sentiment rating prediction
It is worth noting that most studies of sentiment classification divide reviews into positive and negative categories. This is because two-class sentiment classification is relatively simple: it is concerned only with the polarity of the comments and does not consider the sentiment intensity of reviews. Compared with two-class sentiment classification, sentiment rating prediction is a challenging task: not only does it judge the sentiment orientation, but it also classifies the reviews into more detailed categories [25,26].
Pang and Lee [27] applied a meta-algorithm based on a metric labeling formulation of the problem, which altered a given n-ary classifier's output in an explicit attempt to ensure that similar items were assigned similar labels. They showed that the meta-algorithm could provide significant improvements over both multi-class and regression versions of SVM when a novel similarity measure appropriate to the problem was employed.
Qu et al. [28] captured the sentiment polarity and intensity of N-grams by introducing a novel bag-of-opinions representation. In their method, each opinion is composed of a root word, a set of modifier words from the same sentence, and one or more negation words. For example, in the opinion "not very helpful", "helpful" is the root word, "very" is the modifier word, and "not" is the negation word. On this basis, they obtained the sentiment score of each opinion using a constrained ridge regression method over a large number of domain-independent reviews. The ratings of test reviews were determined using the sentiment scores of all the opinions in the review together with a domain-dependent unigram model. Validation on books, movies, and music data sets showed the effectiveness of the bag-of-opinions model compared with previous sentiment rating prediction methods.
Long et al. [29] proposed a novel review selection approach for accurate feature rating estimation. They used a bayesian network classifier to predict the sentiment star for each topic in the reviews. In order to achieve better results, their approach selected only those reviews that were related to the topics by using the Kolmogorov complexity (KC) information measure. The rating estimation of the feature for these selected reviews using machine learning techniques provided more accurate results than that for other reviews. The average of these estimated feature ratings also better represented an accurate overall rating for the feature of the service, which provided feedback that helped other users to choose their satisfactory service.
Snyder and Barzilay [30] formulated the sentiment rating prediction task as a multiple aspect ranking problem, where the goal was to produce a set of numerical scores, one for each aspect. They presented an algorithm that jointly learned ranking models for individual aspects by modeling the dependencies between assigned ranks. This algorithm guided the prediction of individual rankers by analyzing meta-relations between opinions, such as agreement and contrast. They proved that an agreement-based joint model was more expressive than individual ranking models.
Wang et al. [31] proposed a new opinionated text data analysis task called latent aspect rating analysis (LARA). To analyze the sentiment ratings of topical aspects, LARA starts with reviews that carry sentiment ratings, particular aspects in the reviews, and each reviewer's ratings for a given aspect. To achieve this deeper and more detailed understanding of a review, they proposed a two-stage approach based on a novel latent rating regression model. First, a bootstrapping method was adopted to select the main aspects and segments of reviews. In the second stage, a latent rating regression (LRR) model was trained to predict aspect ratings. An important assumption was that the overall rating is generated from a weighted combination of the latent ratings over all the aspects. Their evaluation on a hotel data set showed the effectiveness of the latent rating regression model and of the aspect rating generation assumption.

Application of the topic model to sentiment classification
The LDA model can detect topics that are implicit in texts and has achieved great success in the text mining field [32,33]. In LDA, the generative process is defined as follows: (i) for each word position in a document, draw a topic from the document's topic distribution; (ii) draw a word from the word distribution of the chosen topic; and (iii) repeat this process until every word position in the document has been generated. Applying a topic model to sentiment analysis can improve performance by mining the topics implicit in the texts and the sentiment preferences for those topics.
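The generative process above can be sketched directly. The topic and word distributions below are hand-made toy values, not distributions learned by LDA:

```python
import random

# Sketch of the LDA generative story: for each word position, draw a
# topic from the document's topic distribution, then draw a word from
# that topic's word distribution. Distributions are illustrative toys.

def generate_document(doc_topic_dist, topic_word_dists, length, seed=0):
    rng = random.Random(seed)
    topics = list(doc_topic_dist)
    words = []
    for _ in range(length):
        # (i) draw a topic for this word position
        t = rng.choices(topics, weights=list(doc_topic_dist.values()))[0]
        # (ii) draw a word from that topic's word distribution
        vocab = list(topic_word_dists[t])
        words.append(rng.choices(vocab,
                                 weights=list(topic_word_dists[t].values()))[0])
    return words
```

For example, a review document leaning 70/30 toward a "plot" topic over an "acting" topic would mostly emit plot-related words.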
Mei et al. [34] proposed a topic-sentiment mixture model (TSM), a novel probabilistic model that captures the mixture of topics and sentiments simultaneously. TSM first classifies words into two categories, one irrelevant to the topics and the other related to them. The second category is then divided into positive, negative, and neutral sub-categories, and the probability distribution of the words in each class is estimated with an EM algorithm. Finally, particular topic life cycles and the relationships between topics and sentiment are extracted.
Li et al. [35] assumed that the topics in texts were relevant to the sentiment and proposed a joint sentiment-topic model (the Sentiment-LDA model). Building on this, they considered that sentiment is not independent of the local context and proposed the Dependency-Sentiment-LDA model, in which the sentiments of the words in a text form a Markov chain and the sentiment of a word depends on that of the previous word.
Lin et al. [36] proposed a novel probabilistic framework called the joint sentiment-topic (JST) model, together with a re-parameterized version called the Reverse-JST model. Both methods are weakly supervised and can therefore easily be adapted to other domains. JST adds a sentiment layer between the document layer and the topic layer, forming a four-layer Bayesian model. Reverse-JST is also a four-layer Bayesian model, but its sentiment generation process is independent of the topics.

Three-layer sentiment propagation model
To determine the fuzzy sentiment membership of documents, the three-layer sentiment propagation model (TLSPM) makes full use of the relationships among documents, topics, and words by using pre-existing tools such as cosine distance, LDA, and the Fisher feature selection method. An important advantage of TLSPM is that it constructs a sentiment propagation network, and its matrix representation, among documents, topics, and words. In the sentiment propagation process, the formation of the sentiment of a document, a topic, or a word is regarded as a propagation process over its carriers, and the sentiment scores of documents, topics, and words are determined by their sentiment neighbors in the network. The matrix representation of the sentiment network and the three kinds of propagation processes in the network (toward documents, toward topics, and toward words) are unified in the sentiment propagation algorithm, where the sentiments of documents, topics, and words are viewed as a steady state of the network after propagation. After consecutive iterations of the sentiment score sets of documents, topics, and words through sentiment-neighbor propagation, we obtain the fuzzy membership set of documents and the fuzzy training document set.

Symbols and notions
C2 = {1, 2, 3, 4, 5}: the sentiment rating prediction label set, where 1 represents a strong negative tendency, 2 a negative tendency, 3 a neutral tendency, 4 a positive tendency, and 5 a strong positive tendency. Each w_j in the word set W has a sentiment label.
Sentiment score: the sentiment score measures the sentiment tendency and intensity of a document (d_i), topic (t_h), or word (w_j). These scores are denoted by score(d_i), score(t_h), and score(w_j), respectively.
Fuzzy sentiment membership of a document: The absolute value |score(d i )| of the sentiment score score(d i ) is defined as the fuzzy sentiment membership of a document (d i ).
Fuzzy training document set: the fuzzy training document set is defined as D_train^F = {(d_1, y_1, s_1), (d_2, y_2, s_2), ..., (d_n, y_n, s_n)}, where d_i is a document, y_i is the sentiment label of d_i, s_i is the fuzzy sentiment membership of d_i, and (d_i, y_i, s_i) is a fuzzy training sample. S = {s_1, s_2, ..., s_n} is the fuzzy membership set.
Three-layer sentiment network: this network is a weighted directed graph that describes the relationships among documents, topics, and words. A document (d_i) is composed of its most relevant topics, and a topic (t_h) is composed of its most relevant words (w_j). In the graph, if a relation is symmetric, the corresponding bidirectional edges are drawn as undirected lines. The weight of an edge expresses the relation intensity between the two nodes it links. Note that sentiment information propagates along the directions of the edges. A sketch of the structure of this network is shown in Fig 1.
Neighbors in the sentiment network: the neighbors of a node (document, topic, or word) in the sentiment network are the other nodes with edges linking toward it. The larger the relation intensity between two nodes, the higher the probability that they become sentiment neighbors. For example, in the Book domain, the words "good" and "excellent" are linked with a large relation intensity, and therefore they are sentiment neighbors in the network.
Any directed graph can be equivalently represented by its adjacency matrix. The graph G can be divided into nine subgraphs, whose adjacency matrices are P, Q, M, N, U, V, G, H, and Z, respectively.

Matrix representation of the sentiment network
Their definitions are given by constructing the adjacency matrices as follows.
P: the adjacency matrix between documents. The weight of the edge between documents d_i and d_j is defined by the cosine similarity P_ij = (d_i · d_j) / (||d_i|| ||d_j||), where d_i and d_j also denote the vectors of documents d_i and d_j, respectively.
Q: the adjacency matrix between words. The weight of the edge between words w_i and w_j is defined by the bi-normal separation (BNS) measure Q_ij = |F^{-1}(p(w_i|w_j)) − F^{-1}(p(w_i|w̄_j))|, where F^{-1} is the inverse cumulative distribution function of the standard normal distribution, p(w_i|w_j) is the probability of w_i appearing when w_j appears in the same window, and p(w_i|w̄_j) is the probability of w_i appearing when w_j does not appear. The BNS method was proposed by Forman [37]. We set the window size to 10.
M: the adjacency matrix from words to documents. The weight of the edge from word w_j to document d_i is defined as M_ij = tf_{w_j} × idf_{w_j}, where tf_{w_j} is the frequency of w_j in d_i, idf_{w_j} = 1 + log(N/n_w) is the inverse document frequency of w_j, N is the total number of documents, and n_w is the number of documents that contain the word w_j. M_ij measures the contribution degree of word w_j to document d_i.
N: the adjacency matrix from documents to words. The weight of the edge from document d_j to word w_i is defined analogously as N_ij = tf_{d_j} × idf_{d_j}, where tf_{d_j} is the word frequency associated with d_j, idf_{d_j} = 1 + log(N/n_d) is the inverse frequency of d_j, N here is the total number of words, and n_d is the number of words that occur in document d_j; d ∈ w_i denotes that document d contains the word w_i. N_ij measures the contribution degree of document d_j to word w_i.
U: the adjacency matrix between topics. The weight of the edge between topics t_i and t_j is defined by the cosine similarity U_ij = (t_i · t_j) / (||t_i|| ||t_j||), where t_i and t_j also denote the vectors of topics t_i and t_j, respectively.
V: the adjacency matrix from topics to documents. The weight of the edge from topic t_j to document d_i is defined as V_ij = p(t_j|d_i), the weight of topic t_j in document d_i in the LDA results; it measures the contribution degree of topic t_j to document d_i.
G: the adjacency matrix from documents to topics. The weight of the edge from document d_j to topic t_i is defined as G_ij = p(d_j|t_i), the weight of document d_j for topic t_i in the LDA results; it measures the contribution degree of document d_j to topic t_i.
H: the adjacency matrix from words to topics. The weight of the edge from word w_j to topic t_i is defined as H_ij = p(w_j|t_i), the weight of w_j in t_i in the LDA results; it measures the contribution degree of word w_j to topic t_i.
Z: the adjacency matrix from topics to words. The weight of the edge from topic t_j to word w_i is defined as Z_ij = p(t_j|w_i), the weight of topic t_j for word w_i in the LDA results; it measures the contribution degree of topic t_j to word w_i.
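The word-to-document matrix M above can be sketched directly from its tf-idf definition. This is an illustrative helper (the function name and token-list input format are assumptions, not the paper's code); k-neighbor pruning and row normalization happen in a later step:

```python
import math

# Sketch of building the word-to-document adjacency matrix M:
#   M[i][j] = tf(w_j, d_i) * idf(w_j),  idf(w_j) = 1 + log(N / n_w),
# where N is the number of documents and n_w the number of documents
# containing w_j. Documents are plain token lists.

def build_M(docs, vocab):
    N = len(docs)
    idf = {}
    for w in vocab:
        n_w = sum(1 for d in docs if w in d)
        # Guard against vocabulary words absent from every document.
        idf[w] = 1 + math.log(N / n_w) if n_w else 0.0
    M = []
    for d in docs:
        M.append([d.count(w) * idf[w] for w in vocab])
    return M
```

Words concentrated in few documents get a larger idf and hence a stronger edge to the documents that contain them.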

Sentiment propagation process
Sentiment propagation in this paper refers neither to sentiment propagation among individuals in a social system nor to changes in the sentiment of individuals who already hold certain sentiments. Here, it means only the acquisition of a more precise sentiment depiction of review documents, through information propagation, using some known exact sentiment information about documents and words, the middle-layer nodes (i.e., the topics hidden in documents), and the relationships among them on the sentiment network.
It can be assumed that a review document is generated as follows: the author first selects one or more topics of interest and then selects some favorite words to describe each topic according to his or her sentiments. Therefore, from the perspective of generating review documents, the sentiment of a document determines the sentiments hidden in the document, and the sentiment of a topic determines the sentiments of the words related to that topic. Conversely, from the perspective of the composition of a document, the sentiment of a topic is determined by the words aggregated to it, and the sentiments of topics and words determine the sentiment of a document. We therefore regard sentiment formation as a propagation process over its carriers (documents, topics, and words) in the sentiment network, and view the sentiments of documents, topics, and words as a steady state of the network after propagation. Here, we assume that score(d_i), score(t_h), and score(w_j) are influenced by their sentiment neighbors. A flowchart of TLSPM can be seen in Fig 2.
We construct the sentiment score vectors score(D), score(T), and score(W), and design three kinds of sentiment propagation processes in the sentiment network. In the following propagation formulas, α, β, and γ are the document weight, topic weight, and word weight, respectively, constrained by α + β + γ = 1.
According to the label propagation algorithm (LPA) proposed by Liu and Murata [38], score(d_i), score(t_h), and score(w_j) in the propagation graph G are determined by their sentiment neighbors in the sentiment network; a node's neighbors are the other nodes that link toward it.
At the initialization stage of the adjacency matrices, we prune the propagation graph G and keep k neighbors for each node: we keep the k largest values of each row in P, Q, M, N, U, V, G, H, and Z, set the others to 0, and normalize each row. In the sentiment propagation process, we therefore use only the nearest k neighbors to determine score(d_i), score(t_h), or score(w_j). At each propagation step, we update the sentiment score of each node using its adjacent nodes; the greater the similarity to an adjacent node, the stronger its influence weight. For P, Q, and U, we first set P_ii = 0, Q_ii = 0, and U_ii = 0; if any row becomes all zeros, we assign 1/k to each element of that row. We then keep the k largest values of each row, set the others to 0, and finally normalize each row of P, Q, and U.
For M, N, V, G, H, and Z, if any row is all zeros, we assign 1/k to each element of that row. We then keep the k largest values of each row, set the others to 0, and normalize each row.
Remark 3 (initialization and normalization of sentiment scores). For the two-class sentiment classification setting, the initial sentiment score of positive reviews is 1 and that of negative reviews is -1. For the sentiment rating prediction setting, the initial sentiment scores of 5-star, 4-star, 3-star, 2-star, and 1-star reviews are 1, 0.5, 0, -0.5, and -1, respectively.
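The pruning and normalization step can be sketched as follows. One simplification is labeled in the comments: for an all-zero row the paper assigns 1/k to every element, which after row normalization amounts to a uniform row, so the sketch writes the uniform row directly:

```python
# Sketch of per-row pruning and normalization: keep the k largest
# entries of each row, zero the rest, and normalize the row to sum to 1
# so the matrix is row-stochastic. All-zero rows become uniform rows
# (the normalized equivalent of the paper's 1/k assignment).

def prune_and_normalize(matrix, k):
    out = []
    for row in matrix:
        if not any(row):
            out.append([1.0 / len(row)] * len(row))
            continue
        # Ties at the threshold may keep slightly more than k entries.
        threshold = sorted(row, reverse=True)[k - 1] if k <= len(row) else min(row)
        kept = [v if v >= threshold else 0.0 for v in row]
        total = sum(kept)
        out.append([v / total for v in kept])
    return out
```

Row-stochastic matrices are what make the later propagation iteration a weighted averaging step, which is central to its convergence.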
We normalize score(D) in accordance with Eq (17), normalize score(W) in accordance with Eq (18), which constrains the sum of the scores over the positive word set W_pos, and initialize score(T) with score(t_h) = 0 for 1 ≤ h ≤ l.
We then normalize score(D) in accordance with Eq (19), score(T) in accordance with Eq (20), and score(W) in accordance with Eq (21). To obtain the steady sentiment scores of documents, we repeat the ToD, ToT, and ToW processes until the variation of every score(d_i) is less than 0.00001.
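The iterative loop can be sketched in a few lines. Eqs (14)-(16) are not reproduced in this excerpt; the update below, a convex combination (α + β + γ = 1) of the document, topic, and word neighbor scores, is an assumption consistent with the description, and only the ToD step is shown for brevity:

```python
# Illustrative sketch of the ToD propagation loop. The update
#   score(D) <- alpha*P*score(D) + beta*V*score(T) + gamma*M*score(W)
# is an assumed form of Eq (14); P, V, M must be row-stochastic.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def propagate_tod(P, V, M, score_t, score_w, score_d,
                  alpha=0.4, beta=0.3, gamma=0.3, tol=1e-5, max_iter=1000):
    for _ in range(max_iter):
        pd, vd, md = matvec(P, score_d), matvec(V, score_t), matvec(M, score_w)
        new = [alpha * a + beta * b + gamma * c
               for a, b, c in zip(pd, vd, md)]
        # Stop when every document score has stabilized (variation < tol).
        if max(abs(n - o) for n, o in zip(new, score_d)) < tol:
            return new
        score_d = new
    return score_d
```

Because alpha < 1 and the matrices are row-stochastic, each pass is a contraction on score(D), so the loop reaches the 1e-5 stopping threshold after a modest number of iterations.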
We renormalize score(D) using Eq (22) to obtain the final sentiment scores of the documents.
The absolute value |score(d i )| of the sentiment score score(d i ) is defined as the fuzzy sentiment membership of a document (d i ). An example of initialization and normalization of sentiment score can be seen in

Sentiment propagation algorithm and discussions
Sentiment propagation algorithm. We use the sentiment propagation algorithm to obtain the sentiment scores of documents, topics, and words in TLSPM. Specifically, to implement sentiment propagation, we construct the sentiment network and its matrix representation among documents, topics, and words. The overall propagation process in the sentiment network can be divided into the toward-document, toward-topic, and toward-word processes, which we design using the sentiment propagation Formulas (14)-(16) and the sentiment score normalization Formulas (17)-(22). As discussed above, documents with strong positive or negative sentiment intensity make large contributions to sentiment classification, while samples with weak sentiment intensity are less important, and the sentiment intensity reflects the fuzzy membership to the sentiment labels. We therefore take the absolute value of the sentiment score as the fuzzy sentiment membership, and obtain the fuzzy membership set and the fuzzy training document set.
The complete algorithm is described in Algorithm 1.

Algorithm 1: Sentiment propagation algorithm
1  Initialize score(D) using Eqs (17) and (18); initialize score(T) so that score(t_h) = 0;
2  repeat
3      for 1 ≤ i ≤ n do calculate score(d_i) using Eq (14);
4      Normalize score(D) using Eq (19);
5      for 1 ≤ h ≤ l do calculate score(t_h) using Eq (15);
6      Normalize score(T) using Eq (20);
7      for 1 ≤ j ≤ m do calculate score(w_j) using Eq (16);
8      Normalize score(W) using Eq (21);
9  until score(D) converges;

Complexity analysis and convergence. In each iteration of TLSPM, updating score(D) in the ToD process costs O(n(n^2 + l^2 + m^2)), updating score(T) in the ToT process costs O(l(n^2 + l^2 + m^2)), and updating score(W) in the ToW process costs O(m(n^2 + l^2 + m^2)). The total complexity of each iteration is therefore O((n + l + m)(n^2 + l^2 + m^2)), where n is the number of training documents, l is the number of extracted topics, and m is the number of words. We now discuss the convergence of the algorithm. To guarantee convergence of the sentiment propagation, if any row of the nine relation matrices P̃, Q̃, M̃, Ñ, Ũ, Ṽ, G̃, H̃, and Z̃ is 0, we assign 1/k to each element of that row. In the sentiment network G, for any given d_i, t_h, or w_j, there must exist a path connecting it to other documents, topics, or words; therefore, the sum of each row of P̃, Q̃, M̃, Ñ, Ũ, Ṽ, G̃, H̃, and Z̃ is nonzero. This indicates that the sentiment network G is strongly connected and the corresponding matrix G is irreducible. According to [39] and [40], score(D) must converge to a stable value.
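The propagation loop above can be sketched in a few lines. This is a minimal sketch under simplifying assumptions: the nine relation matrices are collapsed into three row-normalized matrices (P for document-document, M for document-topic, U for document-word; the names and the single-matrix ToT/ToW updates are illustrative, not the paper's exact Eqs (14)-(16)), and `normalize` stands in for the normalization formulas.

```python
import numpy as np

def normalize(v):
    """Scale scores into [-1, 1]; a stand-in for the normalization Eqs (17)-(22)."""
    m = np.abs(v).max()
    return v / m if m > 0 else v

def propagate(score_d, score_t, score_w, P, M, U, alpha, beta, gamma,
              max_iter=100, tol=1e-6):
    """Run ToD, ToT, and ToW updates in turn until score(D) is stable."""
    for _ in range(max_iter):
        prev = score_d.copy()
        # ToD: each document absorbs sentiment from neighbor documents, topics, words
        score_d = normalize(alpha * (P @ score_d)
                            + beta * (M @ score_t)
                            + gamma * (U @ score_w))
        # ToT: topics inherit sentiment from the documents they occur in
        score_t = normalize(M.T @ score_d)
        # ToW: words inherit sentiment from the documents that contain them
        score_w = normalize(U.T @ score_d)
        if np.abs(score_d - prev).max() < tol:
            break
    return score_d, score_t, score_w
```

The fixed-point character of the iteration is what the irreducibility argument above guarantees: with strictly positive row sums, the normalized scores cannot oscillate indefinitely.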
Parameter selection. In this paper, we use the validation set to determine the parameter set θ = {k, ntopics, α, β, γ}, where k is the number of selected sentiment neighbors, ntopics is the number of topics, and α, β, and γ are the weights of the documents, topics, and words, respectively.

The loss function is defined as L(y, ŷ), where y is the true label of d and ŷ = f(d; θ) is the label predicted by the FSVM. The parameter optimization goal is to estimate θ̂ = argmin_θ (1/n) ∑_{i=1}^{n} L(y_i, f(d_i; θ)), where θ̂ is the optimal parameter set and n is the number of training documents.
To obtain the optimal parameters, we test the influence of each parameter on the validation set. When testing how one parameter affects the accuracy of the algorithm, we keep the other parameters fixed and vary only that parameter. After obtaining the parameter values with the best accuracy on the validation set, we report the results on the testing data using these optimal parameters. In the experimental results and analysis section, we examine the performance with a varying number of neighbors, a varying number of selected topics, varying document, topic, and word weights, and the iteration times with a varying number of neighbors, all on the validation set.
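The one-parameter-at-a-time tuning described above can also be expressed as a grid search over the constrained weight simplex. The sketch below assumes a hypothetical `evaluate` callback that trains and scores TLSPM+FSVM on the validation set for one parameter setting; it enforces γ = 1 − α − β and α + β < 1.

```python
from itertools import product

def select_parameters(evaluate, ks, topic_counts, weight_step=0.1):
    """Grid search over theta = {k, ntopics, alpha, beta, gamma}.

    `evaluate` is an assumed callback returning validation accuracy for one
    setting; gamma is derived from alpha and beta, so only two weights vary.
    """
    best, best_acc = None, -1.0
    weights = [round(i * weight_step, 2) for i in range(1, int(1 / weight_step))]
    for k, ntopics, alpha, beta in product(ks, topic_counts, weights, weights):
        gamma = round(1.0 - alpha - beta, 2)
        if gamma <= 0:
            continue  # enforce alpha + beta < 1
        acc = evaluate(k=k, ntopics=ntopics, alpha=alpha, beta=beta, gamma=gamma)
        if acc > best_acc:
            best = dict(k=k, ntopics=ntopics, alpha=alpha, beta=beta, gamma=gamma)
            best_acc = acc
    return best, best_acc
```

Fixing all but one parameter, as the paper does, corresponds to restricting the grid to a single axis, which is far cheaper than the full product.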

Sentiment classification
In this section, after describing the manner in which the sentiment scores and fuzzy membership of all training documents are obtained by the sentiment network and the sentiment propagation algorithm (SPA), we introduce their usage in sentiment classification by the FSVM.
In order to obtain a more accurate sentiment orientation for reviews in the testing set, we set s_i = |score(d_i)| and obtain the fuzzy training set {(d_1, y_1, s_1), (d_2, y_2, s_2), ..., (d_n, y_n, s_n)}, which we use to train an FSVM f. A large sentiment intensity of d_i indicates a strong sentiment expression and a high fuzzy sentiment membership degree; therefore, d_i makes a large contribution to sentiment classification.
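Constructing the fuzzy training set is a direct mapping from propagated scores to memberships; a minimal sketch:

```python
def build_fuzzy_training_set(documents, labels, scores):
    """Attach fuzzy membership s_i = |score(d_i)| to each training document.

    `documents`, `labels`, and `scores` are parallel sequences; the result is
    the fuzzy training set [(d_i, y_i, s_i), ...] used to fit the FSVM.
    """
    return [(d, y, abs(s)) for d, y, s in zip(documents, labels, scores)]
```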
The prioritization scheme of the FSVM can be formalized as minimizing (1/2)‖w‖^2 + C ∑_{i=1}^{n} s_i ξ_i, subject to y_i(w · φ(d_i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for i = 1, ..., n. According to Eqs (25)-(27), we obtain the optimal solution α* = (α*_1, α*_2, ..., α*_n)^T. If 0 < α*_i < s_i C, the corresponding support vector lies on the margin of the hyperplane; if α*_i = s_i C, the support vector is a misclassified sample.
An important difference between the SVM and FSVM models is that points with the same value of α*_i may represent different types of support vectors because of the value of s_i: a small value of s_i makes the sample d_i less important in training, and a large value makes it more important to the classification [41].
In this study, we used the Gaussian kernel function K(d_i, d_j) = exp(−‖d_i − d_j‖^2 / (2σ^2)) as the kernel for constructing the FSVM classifier. The corresponding optimal solution is α* = (α*_1, α*_2, ..., α*_n)^T, and the fuzzy optimal classification function is f(d) = sgn(∑_{i=1}^{n} α*_i y_i K(d_i, d) + b*). For the two-class sentiment classification task, we use reviews with positive tendency as positive samples and reviews with negative tendency as negative samples to train the FSVM. For the sentiment rating prediction task, the only difference is that we use one of the classes as the positive category and the remaining classes as the negative category (one versus the rest). For example, in a K-class classification task, we use one class as the positive category and the remaining K−1 classes as the negative category, and obtain K classifiers {f_1, f_2, ..., f_K}. Each test sample d_i then has K results, where f_k(d_i) is the result of the kth classifier and ε(f_k(d_i)) is the confidence of the kth classifier. Finally, we select the label corresponding to max{ε(f_1(d_i)), ..., ε(f_K(d_i))} as the predicted rating.
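The one-versus-the-rest label selection can be sketched as follows. The sketch assumes each classifier exposes a signed decision value whose magnitude serves as the confidence ε(f_k(d)); `classifiers` is a list of such hypothetical decision functions.

```python
def predict_rating(classifiers, d):
    """One-versus-rest rating prediction.

    Each f_k returns a decision value for d; we treat it as the confidence
    epsilon(f_k(d)) and return the 1-based label of the most confident
    classifier, mirroring the max{epsilon(...)} selection rule.
    """
    confidences = [f(d) for f in classifiers]
    best_k = max(range(len(confidences)), key=lambda k: confidences[k])
    return best_k + 1
```

In practice, the per-sample upper bound s_i C of the FSVM dual can be obtained from a standard SVM solver that supports per-sample cost weighting (for example, sample weights that rescale C), rather than a bespoke optimizer.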

Experimental design Data sets
We conducted experiments using eight two-class sentiment classification data sets and seven sentiment rating prediction data sets. Books (2), DVD (2), Electronics (2), and Kitchen (2) are review sets from Blitzer et al. [42]; each contains 2000 reviews, of which 1000 are positive and 1000 are negative. Notebook (2), Hotel (2), and E-commerce (2) are Chinese review sets from Tan et al. [43]; each has 4000 reviews, of which 2000 are positive and 2000 are negative. Movie (2) is a review set from Pang et al. [27] that contains 50000 reviews, of which 25000 are positive and 25000 are negative. The details of the eight two-class sentiment classification data sets are shown in Table 1; all eight are balanced. Books (4), DVD (4), Electronics (4), and Kitchen (4) are review sets from Blitzer et al. [42]; Hotel (5) and MP3 (5) are review sets from Wang et al. [44]; and Movie (5) is a review set from Pang et al. [27]. Table 2 shows the sentiment rating distributions of the seven sentiment rating prediction data sets. As can be seen in Table 2, all the sentiment rating prediction data sets except MP3 (5) are approximately balanced. The rating distribution of the MP3 (5) data set is unbalanced: the 5-star rating has the most reviews, while the 2-star and 3-star ratings have the fewest.

Text representation and processing
For the 12 English data sets, we use the Fisher feature selection method [45] to choose the top-800 effective features after removing stop words. The top 15 features from the Books (2) data set are "poor, fan, lack, repeat, evidence, negative, disappointment, completely, democracy, classic, level, rich, strange, great, and intelligence". If a word appears in a text, its weight is 1; otherwise, its weight is 0. Each review is represented as a bag of words and then expressed as a vector space model. For the three Chinese data sets (Notebook (2), Hotel (2), and E-commerce (2)), we first use the MMSEG segmentation algorithm (http://technology.chtsai.org/mmseg) for word segmentation; MMSEG is a word identification system for Mandarin Chinese text based on two variants of the maximum matching algorithm. We then remove stop words taken from a Chinese stop-word vocabulary, and select the top-1000 features with the Fisher feature selection method [45] for each Chinese data set. For example, the top 15 features from the Notebook (2) data set are: 外观 (appearance), 不错 (not bad), 配置 (configuration), 漂亮 (beautiful), 很好 (good), 性能 (performance), 麻烦 (trouble), 系統 (system), 满意 (satisfy), 便宜 (cheap), 舒服 (comfortable), 价位 (price), 轻便 (light), 做工 (workmanship), and 喜欢 (love). Finally, we express each text as a vector space model in which the feature weight is a Boolean value.
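The Boolean weighting scheme above can be sketched as a one-liner over the selected feature vocabulary (names are illustrative):

```python
def boolean_vector(tokens, vocabulary):
    """Boolean vector space model: weight 1 if the feature appears in the
    tokenized review, 0 otherwise. `vocabulary` is the ordered list of
    features selected by Fisher feature selection."""
    present = set(tokens)
    return [1 if w in present else 0 for w in vocabulary]
```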
We use the JGibbLDA model (http://jgibblda.sourceforge.net) to extract topics from the documents, with -alpha set to 50/ntopics, -beta to 0.1, -twords to 50, -savestep to 200, and -niters to 1000.

Evaluation metrics
In this study, the evaluation metrics [46] for two-class sentiment classification are shown in Table 3.
For sentiment rating prediction, we used accuracy and mean square error (MSE) as the evaluation metrics, calculated as accuracy = n(right answer)/N and MSE = (1/N) ∑_{i=1}^{N} (answer_i − result_i)^2, where n(right answer) is the number of samples whose output ratings agree with the original ratings, N is the total number of test samples, answer_i is the original sentiment rating of d_i, and result_i is the output rating. Following most experimental setups in the literature, we extract 60% of each data set as the training set, 20% as the validation set, and the remaining 20% as the testing set by stratified sampling [22,47]. We train the FSVM with TLSPM on the training set, tune the parameters on the validation set, and evaluate the effectiveness on the testing set.
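The two metrics are straightforward to compute; a minimal sketch:

```python
def accuracy(answers, results):
    """Fraction of test reviews whose output rating equals the original rating."""
    correct = sum(1 for a, r in zip(answers, results) if a == r)
    return correct / len(answers)

def mse(answers, results):
    """Mean square error between original and output ratings."""
    return sum((a - r) ** 2 for a, r in zip(answers, results)) / len(answers)
```

For rating prediction, MSE is the more informative of the two: predicting 4 stars for a 5-star review is penalized far less than predicting 1 star, which plain accuracy cannot distinguish.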

Baselines
To verify the validity of TLSPM, we designed the following comparison tests. The fuzzy memberships generated by the Lexicon, Centroid, S-type, Compact, and TLSPM methods are used as the input of the FSVM, and the classification results of the FSVM verify the effectiveness of these five fuzzy membership determination methods. • SVM: uses LIBSVM (http://www.csie.ntu.edu.tw/~cjlin/libsvm) with a linear kernel and default parameters [48].
• Lexicon: SentiWordNet 3.0 (http://sentiwordnet.isti.cnr.it) is a lexical resource publicly available for research purposes and an improved version of SentiWordNet 1.0 [49]. We used the positivity, negativity, and neutrality scores annotated by SentiWordNet 3.0, and took the sum of the positive sentiment scores minus the sum of the negative sentiment scores of all terms in a review as its final score. The fuzzy membership is then determined by the absolute value of this sentiment score.
• Centroid: a fuzzy membership determination mechanism based on the distance to the class centroid [8]. The centroid of the training set S is defined as d̄ = (1/n) ∑_{i=1}^{n} d_i, and the distance between sample d_i and the class centroid d̄ is dis(d_i, d̄). The fuzzy membership of d_i is s_i = 1 − dis(d_i, d̄)/(r + δ), where r = max{dis(d_i, d̄)} is the radius of the class and δ is a small positive constant.
• S-type: Lin and Wang [41] first calculate the distance between d_i and the class centroid d̄, and then map this distance to a membership through an S-shaped function.
• Compact: Batuwita et al. [13] define the fuzzy membership in terms of μ(d_i), the membership of d_i with respect to d̄, and m_k(d_i, d̄), a fuzzy connectivity membership to the centroid. We define s_i using Eq (45), where μ(d_i) is calculated by Eq (41).
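The Centroid baseline can be sketched as follows, assuming Euclidean distance and the standard form s_i = 1 − dis(d_i, d̄)/(r + δ); δ is a small constant that keeps the farthest sample's membership positive.

```python
import math

def centroid_memberships(X, delta=1e-6):
    """Centroid-based fuzzy membership for a list of feature vectors X.

    Computes the class centroid, each sample's Euclidean distance to it,
    and s_i = 1 - dis(d_i, centroid) / (r + delta), where r is the class
    radius (the maximum distance).
    """
    n, dim = len(X), len(X[0])
    centroid = [sum(x[j] for x in X) / n for j in range(dim)]
    dists = [math.dist(x, centroid) for x in X]
    r = max(dists)
    return [1.0 - d / (r + delta) for d in dists]
```

By construction the sample closest to the centroid receives membership near 1, and the farthest receives a membership close to 0, which is exactly the spatial-location intuition TLSPM is compared against.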

Experimental results and analysis Comparing results and analysis
In order to validate the effectiveness of the proposed TLSPM, we designed experiments on 15 real-world sentiment data sets and compared TLSPM with SVM and four other fuzzy membership determination methods. The comparative experimental results on the testing data are shown in Tables 4-12 and Table 13.
From Tables 4-12 and Fig 5, we can see the following:
1. The accuracies of the six methods (SVM, Lexicon, Centroid, S-type, Compact, and TLSPM) on the seven sentiment rating prediction data sets are lower than on the eight two-class data sets.
2. Compared with using the SVM method directly, the five fuzzy membership determination methods improve the accuracy considerably; for example, the Lexicon, Centroid, S-type, Compact, and TLSPM methods improve by 0.044, 0.032, 0.029, 0.045, and 0.091, respectively, on the Books (2) data set.
3. The accuracy of the Compact method is higher than that of the Centroid and S-type methods on 14 sentiment data sets, but not on the Kitchen (2) data set. For example, the Compact method improves by 0.016 and 0.013 over the Centroid and S-type methods on the Electronics (2) data set.
4. TLSPM performs better than the Lexicon, Centroid, S-type, and Compact methods on all 15 data sets; for example, TLSPM improves by 0.036, 0.038, 0.029, and 0.015, respectively, on the Hotel (2) data set.
These results can be explained as follows.
1. As compared with two-class sentiment classification, sentiment rating prediction is a more challenging task, because it must not only judge the sentiment orientations of reviews, but also measure their intensity.
2. SVM has been successfully applied to sentiment classification, but it is sensitive to irrelevant and noisy training samples. The FSVM sentiment classification results that use the sentiment score as the fuzzy membership are very stable and robust.
3. Although TLSPM is more complex than the other methods, it achieves more accurate sentiment scores of documents. Clearly, the fuzzy membership of documents should be determined using the semantic relations among documents, topics, and words, rather than by the three universal spatial-location fuzzy membership determination methods.

Performance with varying number of neighbors
The second focus of this research was to study the performance with a varying number of neighbors on the validation set. With ntopics set to 50, α to 0.4, β to 0.3, and γ to 0.3, we test the accuracy as the value of k increases from 5 to 50. The experimental results are given in Fig 6. When k changes from 5 to 50, the accuracy first increases and then stabilizes on the (03) Electronics (2), (08) Movie (2), and (11) Electronics (4) data sets, while the accuracy on the remaining data sets first increases and then decreases. Meanwhile, the accuracy on the seven sentiment rating prediction data sets is lower than that on the eight two-class sentiment data sets. The sentiment score of a document, topic, or word is likely to be influenced by noise if k is too small; conversely, the sentiment score will be influenced by irrelevant neighbors if the number of selected neighbors is too large.

Iteration times performance with varying number of neighbors
To test the iteration times with varying neighbors as the value of k changes from 5 to 50, α is set to 0.4, β to 0.3, γ to 0.3, and ntopics to 50. The results are shown in Fig 7. The curves of (02) DVD (2), (04) Kitchen (2), (07) E-commerce (2), (13) Movie (5), and (15) MP3 (5) are very similar: the iteration times first increase and then decrease, differing only in the value of k at which the maximum is reached. The curves of the remaining data sets are also similar to one another and not very sensitive to the number of selected neighbors k. This indicates that TLSPM can converge quickly and that a large value of k frequently leads to fast convergence.

Performance with varying selected topics
In order to test the accuracy as ntopics changes from 10 to 100, k is set to 35, α to 0.4, β to 0.3, and γ to 0.3. The results for the 15 sentiment data sets are shown in Fig 8. The accuracy curves of the 15 data sets are very similar; the only difference is that the accuracy of the seven sentiment rating prediction sets is lower than that of the eight two-class sentiment data sets. The accuracy first increases and then decreases as the number of selected topics increases.

Performance with varying documents, topics, words weight
To further examine the performance with varying document, topic, and word weights, k is set to 35 and ntopics to 50, with the restrictions γ = 1 − α − β and α + β < 1. As shown in Fig 9, different data sets have different accuracy ternary contour distributions, and different weight values affect the final sentiment classification results. Specifically, we obtain the maximum accuracy when α, β, and γ take approximately equal values. This indicates that documents, topics, and words are all similarly important in determining the fuzzy membership of a document.

Conclusions and future work
In this paper, a new framework for determining the fuzzy sentiment membership of documents is adopted for the sentiment classification task. Our main findings and contributions include the following.

The sentiment score can describe the sentiment orientation and sentiment intensity in great detail; therefore, sentiment score determination methods are very useful for fine-grained sentiment analysis. Our experiments verified this finding: a better sentiment classification result is achieved by using the sentiment score as the fuzzy sentiment membership.

SVM has been successfully applied to sentiment classification, but it is sensitive to irrelevant and noisy training samples. Our experiments show that the FSVM model can alleviate this problem by assigning different fuzzy memberships to different samples, and that different fuzzy membership determination methods lead to different classification results.

As is known, sentiment expression is very domain-specific: the same word may have different sentiment orientations and intensities in different domains. Therefore, it is not appropriate to determine the sentiment score of reviews using a universal sentiment lexicon. The proposed three-layer sentiment propagation model determines the sentiment score of reviews using the semantic relationships among documents, topics, and words, and therefore performs better than the three universal spatial-location fuzzy membership determination methods (Centroid, S-type, and Compact).

For large-scale sentiment classification tasks, TLSPM still needs to address problems such as very high-dimensional feature spaces and long running times. To reduce storage space and running time, we plan to partition and combine the adjacency matrices between documents, topics, and words under the MapReduce framework. In future research, we will extend TLSPM to reduce the complexity of its matrix operations and validate our method on big data sets.