Public Mood and Consumption Choices: Evidence from Sales of Sony Cameras on Taobao

Previous research has tried to predict social and economic phenomena using indicators of public mood extracted from online data. This approach has proved feasible in many areas, such as financial markets, economic operations and even national suicide numbers. However, few studies have examined the relationship between public mood and consumption choices at the societal level. The present study focused on the “Diaoyu Island” event and extracted data on Chinese public mood toward Japan from Sina Microblog (the biggest social media platform in China). The public mood variable showed a significant cross-correlation with sales of Sony cameras on Taobao (the biggest Chinese e-business company). Several candidate predictors of sales were then examined, and three significant stepwise regression models were obtained. Model estimation showed that significance (F-statistic), R-square and predictive accuracy (MAPE) all improved with the inclusion of the public mood variable. These results indicate that public mood is significantly associated with consumption choices and may be of value in sales forecasting for particular products.


Introduction
Traditionally, behavioral economics has emphasized the effect of emotion on individual behavior and decision-making. When enough people share the same mood, it may have considerable influence on related events. For instance, public mood has been shown to influence presidential elections [1], economic indexes [2], fluctuations of financial markets [3-5] and even the number of national suicides [6]. Moreover, public mood information is also valuable in predicting related events and indexes [7-11]. However, few prior studies have examined the relationship between public mood and consumption choices at the societal level, let alone applied public mood information to make predictions.
In previous studies, public mood information was extracted from news titles, survey data [5], Google search data [2] and social media [1,3-6,10]. Among these sources, social media data have been widely applied in forecasting financial indexes [3,4], economic indexes [2] and suicide numbers [6]. Moreover, compared with offline data, social media data have been shown to predict financial indexes more accurately and with a greater lead time [5].
The territorial dispute between China and Japan is long-standing, and the conflict intensified in July 2012, when the Japanese government announced its ownership of Diaoyu Island. This led to a sharp fluctuation in Chinese public mood and subsequently changed Chinese consumers' choices regarding Japanese products.
With the rapid development of IT and social media in China, it has become feasible to extract public mood information from social media, which makes it possible to study the relationship between public mood and consumption choices. Because the Sony camera is a familiar Japanese product that is widely purchased in China, it was chosen to study the influence of Chinese public mood on the sales of Japanese products during the "Diaoyu Island" event.
The key aims of this study were to extract public mood data from social media and to explore its relationship with the sales of Japanese products. This relationship would inform forecasting models with public mood as an independent variable and sales as the dependent variable.

Sales data
Without official authorization, it would be impossible to collect complete daily or weekly sales data for Sony cameras in China. Because Taobao (www.taobao.com) is the biggest Chinese e-business company, with more than 370 million members and tens of millions of daily deals, we chose it as the source of sales data. On 2012/10/10, we collected the daily sales of Sony cameras (S_t) from 2012/8/1 to 2012/10/8 from shu.taobao.com. These data are available in S1 Dataset.

Social Media Data
Sina Microblog is one of the leading social media platforms in China. Its users include sports and movie stars, enterprise managers, media practitioners, government officials and people from nearly all other industries. Thus, Sina Microblog was chosen as the data source for public mood information.
As mentioned above, the Japanese government announced its ownership of Diaoyu Island in July 2012, which led to a fierce diplomatic conflict between China and Japan. In the present study, we defined daily original blogs (B_t) as the number of original blogs posted each day that simultaneously mentioned the Chinese words "Diaoyu Island", "territory", "sovereignty" and at least one of the following terms: "boycott of Japanese goods", "defending Diaoyu Islands", "defending sovereignty", "defending territory", "fighting for sovereignty", "fighting for territory", "protesting against the Japanese government", "disdaining Japan" and "disdaining Japanese government". B_t might reflect the negative public mood of Chinese people toward Japan and Japanese products. A similar method has been adopted in previous studies [5,6]. For example, one study used the Tweet volumes of financial search terms (TV-FST) as a negative mood variable for the financial market [5], and another used the daily frequency of documents mentioning particular words as a negative public mood variable to forecast national suicide numbers [6]. All the social media data used in this study can be collected from www.weibo.com; we collected them on 2012/10/10. These data are available in S1 Dataset.
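The counting rule for daily original blogs can be illustrated with a minimal Python sketch. It assumes that all three base words must appear together with at least one mood term, uses English stand-ins for the Chinese search terms, and relies on plain case-sensitive substring matching; the function names and the sample posts are hypothetical and are not taken from the study's data.

```python
from collections import Counter

# English stand-ins for the Chinese terms (assumption: the study searched in Chinese).
REQUIRED_TERMS = ("Diaoyu Island", "territory", "sovereignty")
MOOD_TERMS = (
    "boycott of Japanese goods",
    "defending Diaoyu Islands",
    "defending sovereignty",
    "defending territory",
    "fighting for sovereignty",
    "fighting for territory",
    "protesting against the Japanese government",
    "disdaining Japan",
    "disdaining Japanese government",
)


def matches(text):
    """True if the post has every required term and at least one mood term."""
    has_required = all(term in text for term in REQUIRED_TERMS)
    has_mood = any(term in text for term in MOOD_TERMS)
    return has_required and has_mood


def daily_counts(posts):
    """Count matching posts per day; `posts` holds (date, text) pairs."""
    counts = Counter()
    for date, text in posts:
        if matches(text):
            counts[date] += 1
    return counts


# Hypothetical sample posts, for illustration only.
sample_posts = [
    ("2012-08-15", "Diaoyu Island is our territory; defending sovereignty matters."),
    ("2012-08-15", "Diaoyu Island, territory, sovereignty."),
    ("2012-08-16", "boycott of Japanese goods"),
    ("2012-08-16", "Diaoyu Island territory sovereignty: boycott of Japanese goods now"),
]
blog_counts = daily_counts(sample_posts)
```

Only the first and last sample posts satisfy both conditions, so each day receives a count of one. Note that a mood term such as "defending sovereignty" also satisfies the base word "sovereignty" under substring matching.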

Ethics Statement
This study collected existing data that were publicly available on the Internet. No individual or personal details were identified. Therefore, ethics approval was deemed unnecessary.

Statistical analysis
Cross-correlation analysis is a basic method for forecasting one time series from another, and cross-correlation coefficients can be very helpful in building prediction models. However, the raw series are not always significantly cross-correlated. A better approach is to find suitable functions to transform the time series so that the transformed series are significantly cross-correlated. With the new cross-correlation, forecasting models can be built that also include the autocorrelation of the dependent variable. Likewise, the key to exploring correlational relationships in big data is finding appropriate functions to transform the variables so that they become significantly correlated.
Based on these principles, we first conducted a cross-correlation analysis between daily original blogs (B_t) and daily sales of Sony cameras (S_t) to explore their direct cross-correlation. We then searched for appropriate functions to transform the two time series to obtain a stronger cross-correlation. Next, a partial autocorrelation analysis of the daily sales of Sony cameras (S_t) was conducted to study the effect of earlier sales on later ones. Finally, three stepwise regression models were built using the variables identified by the two analyses (cross-correlation and partial autocorrelation) and were evaluated by Mean Absolute Percentage Error (MAPE). All statistical analyses, including variable selection and model construction, were performed using SPSS 19.0.
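As an illustration of the lagged cross-correlation described above, the following Python sketch computes the sample cross-correlation between x[t - lag] and y[t] using the common estimator that divides the lagged covariance by n and by the two standard deviations. The analysis in this study was run in SPSS, whose exact estimator may differ in detail; the function name and the toy series are assumptions for illustration only.

```python
import math


def cross_correlation(x, y, lag):
    """Sample cross-correlation between x[t - lag] and y[t].

    A positive lag means that x leads y by `lag` time steps.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("series must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    if lag >= 0:
        pairs = zip(dx[: n - lag], dy[lag:])   # pairs x[t] with y[t + lag]
    else:
        pairs = zip(dx[-lag:], dy[: n + lag])  # pairs x[t - lag] with y[t], lag < 0
    cov = sum(a * b for a, b in pairs) / n
    sd_x = math.sqrt(sum(a * a for a in dx) / n)
    sd_y = math.sqrt(sum(b * b for b in dy) / n)
    return cov / (sd_x * sd_y)


# Toy series: a single spike in x is followed by a spike in y two steps later.
x_toy = [0, 0, 1, 0, 0, 0, 0, 0]
y_toy = [0, 0, 0, 0, 1, 0, 0, 0]
best_toy_lag = max(range(5), key=lambda k: cross_correlation(x_toy, y_toy, k))
```

In the toy example the coefficient is largest at a lag of two steps, which is the lead built into the data.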

Trends of daily original blogs (B t ) and daily sales of Sony camera (S t ) with cross-correlation analysis
Over the 70-day period of this study, both daily original blogs (B_t) and daily sales of Sony cameras (S_t) fluctuated markedly (Fig 1). Compared with S_t, B_t varied more sharply, surging from 2 to nearly 30,000. The cross-correlation analysis between S_t and B_t is shown in Table 1.
From Fig 1 alone, no relationship between B_t and S_t was visually apparent, mainly because of the wide range of B_t. We conjectured that the two time series might nonetheless be cross-correlated, and the results (Table 1) supported this conjecture: at lags of 1, 5 and 6 days, B_t was significantly cross-correlated with S_t.

Exploring transformation functions for daily original blogs (Bt) and daily sales of Sony camera (St)
The first transformation chosen was the moving average, since it would dampen the drastic fluctuations in the two time series (B_t and S_t). We then tried many different transformation functions, including moving averages, logarithms and combinations of them with different parameters. We ultimately settled on the following combination of transformation functions, as it was the best of all the groups we tried, with a higher significance level and better [...]
Table 1. Results of cross-correlation analysis between daily original blogs (B_t) and daily sales of Sony camera (S_t).
The trends of the transformed time series (X_t and Y_t) are shown in Fig 2. The curves suggest a negative correlation between X_t and Y_t. According to the curves, the 70-day period can be further divided into six sub-periods: 2012/8/1 to 2012/8/13, 2012/8/14 to 2012/8/25, 2012/8/26 to 2012/9/10, 2012/9/11 to 2012/9/24, 2012/9/25 to 2012/9/30 and 2012/10/1 to 2012/10/8. Within each sub-period, the negative cross-correlation between X_t and Y_t is more apparent.
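The two families of transformation mentioned above, moving averages and logarithms, can be sketched in Python as follows. This is only an illustration of the general idea: the window length, the trailing (rather than centered) average and the order of the two operations are assumptions, and the sketch is not the specific combination selected in this study.

```python
import math


def moving_average(series, window):
    """Trailing moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def log_moving_average(series, window):
    """Natural log of the trailing moving average (all averages must be positive)."""
    return [math.log(v) for v in moving_average(series, window)]


# Hypothetical counts spanning several orders of magnitude, as B_t does.
raw_counts = [2, 40, 900, 30000, 12000, 500]
smoothed = log_moving_average(raw_counts, window=3)
```

Smoothing reduces day-to-day spikes, and the logarithm compresses a range such as 2 to 30,000 into a scale on which the series can be compared more easily with sales.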

Cross-correlation test of public mood variable (Xt) and Camera sales variable (Yt)
In Fig 3, there were ten significant cross-correlation coefficients (p<0.05), namely those between Y_t and each of X_{t-6}, X_{t-5}, X_{t-4}, X_{t-3}, X_{t-2}, X_{t-1}, X_t, X_{t+1}, X_{t+2} and X_{t+3}. Among these coefficients, the latter four were of no use in predicting camera sales from the public mood variable, either because the forecasting direction was reversed (using the camera sales variable to predict the public mood variable) or because the lead was 0 (X_t and Y_t). Including all six preceding public mood variables (X_{t-6}, X_{t-5}, X_{t-4}, X_{t-3}, X_{t-2} and X_{t-1}) in one regression model might cause high multicollinearity.
Thus, to determine which public mood variables should be included in the final prediction models, we regressed the camera sales variable (Y_t) on each public mood variable (X_{t-6}, X_{t-5}, X_{t-4}, X_{t-3}, X_{t-2} and X_{t-1}) separately. The results are shown in Table 2.
From these statistics, X_{t-3} had the largest t-value in magnitude, the lowest p-value and the highest R-square, so the public mood variable X_{t-3} was included in the final prediction models. A stepwise regression of the camera sales variable (Y_t) on all six public mood variables (X_{t-6}, X_{t-5}, X_{t-4}, X_{t-3}, X_{t-2} and X_{t-1}) also supported this choice:
Y_t = 259.720 - 12.027 X_{t-3}
(F = 30.069, p < 0.001; t = -5.484, p < 0.001)
In this stepwise regression, all the other public mood variables (X_{t-6}, X_{t-5}, X_{t-4}, X_{t-2} and X_{t-1}) were removed, leaving only X_{t-3}.
In practice, when cross-correlation is used to predict one time series from another, valuable information in the forecasted variable itself may be ignored. Therefore, the autocorrelation of the camera sales variable (Y_t) is examined in the next section.
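The lag-selection step can be illustrated with a short Python sketch that regresses the sales series on each lagged mood series separately and keeps the lag with the highest R-square. The study ran this step in SPSS; the function names and the synthetic series below are assumptions for illustration, and the comparison ignores the slightly different sample sizes that different lags produce.

```python
def simple_ols(x, y):
    """Fit y = a + b * x by least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, (sxy * sxy) / (sxx * syy)


def best_lag(x, y, lags):
    """Regress y[t] on x[t - lag] for each positive lag; pick the highest R-square."""
    fits = {lag: simple_ols(x[: len(x) - lag], y[lag:]) for lag in lags}
    chosen = max(fits, key=lambda lag: fits[lag][2])
    return chosen, fits


# Synthetic series (not the study's data): y[t] = 100 - 2 * x[t - 3] for t >= 3.
x_syn = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10, 12, 11]
y_syn = [90, 95, 92] + [100 - 2 * x_syn[t - 3] for t in range(3, len(x_syn))]
chosen_lag, lag_fits = best_lag(x_syn, y_syn, range(1, 7))
```

On the synthetic data the procedure recovers the built-in lag of three steps together with the intercept and slope used to generate the series.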

Autocorrelation and autoregression analyses of camera sales variable (Yt)
Since partial autocorrelation coefficients correspond to the coefficients of autoregressive models of a time series, we conducted an autocorrelation analysis of Y_t (Fig 4).
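To make the link between partial autocorrelation and autoregression concrete, the sketch below computes partial autocorrelation coefficients from sample autocorrelations with the Durbin-Levinson recursion. This is one standard way to obtain them and is not necessarily the routine SPSS uses; the function names and the example series are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelations rho[0..max_lag] (rho[0] is 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def partial_autocorrelation(series, max_lag):
    """Partial autocorrelations for lags 1..max_lag via Durbin-Levinson."""
    rho = autocorrelation(series, max_lag)
    pacf = []
    phi_prev = []  # coefficients of the AR(k - 1) fit from the previous step
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
        # Update the AR(k) coefficients; the last one is the lag-k partial autocorrelation.
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical series, for illustration only.
demo_series = [5, 7, 6, 9, 8, 11, 10, 13, 12, 15]
demo_rho = autocorrelation(demo_series, 3)
demo_pacf = partial_autocorrelation(demo_series, 3)
```

The lag-k value is the last coefficient of an AR(k) fit, which is why lags with large partial autocorrelations are natural candidates for lagged sales terms in a regression model.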

Multiple regression models for Sony camera sales
Using the selected variables (X_{t-3}, Y_{t-1} and Y_{t-2}), we built prediction models for camera sales (Y_t) by stepwise regression (Table 3).
Table 3 shows three significant models (all p-values < 0.001) with different independent variables: Model 1 included only a sales variable (Y_{t-1}); Model 2 included a sales variable (Y_{t-1}) and a public mood variable (X_{t-3}); and Model 3 included two sales variables (Y_{t-2} and Y_{t-1}) and a public mood variable (X_{t-3}). Equations of these models were as [...]
Table 2. Regression analysis for each public mood variable (X_{t-6}, X_{t-5}, X_{t-4}, X_{t-3}, X_{t-2} and X_{t-1}) and camera sales variable (Y_t).

Table columns: Model, Variables, Regression coefficient, t, p, Adjusted R-square.
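The structure of the three models described above (Model 1 with Y_{t-1}; Model 2 with Y_{t-1} and X_{t-3}; Model 3 with Y_{t-1}, Y_{t-2} and X_{t-3}) can be sketched with ordinary least squares in plain Python. The sketch fits each model directly rather than running a stepwise procedure, and the helper names and the synthetic data are assumptions; it does not reproduce the coefficients in Table 3.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, targets):
    """Least-squares coefficients (intercept first) and R-square."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - f) ** 2 for t, f in zip(targets, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return beta, 1 - ss_res / ss_tot


def fit_nested_models(x, y):
    """Fit the three model structures on the rows t >= 3, where all lags exist."""
    ts = range(3, len(y))
    targets = [y[t] for t in ts]
    return {
        "model1": ols([(y[t - 1],) for t in ts], targets),
        "model2": ols([(y[t - 1], x[t - 3]) for t in ts], targets),
        "model3": ols([(y[t - 1], y[t - 2], x[t - 3]) for t in ts], targets),
    }


# Synthetic illustration only: Y[t] = 10 + 0.5 * Y[t-1] - 2 * X[t-3].
x_demo = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10, 12, 11, 13, 15, 14]
y_demo = [50.0, 48.0, 49.0]
for t in range(3, len(x_demo)):
    y_demo.append(10 + 0.5 * y_demo[t - 1] - 2 * x_demo[t - 3])
models = fit_nested_models(x_demo, y_demo)
```

Because the models are nested, R-square cannot decrease when a predictor is added, so an increase in R-square alone does not show that the added variable is useful; significance tests and out-of-sample error provide the complementary evidence.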

Model estimation
To test the value of the public mood variable in predicting camera sales, we compared the F-statistic, R-square and prediction accuracy of Model 1 and Model 2. Forecasting accuracy was measured by the Mean Absolute Percentage Error (MAPE), defined as follows:
MAPE = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|
where A_t was the actual value and F_t was the predicted value at time point t, and n was the number of time points.
The results are shown in Table 4. Inclusion of the public mood variable (X_{t-3}) (1) promoted significance (the F-statistic fell from 53.146 to 32.575), (2) increased R-square (from 0.454 to 0.508) and (3) reduced the MAPE prediction error (from 12.70 to 11.35). Therefore, we concluded that public mood could influence consumption choices and was an appropriate indicator for sales forecasting.
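For reference, MAPE in its standard form (the mean of the absolute errors relative to the actual values, expressed in percent) can be computed as in the sketch below. The function name and the example numbers are illustrative assumptions, not values from Table 4.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical example: both forecasts are off by 10% of the actual value.
example_mape = mape([100, 200], [110, 180])
```

A lower MAPE means the forecasts are closer to the observed sales in relative terms, which is the sense in which Model 2 is compared with Model 1.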

Discussion
In previous studies, scholars have attempted to extract public mood indicators from large amounts of online data (e.g. search engine and social media data) and have studied their predictive validity for presidential elections [1], economic operations [2], financial market indexes [3-5] and even the number of national suicides [6]. These studies [1-6] have demonstrated that it is feasible to extract public mood indicators from online data to make predictions. However, few studies have examined the relationship, especially the forecasting relationship, between public mood and consumption choices at the societal level.
Focusing on the "Diaoyu Island" event, this study extracted information on Chinese public mood toward Japan and Japanese products from social media, and then analyzed the cross-correlation between the public mood variable and the sales variable after applying suitable transformation functions. Finally, three prediction models for Sony camera sales (Y_t) were built, with public mood information and earlier sales as independent variables. The results showed that (1) the public mood variable could be significantly cross-correlated with the sales variable of a particular product, and this correlation could be used to build prediction models; and (2) adding the public mood variable to the prediction models promoted significance (reducing the F-statistic), increased R-square and reduced the MAPE prediction error. These results indicate that public mood was significantly associated with consumption choices and might be of value in sales forecasting for particular products. This study extends and supplements previous data mining research on online big data.
The main contributions of this study are as follows: 1) it focused on the "Diaoyu Island" event between China and Japan and empirically studied the influence of public mood on related consumption, which had not been examined in previous research; 2) beyond the correlation between public mood and related consumption, it found that public mood might be valuable in forecasting sales of particular products; 3) it discussed the approach of applying variable transformations to probe the correlation between time series, which might offer a new way to analyze online big data in the future.

Supporting Information
S1 Dataset. Data of daily original blogs (Bt) and daily sales of Sony camera (St). (XLSX)