Coupling News Sentiment with Web Browsing Data Improves Prediction of Intra-Day Price Dynamics

The new digital revolution of big data is deeply changing our capability of understanding society and forecasting the outcome of many social and economic systems. Unfortunately, information can be very heterogeneous in the importance, relevance, and surprise it conveys, severely affecting the predictive power of semantic and statistical methods. Here we show that the aggregated behavior of web users can be exploited to overcome this problem in a hard-to-predict complex system, namely the financial market. Specifically, our in-sample analysis shows that the combined use of sentiment analysis of news and browsing activity of users of Yahoo! Finance greatly helps in forecasting intra-day and daily price changes of a set of 100 highly capitalized US stocks traded in the period 2012–2013. Sentiment analysis or browsing activity, when taken alone, have very small or no predictive power. Conversely, when considering a news signal in which, for a given time interval, we compute the average sentiment of the clicked news weighted by the number of clicks, we show that for nearly 50% of the companies this signal Granger-causes hourly price returns. Our result indicates a “wisdom-of-the-crowd” effect that makes it possible to exploit users’ activity to identify and properly weigh the relevant and surprising news, considerably enhancing the forecasting power of the news sentiment.


Financial time series
We consider only trading hours (i.e. from 9:30 AM to 4:00 PM) and trading days (i.e. business days at the New York Stock Exchange), and we disregard the trading events that occur out of this time window. From the financial data we create three time series: the first one is the traded volume for each minute for each company, the second one is the logarithmic returns, and the last time series is the absolute value of the logarithmic returns.
1. Trade volume (V) time series. It consists of the traded volume of a given company at the minute time scale. We build a time series at time scale $t$ by summing the traded volumes $v_\tau$ at the smallest time scale $\tau$ (in our case one minute). The total volume $v_t$ is then defined as
$$v_t = \sum_{\tau \in (t-1,\,t]} v_\tau .$$
2. Price return (R) time series. We first compute the logarithmic return, defined as $r_t = \log_{10}(p_t / p_{t-1})$, where $p_t$ is the last recorded price in the interval $(t-1, t]$. Returns at intra-day time $t$ are rescaled by a factor $\zeta^r_t$, which is computed as the average, over all days, of absolute returns at time $t$ rescaled by the average volatility. More precisely, if $r_{d,t}$ is the raw return of day $d$ and intra-day time $t$, we define the rescaled time series as
$$R_{d,t} = \frac{r_{d,t}}{\zeta^r_t},$$
where
$$\zeta^r_t = \frac{1}{N_{\mathrm{days}}} \sum_{d'} \frac{|r_{d',t}|}{s_{d'}},$$
$N_{\mathrm{days}}$ is the number of days in the sample, and $s_d$ is the average volatility of day $d$, for which we compute the mean value over all $t$ in day $d$.
3. Volatility (σ) time series. As a simple proxy for the de-seasonalized intra-day volatility we employ the absolute value of the rescaled return,
$$\sigma_t = |R_t| .$$
Notice that the volatility $\sigma_t$ is already de-seasonalized, because of the definition of $R$.
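The rescaling of returns described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the function names are hypothetical, each day is assumed to have the same number of intra-day intervals, and the daily volatility $s_d$ is assumed here to be the mean absolute return of day $d$.

```python
import math

def log_returns(prices):
    """Base-10 logarithmic returns r_t = log10(p_t / p_{t-1})."""
    return [math.log10(b / a) for a, b in zip(prices, prices[1:])]

def deseasonalize(returns_by_day):
    """Rescale raw returns r_{d,t} by the intra-day factor zeta_t.

    returns_by_day: list of days, each a list of returns of equal length.
    Assumption (not confirmed by the text): s_d is the mean absolute
    return of day d. Days with all-zero returns are not handled.
    """
    n_days = len(returns_by_day)
    n_times = len(returns_by_day[0])
    # Daily volatility proxy s_d.
    s = [sum(abs(r) for r in day) / n_times for day in returns_by_day]
    # zeta_t: average over days of |r_{d,t}| / s_d.
    zeta = [
        sum(abs(returns_by_day[d][t]) / s[d] for d in range(n_days)) / n_days
        for t in range(n_times)
    ]
    rescaled = [[day[t] / zeta[t] for t in range(n_times)]
                for day in returns_by_day]
    return rescaled, zeta

def volatility(rescaled):
    """Volatility proxy sigma_{d,t} = |R_{d,t}|."""
    return [[abs(x) for x in day] for day in rescaled]
```

Under the assumption made for $s_d$, the factors $\zeta^r_t$ average to one over the intra-day times, so the rescaling removes the intra-day pattern without changing the overall scale of the returns.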

Sentiment analysis
We discuss here how we tagged the news and how we performed the sentiment analysis. The news items contained in the log of Yahoo! Finance were marked with relevant stocks, identified through two different tagging methods. A first type of annotation was provided by a team of editors, who manually assessed the articles published on the Yahoo! Finance website and marked them with the companies for which the content of the article was relevant. In addition, relevant companies were also identified by applying a proprietary entity-recognition tool. We discarded all the articles that were tagged with more than 4 companies. This threshold was chosen heuristically in a preliminary assessment, where we verified that articles tagged with 5 or more companies generally correspond to aggregated periodic reports, which mention long lists of stocks without being really specific about any of them. Among the remaining articles, we further processed those tagged with more than one company, introducing some additional filters to make sure that we considered a news item for a company only when the content of the article was really relevant to it, and not when the company was tagged merely because of a casual mention in the text. These additional filters consisted in retaining only the stocks that were mentioned either in the title or in the first paragraph of the article, which typically contains a summary of the key concepts discussed in the article. Each article then contributed to the time series of all the stocks it was tagged for.
As explained in the main text, we used SentiStrength, adding a list of sentiment keywords of special interest and significance for the financial domain. We also tested the classification into good, bad, or neutral news using the general dictionary. In 84% of the cases the two dictionaries yield the same classification. This result indicates that our classification and the consequent analysis are fairly robust to the choice of dictionary. Interestingly, 17% of the news items classified as neutral by the general dictionary are classified as positive or negative by the financial dictionary, while only 8% of the news items classified as neutral by the financial dictionary are classified as positive or negative by the general dictionary. This suggests that the use of a financial dictionary sharpens the capability of assigning a positive or negative sign to a news item.
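The tag filters described above can be sketched as follows. This is only an illustration under stated assumptions: the article layout (a dict with `title`, `first_paragraph`, and `tags`), the `names` mapping from ticker to company name, and the case-insensitive substring match are all hypothetical, since the text does not specify how a mention was detected.

```python
MAX_TAGS = 4  # articles tagged with more than 4 companies are discarded

def relevant_companies(article, names):
    """Return the tags of an article that survive the filters.

    article: dict with keys "title", "first_paragraph", "tags" (hypothetical layout).
    names: dict mapping each tag to the company name searched in the text
           (hypothetical; substring matching is an assumption).
    """
    tags = article["tags"]
    if len(tags) > MAX_TAGS:
        return []            # likely an aggregated periodic report
    if len(tags) == 1:
        return list(tags)    # single-company articles are kept as they are
    # Multi-company articles: keep only companies mentioned in the
    # title or in the first paragraph.
    lead = (article["title"] + " " + article["first_paragraph"]).lower()
    return [t for t in tags if names[t].lower() in lead]
```

For example, an article tagged with three companies, only two of which appear in the title or first paragraph, would contribute to the time series of those two companies only.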

Click and sentiment time series
Consistently with the financial time series, we consider only trading hours and trading days. This implies that we neglect the clicks that occur outside this time window, because we are only interested in the click behaviour while the markets are open. From the click-through history of each news item we create two time series: the first one is the total number of clicks for each minute for each company, and the second one is the number of clicks multiplied by the sentiment score of the associated news item.
1. Click (C) time series. It consists of the number of clicks received by the news of a given company. With the same notation used for the traded volume, the clicks at time scale $t$ are
$$c_t = \sum_{\tau \in (t-1,\,t]} \sum_i c^i_\tau ,$$
where $c^i_\tau$ is the number of clicks received at minute $\tau$ by news item $i$. News items that are not viewed in the time interval have zero clicks. We filter out the intra-day pattern from clicks by means of a simple methodology. Clicks at intra-day time $t$ are rescaled by a factor $\zeta^c_t$, which is computed as the average, over all days, of the click volume at time $t$ normalized by the total number of daily clicks. More precisely, if $c_{d,t}$ is the raw click volume of day $d$ and intra-day time $t$, we define the time series of rescaled clicks as
$$C_{d,t} = \frac{c_{d,t}}{\zeta^c_t},$$
where
$$\zeta^c_t = \frac{1}{N_{\mathrm{days}}} \sum_{d'} \frac{c_{d',t}}{\Gamma_{d'}}$$
and
$$\Gamma_d = \sum_t c_{d,t},$$
with $N_{\mathrm{days}}$ the number of days in the sample and $\Gamma_d$ the total number of clicks in a day.
2. Sentiment (S) time series. To construct this time series we consider the sentiment of the news headlines. With the same notation previously used,
$$s_t = \sum_{\tau \in (t-1,\,t]} \sum_i s^i_\tau ,$$
where $s^i_\tau$ is the sign (−1, 0, 1) of the sentiment of a news headline published at time $\tau$.
3. Weighted Sentiment (WS) time series. We multiply the click count of each news item by its sentiment score. With the same notation of the click time series, we have
$$ws_t = \sum_{\tau \in (t-1,\,t]} \sum_i s^i \, c^i_\tau ,$$
where $s^i$ is the sign of the sentiment of news item $i$. This way, we weight the sign of every news item by the level of attention that it has received. We remark that for each trading day we consider all the clicks of all the news clicked in that day, not only the news that have been released in that day. Then, we define the weighted sentiment as
Table 1. Tail exponent $\alpha$ and lower bound $x_{\min} > 0$, with their standard errors, estimated from 100 highly capitalized stocks traded in the US equity markets. Following the procedure detailed in [1], we estimate the probability distribution associated with integer numbers larger than $x_{\min}$, whose expression reads $p(x) = x^{-1-\alpha}/\zeta(1+\alpha, x_{\min})$. The normalizing constant corresponds to the Hurwitz zeta function $\zeta$. We report in bold the top 10 assets, as measured by the absolute number of associated news.
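The construction of the click and click-weighted sentiment series in this section can be sketched in Python as follows. The input layout (tuples of minute index, news identifier, and click count), the function names, and the use of integer division to assign minutes to intervals are illustrative assumptions, not details taken from the original analysis.

```python
from collections import defaultdict

def aggregate(click_events, sentiment, scale=60):
    """Aggregate minute-level clicks into intervals of `scale` minutes.

    click_events: iterable of (minute, news_id, n_clicks) (hypothetical layout).
    sentiment: dict news_id -> sign of the headline sentiment (-1, 0, 1).
    Returns (clicks, weighted): total clicks and sentiment-weighted clicks
    per interval index. Interval assignment by floor division is a
    convention chosen for this sketch.
    """
    clicks = defaultdict(int)
    weighted = defaultdict(int)
    for minute, news_id, n in click_events:
        t = minute // scale
        clicks[t] += n
        weighted[t] += sentiment[news_id] * n  # sign weighted by attention
    return dict(clicks), dict(weighted)

def click_seasonal_factor(clicks_by_day):
    """zeta_t: average over days of c_{d,t} / Gamma_d.

    clicks_by_day: list of days, each a list of click counts of equal
    length; every day is assumed to have at least one click.
    """
    n_days = len(clicks_by_day)
    n_times = len(clicks_by_day[0])
    totals = [sum(day) for day in clicks_by_day]  # Gamma_d
    return [
        sum(clicks_by_day[d][t] / totals[d] for d in range(n_days)) / n_days
        for t in range(n_times)
    ]
```

A news item clicked on a given day contributes to that day's series regardless of when it was published, which is why the sentiment is looked up per news item rather than per publication time.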

Estimation of power law exponents
We estimate the power law tail exponent of the distribution of the number of clicks per news item by employing the R package poweRlaw, developed and maintained by Colin Gillespie and described in [1].
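To make the fitted distribution concrete, the following Python sketch evaluates the discrete power law $p(x) = x^{-1-\alpha}/\zeta(1+\alpha, x_{\min})$ and its log-likelihood. It is not the poweRlaw implementation: the function names are hypothetical, the Hurwitz zeta function is approximated by a truncated sum with a simple tail correction, and the support is assumed to be the integers $x \ge x_{\min}$, as implied by the normalizing constant.

```python
import math

def hurwitz_zeta(s, q, n_terms=1000):
    """Approximate zeta(s, q) = sum_{k>=0} (q + k)^(-s), for s > 1 and q > 0.

    The first n_terms are summed directly; the remainder is approximated
    by an integral plus a half-term correction (Euler-Maclaurin).
    """
    head = sum((q + k) ** (-s) for k in range(n_terms))
    a = q + n_terms
    tail = a ** (1 - s) / (s - 1) + 0.5 * a ** (-s)
    return head + tail

def pmf(x, alpha, x_min):
    """Probability of the integer x >= x_min under the discrete power law."""
    return x ** (-1 - alpha) / hurwitz_zeta(1 + alpha, x_min)

def log_likelihood(data, alpha, x_min):
    """Log-likelihood of the observations that are >= x_min."""
    tail = [x for x in data if x >= x_min]
    norm = hurwitz_zeta(1 + alpha, x_min)
    return -(1 + alpha) * sum(math.log(x) for x in tail) - len(tail) * math.log(norm)
```

Maximizing `log_likelihood` over `alpha` for a fixed `x_min` gives a point estimate of the tail exponent; the procedure in [1] additionally selects `x_min` and provides uncertainty estimates, which this sketch does not attempt.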

Multiple hypotheses testing
In Tables 2 and 3 we report the results for tests of zero Spearman correlation and zero Granger causality, respectively, under the very conservative correction proposed by Bonferroni. If $N_t$ tests are performed and the desired significance level is $p$ (in our case 5%), then the null hypothesis is rejected only for the tests with a p-value smaller than $p/N_t$. Since we perform 100 tests, our corrected threshold is 0.05%.
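The Bonferroni correction amounts to a single comparison per test, as the following minimal Python sketch shows. The function name and the example p-values are illustrative only.

```python
def bonferroni_reject(p_values, significance=0.05):
    """Return, for each test, whether its null hypothesis is rejected
    at the Bonferroni-corrected threshold significance / N_t."""
    threshold = significance / len(p_values)
    return [p < threshold for p in p_values]

# With 100 tests at the 5% level, the threshold is 0.05 / 100 = 0.0005 (0.05%).
example = [0.0001] + [0.01] * 99
rejected = bonferroni_reject(example)
```

In this example only the first test is rejected: a p-value of 0.01 would pass an uncorrected 5% test but is well above the corrected threshold.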