Predicting Fluctuations in Cryptocurrency Transactions Based on User Comments and Replies

This paper proposes a method to predict fluctuations in the prices of cryptocurrencies, which are increasingly used for online transactions worldwide. Little research has been conducted on predicting fluctuations in the price and number of transactions of a variety of cryptocurrencies. Moreover, the few methods proposed to predict fluctuation in currency prices are inefficient because they fail to take into account the differences in attributes between real currencies and cryptocurrencies. This paper analyzes user comments in online cryptocurrency communities to predict fluctuations in the prices of cryptocurrencies and the number of transactions. By focusing on three cryptocurrencies, each with a large market size and user base, this paper attempts to predict such fluctuations by using a simple and efficient method.


Introduction
The ubiquity of Internet access has triggered the emergence of currencies distinct from those used in the prevalent monetary system. The advent of cryptocurrencies based on a unique method called "mining" has brought about significant changes in the online economic activities of users. Various cryptocurrencies have emerged since 2008, when Bitcoin was first introduced [1,2]. Nowadays, cryptocurrencies are often used in online transactions, and their usage has increased every year since their introduction [3,4].
Cryptocurrencies are primarily characterized by fluctuations in their price and number of transactions [2,3]. For instance, the most famous cryptocurrency, Bitcoin, witnessed no significant fluctuation in its price and number of transactions until the end of 2013 [3], when it began to garner worldwide attention and its price and number of transactions rose and fluctuated significantly. Other cryptocurrencies, Ripple and Litecoin for instance, have shown significantly unstable fluctuations since the end of December 2013 [5]. Such unstable fluctuations have served as an opportunity for speculation for some users while hindering most others from using cryptocurrencies [2,6,7].
Research on the attributes of cryptocurrencies has made steady progress but has a long way to go. Most researchers analyze user sentiments related to cryptocurrencies on social media, e.g., Twitter, or quantified Web search queries on search engines, such as Google, as well as fluctuations in price and trade volume to determine any relation [8][9][10][11][12]. Past studies have been limited to Bitcoin because the large amount of data that it provides eliminates the need to build a model to predict fluctuations in the price and number of transactions of diverse cryptocurrencies. Therefore, this paper proposes a method to predict fluctuations in the price and number of transactions of cryptocurrencies. The proposed method analyzes user comments on online cryptocurrency communities, and conducts an association analysis between these comments and fluctuations in the price and number of transactions of cryptocurrencies to extract significant factors and formulate a prediction model. The method is intended to predict fluctuations in cryptocurrencies based on the attributes of online communities.
Online communities serve as forums where people share opinions regarding topics of common interest [13][14][15][16][17]. Therefore, such communities mirror the responses of many users to certain cryptocurrencies on a daily basis. Cryptocurrencies are largely traded online, where many users rely on information on the Web to make decisions about selling or buying them [4,18]. In this paper, daily topics and relevant comments/replies in cryptocurrency communities are analyzed to determine how the opinions of community users are associated with fluctuations in the price and number of transactions of cryptocurrencies on a daily basis.
The proposed method is applicable to a range of cryptocurrencies, and can predict fluctuations in the prices of such cryptocurrencies as Bitcoin, Ripple, and Ethereum to a certain extent (approximately 74% weighted average precision). Moreover, the rise and fall in the number of transactions of Bitcoin and Ethereum can be predicted to some extent.

System Overview
For the proposed system, we crawled all comments and replies posted in online communities relevant to cryptocurrencies [19][20][21]. We then analyzed the data (comments and replies) and tagged the extent of positivity or negativity of each topic as well as that of each comment and reply. Following this, we tested the relation between the price and number of transactions of cryptocurrencies based on user comments and replies to select data (comments and replies) that showed significant relation. Finally, we created a prediction model via machine learning based on the selected data to predict fluctuations (Fig 1).

Crawling user comment data
We crawled data needed to create the prediction model. Once the environment for cryptocurrency trading among users is established, transactions between users lead to fluctuations in price [4]. We hypothesized that user comments in certain online cryptocurrency communities may affect fluctuations in their price and trading volume. Thus, we crawled the relevant data. Approximately 670 types of cryptocurrencies existed as of February 2016 [22]. Of the available ones, we crawled online communities for the top three in terms of market cap, i.e., Bitcoin, Ethereum, and Ripple. We did not include Litecoin in this study because its online communities seemed not to be sufficiently active to be considered in this experiment, despite its large market cap and broad user base.
Since Bitcoin was the first cryptocurrency, it has a large user community. In the Bitcoin community [19], data items were collected starting from December 2013, when the cryptocurrency became widely available. In the Ethereum community [20], data were collected from August 7, 2015, the date from which the community had stabilized to the extent that at least one topic was posted every day and transaction data were available. From the Ripple community [21], all data since the creation of the community were gathered. In all communities of interest, we collected data in a legitimate manner, in compliance with their terms and conditions. Moreover, the collected data did not involve any personally identifiable information.
The cryptocurrencies of interest in this paper had online communities where users shared opinions on the relevant topics. The Bitcoin community [19] is divided into four sections, i.e., a "Bitcoin" section on Bitcoin-related topics, an "Economy" section on transactions, an "Alternate cryptocurrencies" section concerning other cryptocurrencies, and an "Other" section for other topics. Each section has three to five subsections. The "Bitcoin" section consists of "Bitcoin Discussion," "Development & Technical Discussion," "Mining," "Technical Support," and "Project Development." The "Alternate cryptocurrencies" section has a similar structure. For this paper, we crawled the discussion subsections for topics related to each of the cryptocurrencies.
Comments and relevant replies posted by users on bulletin boards in each community were crawled. Furthermore, the time when each comment and the replies to it were posted, the number of replies to each comment, and the number of views were crawled as well. Replies quoting previous comments and replies were crawled excluding overlapping sentences. Each community's HTML page was crawled using Python [23]. Using Python's regex, we parsed the tags on HTML pages to extract the number of topics, the number of replies, the dates on which the topics and replies were posted, and the URL of each topic from the bulletin boards. Based on the URLs of extracted topics, their contents and replies to them were extracted. The extracted topics, the dates on which they were posted, topic contents, reply contents, and reply dates were saved in .json format, which was in turn converted into other formats (e.g., CSV) appropriate for different purposes. The .json files of the communities crawled can be viewed in the supporting information. One researcher executed the crawling on a single PC for 48 to 72 hours, where the time varied with the size of the community. The Bitcoin and Ethereum forums were crawled on February 1 and 8, 2016, respectively, whereas the Ripple forum was crawled on January 21, 2016. Table 1 outlines the arrangement of the opinion data that were gathered.
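The regex-based parsing step described above can be illustrated with a minimal sketch. The HTML markup, the class names, the example.org URLs, and the function name parse_topics are all hypothetical, chosen only for illustration; the actual forum markup and the authors' crawler (S2 File) are not reproduced here.

```python
import json
import re

# Hypothetical bulletin-board markup; the real forum HTML differs.
SAMPLE_HTML = """
<tr><td class="subject"><a href="https://example.org/topic=101">Is Bitcoin a bubble?</a></td>
<td class="replies">12</td><td class="date">2016-01-30</td></tr>
<tr><td class="subject"><a href="https://example.org/topic=102">Mining at home</a></td>
<td class="replies">3</td><td class="date">2016-01-31</td></tr>
"""

# One pattern per topic row: URL, title, reply count, posting date.
TOPIC_ROW = re.compile(
    r'<td class="subject"><a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a></td>\s*'
    r'<td class="replies">(?P<replies>\d+)</td>'
    r'<td class="date">(?P<date>[\d-]+)</td>'
)


def parse_topics(html):
    """Extract topic metadata from a board page as a list of dicts."""
    topics = []
    for match in TOPIC_ROW.finditer(html):
        row = match.groupdict()
        row["replies"] = int(row["replies"])
        topics.append(row)
    return topics


topics = parse_topics(SAMPLE_HTML)
# Serialize to JSON, mirroring the .json storage step described in the text.
print(json.dumps(topics, indent=2))
```

In a full crawler, each extracted URL would then be fetched in turn to collect the topic body and its replies.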
The crawled data included garbage, e.g., ads and meaninglessly repetitive postings or replies. Quite a few spam filtering techniques were investigated to remove such garbage data [15,24-29]. Any posting of more than two sentences found more than five times a day was considered spam and treated as such.
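The spam rule stated above (more than two sentences, repeated more than five times on the same day) can be sketched as follows. The sentence splitter is a crude assumption, as are the function names and the exact-match definition of a "repeated" posting; the paper does not specify these details.

```python
import re
from collections import Counter


def sentence_count(text):
    """Count sentences with a naive split on '.', '!' and '?' (an assumption)."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def filter_spam(posts, min_sentences=3, max_repeats=5):
    """Drop postings with more than two sentences that appear more than
    five times on the same day. posts is a list of (date, text) pairs."""
    counts = Counter((date, text.strip()) for date, text in posts)
    kept = []
    for date, text in posts:
        repeated = counts[(date, text.strip())] > max_repeats
        if repeated and sentence_count(text) >= min_sentences:
            continue  # treated as spam
        kept.append((date, text))
    return kept


ad = "Buy now. Cheap coins. Visit us."
posts = [("2016-01-30", ad)] * 6 + [("2016-01-30", "Will the price recover?")]
clean = filter_spam(posts)
```

Here the six identical three-sentence ads are removed and the single genuine question is kept.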
Past research has mostly focused on classifying user comments in particular fields. Comments on online communities involve considerable use of neologisms, slang, and emoticons that transcend grammatical usage. C.J. Hutto and Eric Gilbert introduced an algorithm called VADER [44] to parse such expressions, and proposed a method to analyze social media texts by drawing on a rule-based model. Online communities of interest in this paper paralleled social media texts. Thus, user comment data were tagged based on this algorithm.
VADER normalizes positive and negative sentiments to a score from -1 to 1. Based on the normalized score $x$, the ranges $-1 \le x < -0.6$, $-0.6 \le x < -0.2$, $0.2 \le x < 0.6$, and $0.6 \le x \le 1.0$ were tagged as very negative, negative, positive, and very positive, respectively. In this paper, each of the comments and replies was tagged (see the opinion analysis example in Table 2).
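The tagging thresholds above map directly onto a small function. This is a sketch under two assumptions: that the score being binned is a single normalized value per text, and that the remaining interval from -0.2 up to 0.2 is tagged as neutral (the text does not state this range explicitly, although a Neutral category appears in Table 2). The function name tag_sentiment is illustrative.

```python
def tag_sentiment(score):
    """Map a normalized sentiment score in [-1, 1] to one of five tags."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score < -0.6:
        return "very negative"
    if score < -0.2:
        return "negative"
    if score < 0.2:
        return "neutral"  # assumed: the gap between the stated ranges
    if score < 0.6:
        return "positive"
    return "very positive"


tags = [tag_sentiment(x) for x in (-0.9, -0.4, 0.0, 0.35, 0.8)]
```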

Prediction modeling
The crawled user comment data were tagged to create a prediction model. To create the prediction model, data selection was performed again. All opinions, from very negative to very positive comments and replies, could have been used. Yet, we intended to improve the qualitative results and minimize operation cost. For data selection, we performed an association analysis between the results of opinion analysis and fluctuations in cryptocurrency prices. In this paper, the Granger causality test, which is widely used in research on the value of shares and currencies, was adopted [45].

Table 2. Bitcoin Community Opinion Analysis Example (example topic sentences).
Very Positive: "I am selling for $100 a Starbucks Gift card with a loaded balance of $20 worth of BTC" / "Bitcoin is the global currency of the Earth" / "How can 1 BTC eventually be worth $11 M"
Positive: "We are in Bitcoin Heaven" / "Bitcoin to eventually replace Apps like Uber" / "Russians can Pay Internet and phone bills with Bitcoin without fees"
Neutral: "Do you think Bitcoin will disappear or sopt being used?" / "What you like the best about Bitcoin?" / "Can Bitcoin make banks disappear?"
Negative: "Bitcoin: Should you stay or should you go?" / "Is there a way to earn at least $1 in BTC per hour?" / "IMF fears cryptocurrencies may circumvent capital controls"
Very Negative: "Bitcoin used to be involved in money laundering-will it become a huge problem?" / "Bitcoin cold storage-Hacked easily" / "Russia's Finance Ministry wants to ban Bitcoin"
doi:10.1371/journal.pone.0161197.t002

As shown in Eq 1, the results of opinion analysis based on the topics and replies (VADER-based tagged values), the number of topics posted, the number of replies posted, and the number of views of all topics posted on a certain day were transformed into z-scores for standardization against the previous 10 days. Likewise, the fluctuations in the price and number of transactions of cryptocurrencies were transformed into z-scores for standardization against the previous 10 days. On a certain date t (t = 10 in the paper), the z-score of a certain item E, denoted by $Z_E$, was defined as:

$$Z_E = \frac{E_t - \bar{x}(E)}{\sigma(E)} \qquad (1)$$

where $\bar{x}(E)$ and $\sigma(E)$ respectively represent the mean and standard deviation of each item for every date. The standardized z-scores underwent the Granger causality test to determine the significance of association. The Granger causality test relies on the assumption that if a variable X causes Y, then changes in X will systematically occur before changes in Y [46].
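The standardization against the previous 10 days can be sketched as a rolling z-score. Two details are assumptions made for this illustration: the window covers the 10 days strictly before the current day, and the population standard deviation is used. The function name rolling_z_scores is hypothetical.

```python
from statistics import mean, pstdev


def rolling_z_scores(values, window=10):
    """Return z-scores of each value relative to the preceding `window` values.

    The first `window` observations have no full history and are skipped.
    """
    scores = []
    for t in range(window, len(values)):
        history = values[t - window:t]
        mu = mean(history)
        sigma = pstdev(history)
        # A flat history has zero spread; report 0.0 instead of dividing by zero.
        scores.append((values[t] - mu) / sigma if sigma > 0 else 0.0)
    return scores


daily_topics = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
z = rolling_z_scores(daily_topics)
```

The same function would be applied separately to each opinion series and to the price and transaction series before testing for association.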
As demonstrated in previous studies, lagged values of X exhibit a statistically significant correlation with Y [15,46]. Correlation does not prove causation, however. We are not testing actual causation, but only whether the time series of a community of opinions contained predictive information regarding the fluctuations in cryptocurrency prices.
Our time series for the prices of cryptocurrencies and number of transactions, denoted by $S_t$, reflected daily changes in the prices of cryptocurrencies and the number of transactions. To test whether the community opinion time series can predict fluctuations in cryptocurrency prices, we compared the variance explained by two linear models, as shown in Eqs 2 and 3. The first model uses only $n$ lagged values of $S_t$ (i.e., $S_{t-1}, \ldots, S_{t-n}$) for prediction, whereas the second model uses the $n$ lagged values of both $S_t$ and the community opinion time series, denoted by $X_{t-1}, \ldots, X_{t-n}$. We performed the Granger causality test according to the models in Eqs 2 and 3.
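For reference, the two linear models compared in a Granger causality test are commonly written in the following form. This is the standard textbook formulation, not a reproduction of the paper's own equations; the coefficient names $\alpha$, $\gamma_i$, and $\beta_i$ and the error term $\varepsilon_t$ are assumptions introduced here, with $\beta_i$ taken to be the coefficients on the lagged opinion series.

```latex
\begin{align}
S_t &= \alpha + \sum_{i=1}^{n} \gamma_i S_{t-i} + \varepsilon_t \\
S_t &= \alpha + \sum_{i=1}^{n} \gamma_i S_{t-i} + \sum_{i=1}^{n} \beta_i X_{t-i} + \varepsilon_t
\end{align}
```

Under this formulation, the null hypothesis is $\beta_1 = \cdots = \beta_n = 0$, and it is rejected when adding the lagged $X$ terms reduces the residual variance by more than chance would explain, typically assessed with an F-test.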
Based on the results of the Granger causality test, we can reject the null hypothesis, whereby the community opinions time series does not predict fluctuations in cryptocurrency prices (i.e., $\beta_{\{1,2,\ldots,n\}} \neq 0$), with a high level of confidence. The community opinions with the highest Granger causality relation (p-value < 0.05) were extracted.
The Granger causality test was performed on each currency for a time lag of 1 to 13 days. Experimentally, a time lag of 14 days and longer proved insignificant. Depending on the difference in each time lag measurement, elements showing significant associations were identified. For the prediction, the fluctuations in cryptocurrency prices were determined in a binary manner. We generated and validated the prediction model based on averaged one-dependence estimators (AODE) [47]. Based on AODE, we estimated the probability of a binary class $y$ given an item-related set of features $x_1, \ldots, x_n$, i.e., $P(y \mid x_1, \ldots, x_n)$. This probability was estimated as follows:

$$\hat{P}(y \mid x_1, \ldots, x_n) = \frac{\sum_{i: 1 \le i \le n \wedge F(x_i) \ge m} \hat{P}(y, x_i) \prod_{j=1}^{n} \hat{P}(x_j \mid y, x_i)}{\sum_{y' \in Y} \sum_{i: 1 \le i \le n \wedge F(x_i) \ge m} \hat{P}(y', x_i) \prod_{j=1}^{n} \hat{P}(x_j \mid y', x_i)} \qquad (4)$$

where $\hat{P}(\cdot)$ denotes an estimate of $P(\cdot)$, $F(\cdot)$ is the frequency, and $m$ is the frequency limit, set at 1 in this paper. In the next section, we discuss the results of the applied system.
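To make the AODE estimate more concrete, the following is a minimal sketch for discrete features. It is not the authors' implementation: the class name, the toy data, and the use of simple add-one (Laplace) smoothing for the probability estimates are assumptions made for illustration, and the fallback to class frequencies when no feature value reaches the frequency limit m is a simplification.

```python
from collections import Counter


class AODE:
    """Averaged one-dependence estimators for discrete features (sketch)."""

    def __init__(self, m=1):
        self.m = m  # frequency limit for a feature value to act as parent

    def fit(self, rows, labels):
        self.n_rows = len(rows)
        self.classes = sorted(set(labels))
        n_features = len(rows[0])
        self.values = [sorted({r[i] for r in rows}) for i in range(n_features)]
        self.class_count = Counter(labels)
        self.value_count = Counter()   # F(x_i), keyed by (i, x_i)
        self.pair_count = Counter()    # F(y, x_i), keyed by (y, i, x_i)
        self.triple_count = Counter()  # F(y, x_i, x_j)
        for row, y in zip(rows, labels):
            for i, xi in enumerate(row):
                self.value_count[(i, xi)] += 1
                self.pair_count[(y, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.triple_count[(y, i, xi, j, xj)] += 1
        return self

    def predict_proba(self, row):
        scores = {}
        for y in self.classes:
            total = 0.0
            for i, xi in enumerate(row):
                if self.value_count[(i, xi)] < self.m:
                    continue  # parent value too rare, skip this estimator
                f_pair = self.pair_count[(y, i, xi)]
                # Smoothed estimate of P(y, x_i).
                term = (f_pair + 1) / (
                    self.n_rows + len(self.classes) * len(self.values[i]))
                for j, xj in enumerate(row):
                    if j != i:  # P(x_i | y, x_i) equals 1, so it is omitted
                        f_triple = self.triple_count[(y, i, xi, j, xj)]
                        term *= (f_triple + 1) / (f_pair + len(self.values[j]))
                total += term
            scores[y] = total
        norm = sum(scores.values())
        if norm == 0:  # no usable parent: fall back to class frequencies
            return {y: self.class_count[y] / self.n_rows for y in self.classes}
        return {y: s / norm for y, s in scores.items()}

    def predict(self, row):
        proba = self.predict_proba(row)
        return max(proba, key=proba.get)


# Toy data: two discretized opinion features, binary price movement.
rows = [("high", "high"), ("high", "low"), ("low", "low"),
        ("low", "high"), ("high", "high"), ("low", "low")]
labels = ["up", "up", "down", "down", "up", "down"]
model = AODE(m=1).fit(rows, labels)
proba = model.predict_proba(("high", "low"))
```

In the toy data the label follows the first feature, so the model assigns the higher probability to "up" for the row ("high", "low").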

Experimental Results
Using our model, we made predictions regarding three cryptocurrencies (Bitcoin, Ethereum, and Ripple). In consonance with the days for which data were collected from these communities, each cryptocurrency's daily price and number of transactions were crawled. Information concerning the price and number of transactions of Bitcoin was crawled via Coindesk [19], whereas price information for Ethereum was crawled via CoinMarketCap [22] and its transaction information was crawled via Etherscan [48]. Information regarding price for Ripple was crawled via rippleCharts [49], whereas its transaction information was not crawled. All data collected were in the public domain and excluded personal information. Table 3 outlines the arrangement of the market data that were gathered.
The elements that exhibited significant associations in modeling for predictions were used for learning (Tables 4-8). P-values in the tables are shown only for elements with values of 0.05 or less.
An example of applicable input data is shown in Table 9. The results of the predicted fluctuations in the price and number of transactions of each cryptocurrency are discussed below.
The accuracy rate, the F-measure, and the Matthews correlation coefficient (MCC) were used to evaluate the performance of the proposed models. The computation of these evaluation measures required estimating precision and recall, which are evaluated from the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). These parameters are defined in Eqs 5, 6, 7 and 8. The accuracy rate, the weighted average of the F-measure ($F\text{-}Measure_w$), and the MCC are defined in Eqs 9, 10, 11, 12 and 13.
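The evaluation measures named above can be computed from the four confusion-matrix counts using their standard definitions, as in the sketch below. The weighting of the per-class F-measures by class support is an assumption about how the weighted average was formed, and the function name and sample counts are illustrative only.

```python
from math import sqrt


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall for one class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, support-weighted F-measure, and MCC for a binary task."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    f_pos = f_measure(tp, fp, fn)
    # For the negative class the roles of the counts are swapped.
    f_neg = f_measure(tn, fn, fp)
    f_weighted = (f_pos * (tp + fn) + f_neg * (tn + fp)) / total
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f_weighted": f_weighted, "mcc": mcc}


metrics = binary_metrics(tp=40, fp=10, tn=35, fn=15)
```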
Of the Bitcoin-related data for 793 days, the first 88% (for 697 days) and the remaining 12% (for 94 days) were used for learning and verification, respectively. Fluctuations in the price of Bitcoin proved to be significantly associated with the number of topics, positive/very positive comments, and positive replies. The prediction accuracy was highest when the time lag was six days, at 79.57% (Table 10). Moreover, fluctuations in the number of transactions proved to be significantly associated with the number of daily topics, very positive comments, and very positive replies. The prediction accuracy for fluctuations in the number of transactions was highest when the time lag was three days, at 77.895% (Table 10).

Table 9. Example of a machine learning dataset.

A 10-fold cross-validation was performed on Ethereum over the entire period (187 days). Unlike Bitcoin, Ethereum showed a significant association in the Granger causality test with the number of negative/very negative comments. A significant association with the number of positive user replies was also found. The prediction accuracy was highest when the time lag was six days, at 71.823% (Table 11). The fluctuation in the number of transactions showed insignificant associations with most elements, but was significantly associated with very negative replies when the time lag was 11 to 13 days. The predicted fluctuation in the number of transactions when the time lag was one day yielded an accuracy of 66.129% (Table 11).
Finally, Ripple underwent 10-fold cross-validation over the entire period (137 days). The prediction accuracy for fluctuations in the price of Ripple was highest when the time lag was seven days, at 71.756% (Table 12).

Fluctuation Prediction in Cryptocurrencies
Like Ethereum, Ripple proved to be significantly associated with very negative comments, and with negative replies when the time lag was seven days and longer. The prediction of fluctuation in the number of transactions of Ripple could not be performed due to difficulties in acquiring relevant data.
To determine the effectiveness of the proposed prediction model, we performed a simulated investment in Bitcoin, using the simulated investment technique generally used in past studies on stock price prediction [50]. We invested in Bitcoin when the model predicted the price would rise the following day, and did not invest when the price was expected to drop the following day according to the model. The simulated investment was based on the rule whereby we would gain or lose from the investment (m) by r, which indicates the increment or decrement in the Bitcoin price (m = m + m × r or m = m−m × r, respectively). The six-day time lag, which corresponded to the best result in this study, was used in the prediction model. The prediction model was created based on data for the period from December 1, 2013 to November 10, 2015. The 84-day or 12-week data for the period from November 11, 2015 to February 2, 2016 were used in the experiment. Fig 3 shows the results of the simulated investment program based on the above conditions. The random investment average refers to the mean of 10 simulated investments based on the random Bitcoin price prediction. Over 12 weeks, the Bitcoin price increased by 19.29% while the amount of investment grew by 35.09%. In random investment, the amount of investment increased by approximately 10.72%, which was lower than the increment in Bitcoin price.
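The investment rule above can be sketched as a short simulation. The function names, the sample returns, and the 50/50 coin flip used for the random baseline are assumptions for illustration; here r is a signed daily return, so a single update m = m + m × r covers both the gain and the loss case.

```python
import random


def simulate_investment(predicted_up, daily_returns, capital=1.0):
    """Hold Bitcoin only on days the model predicts a rise.

    daily_returns holds signed fractional price changes (0.10 means +10%).
    """
    for goes_up, r in zip(predicted_up, daily_returns):
        if goes_up:
            capital += capital * r  # m = m + m * r, with r possibly negative
    return capital


def random_baseline(daily_returns, runs=10, seed=0):
    """Average final capital over several runs of coin-flip predictions."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        guesses = [rng.random() < 0.5 for _ in daily_returns]
        results.append(simulate_investment(guesses, daily_returns))
    return sum(results) / runs


returns = [0.10, -0.05, -0.02]
final = simulate_investment([True, False, True], returns)
buy_and_hold = simulate_investment([True, True, True], returns)
baseline = random_baseline(returns)
```

In this toy run the strategy skips the second day's loss but still takes the third day's loss, because the prediction for that day was wrong.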

Discussion and Conclusion
This paper analyzed user comments in online communities to predict the price and the number of transactions of cryptocurrencies. The proposed method predicted fluctuations in the price of cryptocurrencies at low cost. In terms of the prediction rates for Bitcoin and other cryptocurrencies based on the limited resources in online communities, the proposed method paralleled previous studies designed for similar purposes [15,51]. Moreover, user comments and replies in online communities proved to affect the number of transactions among users. The proposed method proved applicable to buying and selling cryptocurrencies, and shed light on aspects influencing user opinions. Furthermore, the simulated investment demonstrated that the proposed method is applicable to cryptocurrency trading.

Based on the learning data at the time of higher prediction rates, the types of comments that most significantly influenced fluctuations in the price and the number of transactions of each cryptocurrency were identified. Opinions affecting price fluctuations varied across cryptocurrencies. Positive user comments significantly affected price fluctuations of Bitcoin, whereas those of the other two currencies were significantly influenced by negative user comments and replies. Moreover, the association with the number of topics posted daily indicated that the variation in community activities could influence fluctuations in price. Further, unlike the price of cryptocurrencies, the number of transactions proved to be significantly associated with user replies rather than comments posted. Based on the prediction results, user opinions proved useful for predicting fluctuations 6 to 7 days ahead (Table 10).
The predicted fluctuations in the price of each cryptocurrency showed approximately 8% accuracy gaps. The predicted result was most precise in Bitcoin, which seems attributable to the amount of accumulated data and animated community activities (16.91 comments, 473.81 user replies, and 27443.18 views on average daily), which exerted a direct effect on fluctuations in the price of the cryptocurrency. The predicted result was least precise in Ripple, which had the smallest community regardless of its market size (3.41 comments, 29.14 user replies, and 1661.99 views on average daily). Ripple's online community started in September, 2015, with little data accumulated and few user activities. These findings suggest that the difference in community sizes may have direct effects on fluctuations in the price of cryptocurrencies.
Improving the precision of prediction requires a few improvements. Despite the association analysis used to filter user comments and replies, more qualitative selection criteria are needed to build a prediction model. This paper focused on online communities to determine associations and predict fluctuations. Yet, as with past studies, using data on the Web [52,53], analyzing social network data [46], and referring to search volumes on Google [10,12] are conducive to more precise results. Moreover, partly adopting the stock market prediction technique used in previous studies [54] might help increase the precision rate.
In this paper, we acquired information from users in online communities as a viable source for research on cryptocurrencies. In the same vein, the sentiments expressed by user comments and replies in online communities seem applicable to further analysis and understanding of cryptocurrencies. Moreover, the propensities of online community users may help understand the attributes of the relevant cryptocurrency. In addition, the rich information in online communities can contribute to understanding cryptocurrencies from different perspectives.
Cryptocurrencies are increasingly being used, and their usability has drawn attention from different perspectives [2][3][4][5]. Research on cryptocurrencies is insufficient, in that hardly any currency other than Bitcoin has been investigated. The proposed method of predicting fluctuations in the price and trading volume of cryptocurrencies based on user comments and replies in online communities is likely to increase the understanding and availability of cryptocurrencies if a range of improvements and applications are implemented. Furthermore, different approaches to user comments and replies in online communities are expected to bring more significant results in diverse fields.
Supporting Information
S1 File. Results of crawling Bitcoin forum, Ethereum forum, and Ripple forum (in .json format). (ZIP)
S2 File. Python-based crawler source code for community data collection. (ZIP)
S1 Table. The result of implementing opinion analysis from user opinion data (topic) on the Bitcoin forum (https://bitcointalk.org). (CSV)
S2 Table. The result of implementing opinion analysis from user opinion data (topic) on the Ethereum forum (https://forum.ethereum.org/).