Predicting Virtual World User Population Fluctuations with Deep Learning

This paper proposes a system for predicting increases in virtual world user actions. The virtual world user population is a very important aspect of these worlds; however, methods for predicting fluctuations in these populations have not been well documented. Therefore, we attempt to predict changes in virtual world user populations with deep learning, using easily accessible online data, including formal datasets from Google Trends, Wikipedia, and online communities, as well as informal datasets collected from online forums. We use the proposed system to analyze the user population of EVE Online, one of the largest virtual worlds.


Introduction
Evolution in network technology and computing power has enabled people to interact with one another over the Internet. In the same vein, a growing number of users interact with one another and engage in economic, educational, and artistic activities in large virtual worlds (e.g., Second Life). Additionally, massively multiplayer online games (MMOGs)-e.g., World of Warcraft and EVE Online-have attracted an increasing number of active users who build communities and participate in a range of interactions [1][2][3][4][5][6].
Previous studies focused on analyzing virtual world users based on social theories; research into user dynamics in virtual worlds is difficult to find. By contrast, extensive research on predicting real-world population fluctuations has been conducted [27][28][29][30][31][32][33]. Still, most researchers have adopted long-term perspectives regarding predictions of real-world population dynamics. These methods could be applied to virtual environments and combined with a range of internal components to find meaning in long-term population trends. In this study, we focus on the characteristics of virtual worlds such as MMOGs, whose rapid changes correlate with increasing or decreasing user populations [34][35][36].
The present paper focuses on EVE Online, one of the largest MMOG virtual worlds, which has attracted approximately 0.5 million subscribers since it was released in May 2003 [37]. We predict the daily fluctuations in the number of users of EVE Online through deep learning [38,39], based on Google Trends [40], Wikipedia usage [41,42], online forum usage [43] and sentiment data from online forum postings. The proposed method achieved a certain level of success in predicting daily user population fluctuations by drawing on easily available data closely associated with virtual worlds.

System overview
For the proposed system, we collected data associated with MMOG EVE Online and tagged each posting and reply in its online forum with a positive or negative sentiment value. Based on this data and deep learning [38], we created a model for predicting fluctuations in the number of users (Fig 1).

Data collection
To generate prediction models, we selected data considered to be associated with an increase or decrease in the number of EVE Online users. The data selected was easily accessible online and was gathered from three sources: Google Trends, Wikipedia, and EVE Online forums [40][41][42][43].
Google Trends measures search interest in keywords searched on Google over a given period on a scale from 1 to 100. Google Trends data are widely used to analyze relevant phenomena in a variety of fields [44][45][46][47][48][49]. We gathered Google Trends data about the keyword "EVE Online" for this paper.
Second, Wikipedia usage data [41,42] was gathered, showing the number of page views for certain keywords on a given day. These data are also widely used to analyze phenomena on the Internet [46,50,51]. Again, we gathered the Wikipedia page data relevant to "EVE Online".
We also gathered data from EVE Online forums [43], which are used by community members to upload postings and exchange opinions regarding topics of common interest [25,[52][53][54]. Therefore, online forums provide good sources for monitoring the daily responses of many users to certain MMOGs. Communities or forums are widely used in MMOGs for information exchange [52]. According to one study, EVE Online linked its forums with economic activity among users [25], which we found to be relevant when predicting fluctuations in the current number of virtual world users. Comments and relevant replies posted by users on general discussion boards in EVE Online forums were crawled, along with post time, comment and reply post times, the number of replies to each comment, and the number of views. Replies quoting previous comments and replies were crawled to exclude overlapping sentences.
Each HTML page was crawled using Python regex to parse HTML tags and extract the number of topics, the number of replies, the dates on which the topics and replies were posted, and the URL of each topic from the general discussion boards. Based on the URLs of extracted topics, content and replies to them were also extracted. These data were saved in.json format, which was in turn converted to other formats (e.g. csv and xlsx) for different purposes. The. json files from the EVE Online forums crawled can be viewed in the supporting information. One researcher carried out this data collection on a single PC for approximately 72 hours.
We collected data over a period extending from September 12, 2011 to April 29, 2016 (see Table 1) in a manner that complies with the terms and conditions stipulated by each service. Moreover, the collected data do not include any personal information.

Tagging user comment data and correlation analysis
We tagged the collected user comment data with positive or negative sentiment values. Previous studies have mostly focused on classifying user comments in certain fields. Emoticons, neologisms, and ungrammatical expressions are frequently seen on Internet forums. C.J. Hutto and Eric Gilbert [55] proposed an algorithm (VADER) to complement such informal expressions and suggested a method of analyzing social media text using a rule-based model. The online forums of interest here are comparable to social media text. Thus, we utilized the VADER algorithm to tag user comment data from the forums.
VADER normalizes negative and positive sentiment on a scale of -1 to 1. A comment was tagged as very negative, negative, positive or very positive, for -1 x < -0.6, -0.6 x < -0.2, 0.2 x < 0.6, and 0.6 x 1.0, respectively (where x is a number). Each posting and reply was tagged (see the opinion analysis example in Table 2).
We verified the correlation between tagged sentiment values and the increase or decrease in user population. Here, the Pearson correlation coefficient [56] was used to determine the correlation between data sources and the increase or decrease in user population.
As shown in Eq (1), the results of opinion analysis based on the topics and replies (VADER-based tagged values) and population fluctuations were transformed each into zscores for standardization against those of the previous 20 days. On a given date t (t = 20 in the paper), the z-score of expected value E, denoted by Z E , was defined as: where " xðEÞ and sðEÞ represent the mean and standard deviation of each item for every date with a time granularity of 1 day. Fig 2 shows an example of test results comparing the fluctuations in population and opinion analysis z-scores. Table 3 shows the Pearson correlation coefficients between the z-scores for the opinion analysis results and the z-scores for population fluctuation.
Overall, a positive linear relationship was found. Notably, there was a clear correlation between user replies and very positive topics.

Prediction Modeling
We used the data collected and the tagged opinion data for the model with the intent to predict fluctuations in user population based on deep learning. Deep learning is widely used for solving a range of problems [38,39,[57][58][59][60][61][62]. Data sources have increased quantitatively and qualitatively in proportion to the history of virtual worlds. Still, little research on applying deep learning to virtual worlds for problem solving has been conducted. We developed a setting for applying deep learning based on collected data from a 4.5-year period.
First, we refined the data for the learning models. Specifically, the data gathered (as explained above) was standardized against its applicability to learning. We used long-term data and feature vectors by standardizing the data against the previous 20 days to lessen the impact of the significant changes in the range of data values over longer time periods. An example of applicable input data is shown in Table 4.
We built deep learning models to perform predictions based on the input data. We accumulated multiple hidden layers to learn the deep structure of the data. Here, we configured 1, 2, 3, 5, and 7 hidden layers; of these, we selected the one that returned the best prediction results. We allocated 1,500 neurons to the single hidden layer; when using 2 hidden layers, we allocated 1,024 neurons to each; when using 3 hidden layers, we allocated 1,024, 1,024, and 512 neurons; when using the 5 hidden layers, we allocated 2,048, 1,024, 1,024, 512 and 512 neurons to them; and finally, when using 7 hidden layers, we allocated 2,048, 1,024, 1,024, 1,024, 512, 512 and 512 neurons to the layers.
For the input layer, based on the input data in Table 4, we represented 15 input data as continuing vectors and allocated a different number of neurons to a different number of cumulative days used for learning (i.e., 45, 75, 105, 135, and 180 neurons were allocated to 3, 5, 7, 9, Table 2

Example topic sentences
Very Positive "Pretty happy with Crucible" / "Happy New Year from CCP Games!" / "The days of "Erotica 1" are gone and I AM GLAD" / "Great job CCP!" Positive "Nice launcher" / "Better Missile graphics are good" / "CCP Thank you for new Capital Rats" Neutral "Guide to Fleet Commanding" / "Drone ships completely useless especially for new players." / "A New Player Guide" / "Worth coming back to eve?" Negative "Why does Something Awful forums looks so old and terrible?" / "Back to play this terrible game terribly" / "Server crashed?" / "Awful download speed since launcher is released." Very Negative "Those WTF moments" / "AFK Cloaking in System is a Terrible Mechanic" / "Verification Failure-Still Happening" / "Eve Installer not working" doi:10.1371/journal.pone.0167153.t002   and 12 cumulative days, respectively). For the output layer, 2 neurons were allocated to represent the probabilities of population increase or decrease with the softmax function. We implemented the model using the Google Tensorflow library [38] and accelerated the deep learning model with GPU operation (nVIDIA CUDA). The gap in the prediction values between the optimal model and other model configurations is discussed in the following section.

Experimental Results
We performed prediction modeling by means of deep learning based on the collected and refined data, and predicted fluctuations in the EVE Online user population. We used 90% of the data from the period between September 12, 2011 and April 29, 2016 for learning and 10% for validation. The accuracy rate, F-measure, and Matthews correlation coefficient (MCC) were used to evaluate the performance of the proposed models. Table 5 and Fig 3 show the prediction results. The highest accuracy (87.57%) resulted from the 2-layer neural network model that used data from the previous 7 days for learning. Table 5 outlines the prediction results by layer configuration and learning data. If the number of hidden layers and days used were less than 2 and 7, respectively, learning was insufficient and prediction accuracy decreased slightly. Conversely, overfitting could occur (with the prediction accuracy failing to significantly improve) if these numbers exceeded 2 and 7, respectively.

Discussion and Conclusion
This paper proposed a new method for predicting fluctuations in the number of users in a massive virtual world, EVE Online, with a deep-learning prediction model based on data from a variety of sources. The proposed method successfully predicted fluctuations in the number of EVE Online users based on the easily accessible data relevant to the virtual world. User comments in online forums were found to affect user actions in the virtual world. The proposed method could be applicable to diverse fields, e.g., verifying newly added content in the creation and management of virtual worlds and solving network problems by forecasting the number of users. The proposed method could also be applied to previous findings, e.g., space and NPC management in virtual worlds [63,64] and to the optimization of virtual currency systems [25]. In addition, it could be used to apply social science theories to virtual worlds so as to understand the large, diverse user base.
Due to the paucity of previous findings in this field, our proposal has some limitations that need to be rectified in future work. First, the data need to be enriched further for better results. Diverse types of data would increase prediction accuracy. This research was limited to using data gathered for the prediction model only by means of deep learning; more diversified data would be applicable to feature selection as well. The VADER algorithm, optimized for social media analysis, performed well in this study. However, additional analysis of sarcastic or ironic language would improve our results. Moreover, Word2Vec [65] could also potentially improve these results. Sentiment analysis using the Word2Vec-based Doc2Vec [66] focuses on user reviews, making it difficult to apply directly to the analysis of online community users. Still, it might be applicable to and improve the analysis of user comments in online communities. As all data used for learning were formal data, data refinement methods should also be improved. Furthermore, other sizable virtual worlds, e.g., World of Warcraft and Second Life, could be used for prediction. Virtual worlds have been growing in size and diversity. Considerable research on them has already been conducted. Still, the size of the virtual world user population has not been elucidated. With ongoing improvement for wider application, the proposed method of applying data from different sources to virtual world user population dynamics will contribute to enhancing the understanding of virtual worlds and their user bases.