Recurrent convolutional neural kernel model for stock price movement prediction

Stock price movement prediction plays an important role in investors' decision making. It is usually regarded as a binary classification task. In this paper, a recurrent convolutional neural kernel (RCNK) model was proposed, which learned complementary features from different sources of data, namely historical price data and text data from the message board, to predict stock price movement, integrating the advantages of technical analysis and sentiment analysis. Unlike previous studies, the text data was treated as sequential data, and the RCNK model was used to train sentiment embeddings with temporal features. In addition, in the classification section of the model, an explicit kernel mapping layer was used to replace several fully-connected layers. This operation reduced the number of parameters of the model and the risk of overfitting. In order to test the impact of treating the sentiment data as sequential data, the effectiveness of the explicit kernel mapping layer and the usefulness of integrating technical analysis and sentiment analysis, the proposed model was compared with two other deep learning models (a recurrent convolutional neural network model and a convolutional neural kernel model) and with models taking only one source of data as input. The results showed that the proposed model outperformed the other models.


Introduction
Stock price movement prediction has long been a hot issue in both academic and industrial fields. Accurate prediction can not only help shareholders make appropriate decisions and obtain excess returns, but can also be an important indicator for stock portfolio selection [1,2]. The efficient market hypothesis (EMH) [3] presumes that all participants in the market are informationally efficient and all deals are traded at fair values. In other words, the stock market responds quickly and accurately to new market information and the stock price fully reflects all information; thus the stock market is unpredictable. Nevertheless, this presumption of the EMH is not practical in the real world. The adaptive market hypothesis (AMH) [4] was proposed by behavioural economists to reconcile the EMH. The AMH considers the interaction of the market and its participants and holds that excess return comes from information asymmetry, which means the stock market is predictable. Besides, with the advent of information overload, a new theory, heterogeneous agent models (HAM) [5], has emerged. It holds that the ability to […] prediction. They extracted consecutive nouns in sentences as topics and utilized SentiWordNet [22] to extract opinion words and accumulate the sentiment values of opinion words for each topic. However, the text representations mentioned above cannot capture semantic similarity when one meaning is expressed by different phrases, which may cause dimension explosion. In addition, the quality of the text representation relies heavily on hand-crafted tricks to select features. With the development of neural network-based language models [23,24], a new text representation, word embedding, has emerged to overcome these disadvantages. It transforms a word into a dense, fixed-dimensional vector which is a distributed and compact representation of the word [25].
Each dimension of a word embedding can be regarded as an abstract semantic feature, and the similarity of two words can be assessed by the cosine of the angle between their embeddings. Usually, the word embedding representation is accompanied by a deep learning classification model. Ding et al. [26] used convolutional neural networks (CNN) to train event embeddings on the basis of word embedding representations for stock price movement prediction. Peng and Jiang [27] applied word embedding methods and deep neural networks to leverage financial news for predicting stock price movements. The results showed that this could significantly improve prediction accuracy on a standard financial database over a baseline system using only historical price data. Xu et al. [28] proposed a recurrent convolutional neural network (RCNN) to extract key features and context-dependent relations from financial news to predict stock movement, achieving improvements in individual stock prediction. Vargas et al. [29] also proposed a similar recurrent convolutional neural network (RCNN) for predicting the direction of stock price movement, but extracted features from both financial news and financial time series data. Yang et al. [30] proposed a dual-layer attention-based neural network applying sentence embeddings and day embeddings for stock prediction. The model paid more attention to the more influential news and produced much more explainable results than traditional neural networks.
As for the classification section of the model, according to the literature [31], the support vector machine (SVM) was the most popular classification model in previous research on stock forecasting. As the most commonly used kernel method, it has a firm mathematical foundation. Through a non-linear kernel function, input vectors are implicitly projected into a high-dimensional space, where a hyperplane can be constructed to separate the inputs linearly. However, the accuracy of prediction depends largely on the quality of the input features as well as the selection of the kernel function and pre-defined parameters. In contrast, neural network models can automatically learn advanced features from the input data for prediction. In particular, recurrent neural network (RNN) models can extract contextual or temporal features from sequential data [24], and convolutional neural networks (CNN) can also learn effective features for text classification [32,33]. The results of deep learning models were no worse than those of traditional machine learning methods. But neural networks also have disadvantages that cannot be ignored. Gradients may vanish or explode as the number of layers increases, which prevents the model from updating. In addition, as the number of parameters grows, overfitting can easily occur, leading to poor performance on the test set. Therefore, it is preferable to construct a hybrid model which combines the advantages of both kernel methods and neural networks while overcoming their shortcomings. Some studies have attempted to combine these two methodologies. Le and Xie [34] proposed a deep embedding kernel (DEK) and tested its performance on several classification tasks. The DEK utilized deep learning to train a kernel, which in turn implicitly mapped data to a high-dimensional feature space.
In other words, the DEK applied the core idea of the kernel method to deep learning models and regarded the output of the DEK model as the output of a kernel function. It showed good performance in predicting the daily direction of stock prices from financial series data. Mehrkanoon and Suykens [35] and Mehrkanoon [36] introduced another hybrid deep neural kernel framework, which replaced the activation function of neural networks with random Fourier feature maps. The results showed an improvement in classification accuracy over standard neural networks on several real-life digit and image datasets. However, none of these hybrid models has been applied to text classification or considered features from text data. Compared with numerical financial time series data and images, text data is more difficult for computers to process and understand. In this paper, we build a new hybrid model that can learn features from financial time series data as well as text data.
Our motivation is to build a model that combines technical analysis and sentiment analysis. In our proposed model, the temporal features of sentiment in text data are considered and the classification section of the model is optimized to improve the accuracy of stock price movement prediction. We used historical stock price data and posts from the message board as two sources of data. Historical price data give an objective description of a stock's performance, while the posts in the message board represent the subjective public mood towards the stock; thus, exploring these two sources of data can generate complementary features and improve prediction. Furthermore, we assumed that the influence of the public mood on the stock price lasts for a period and gradually recedes over time. Thus, we aimed to build a model that can learn the temporal features of sentiment data. In this paper, we propose a model named the Recurrent Convolutional Neural Kernel (RCNK) model to train sentiment embeddings with temporal and contextual features extracted from user comments. It is a hybrid of neural networks and kernel methods. In the classification section of the model, we utilize an explicit kernel mapping layer to replace the fully-connected layers of traditional neural networks. It maps the feature vectors into a new space where they can be linearly separated. This technique decreases the number of neural network layers and improves the prediction accuracy of the model.

Problem formalization
Stock price movement prediction is regarded as a binary classification task. As described in Eq 1, a label is assigned for each transaction date of stock $i$. If the closing price $p^c_t$ on transaction date $t$ is higher than that on transaction date $t-1$, the sample is labelled 1, otherwise 0:

$$y_i(t) = \begin{cases} 1, & p^c_t > p^c_{t-1} \\ 0, & \text{otherwise} \end{cases} \qquad (1)$$

The aim is to predict the correct label of a stock by analysing two types of input data. The first part is the financial time series data, denoted as a sequence of vectors

$$Input\_1_i(t) = \left(T^i_{t-m+1}, \ldots, T^i_t\right) \qquad (2)$$

where the vector $T^i_t$ includes the stock price and technical indicators on transaction date $t$ of stock $i$. The second part is a sequence of user-generated posts within $m$ days before transaction date $t$, denoted as

$$Input\_2_i(t) = \left(S^i_{t-m+1}, \ldots, S^i_t\right) \qquad (3)$$

where $S^i_t$ denotes a collection of selected top posts on date $t$ of stock $i$. How the text data are transformed into numerical vectors is described in detail in the data preparation section. The output $\hat{y}_i(t)$ of our model is the prediction result. It represents the probability that the stock price will rise on date $t$, computed as

$$\hat{y}_i(t) = F\left(Input\_1_i(t),\, Input\_2_i(t)\right) \qquad (4)$$

where $F(\cdot)$ is described in detail in the model design section. The learning objective of our model is to minimize the cross-entropy loss, computed as

$$L = -\frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T}\Big[y_i(t)\log \hat{y}_i(t) + \big(1-y_i(t)\big)\log\big(1-\hat{y}_i(t)\big)\Big] \qquad (5)$$

where $N$ denotes the number of stocks and $T$ the number of transaction dates. In this study, stochastic gradient descent and back-propagation were utilized to find the optimal parameters.
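The labelling scheme of Eq 1 and the cross-entropy objective can be sketched in a few lines of NumPy (prices and predicted probabilities below are hypothetical illustrations, not data from the paper):

```python
import numpy as np

def make_labels(close):
    """Label a date 1 if its close price is higher than the previous date's, else 0 (Eq 1)."""
    close = np.asarray(close, dtype=float)
    return (close[1:] > close[:-1]).astype(int)

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between true labels and predicted rise probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

close = [10.0, 10.5, 10.2, 10.8]            # hypothetical close prices
labels = make_labels(close)                 # rises on days 1 and 3, fall on day 2
loss = cross_entropy(labels, np.array([0.9, 0.2, 0.7]))
```

A lower loss means the predicted rise probabilities agree more closely with the realized labels.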

Data preparation
The datasets were all crawled from http://guba.eastmoney.com, a well-known stock website in China. It not only displays real-time trading information, but also provides a platform for users to comment on stocks. The collection method complied with the terms and conditions of the website http://guba.eastmoney.com. The data did not contain any identifying information and were accessed from publicly available posts. The process of data preparation is illustrated in Fig 1. The message board is a platform for stock investors to share their experience, discuss stock market quotations and vent their emotions. The themes of the posts are therefore varied, and it was necessary to select posts that are meaningful for prediction. In theory, the more meaningful a post is, the higher its reading volume. Besides, posts with objective analysis of events and stock performance have a greater impact on the accuracy of stock price movement prediction than those containing only subjective wishes or extreme emotions. Thus, after word segmentation, a trigger dictionary containing technical terms of stock markets was used to filter posts with effective features, and a stop-word vocabulary was used to block posts venting extreme negative emotions. After filtering, it was found that the mode of the number of comments per stock per day was 20. Thus, the posts were ranked according to reading volume and the top 20 posts were selected for each stock on each transaction date (padding with zero vectors if there were fewer than 20 posts).
Word embeddings were pre-trained with the Word2Vec tools. After word segmentation and elimination of stop words, most of the posts have no more than 10 words. Considering the maximum retention of semantic features and the computational complexity, each post was represented as the average of the word embeddings of its first 10 words. Finally, $Input\_2_i(t)$ was transformed into a sequence of word embedding matrices. Each matrix $S^i_t$ represents the selected posts for stock $i$ on transaction date $t$; each row of $S^i_t$ is the average of the word embeddings of one post. As for $Input\_1_i(t)$, the time series data mainly include stock historical data and technical indicators as input features. The most commonly used historical data are the open price, close price, highest price, lowest price, trading volume, etc. The technical indicators describe additional information derived from the stock price in different ways. They include the moving average (MA), rate of change (ROC), relative strength index (RSI), Williams indicator and so on. The input features are described in Table 1.
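As an illustration of how such indicators are derived from the close-price series, here is a NumPy sketch of three common ones; the definitions follow standard textbook formulas, and the paper's exact parameterization may differ:

```python
import numpy as np

def moving_average(close, n):
    """Simple n-day moving average (MA); the first n-1 values are undefined."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    for t in range(n - 1, len(close)):
        out[t] = close[t - n + 1 : t + 1].mean()
    return out

def rate_of_change(close, n):
    """n-day rate of change (ROC) of the close price, in percent."""
    close = np.asarray(close, dtype=float)
    out = np.full(len(close), np.nan)
    out[n:] = (close[n:] - close[:-n]) / close[:-n] * 100
    return out

def rsi(close, n):
    """Relative strength index (RSI) over n days: compares average gains to average losses."""
    close = np.asarray(close, dtype=float)
    diff = np.diff(close)
    out = np.full(len(close), np.nan)
    for t in range(n, len(close)):
        window = diff[t - n : t]
        gain = window[window > 0].sum() / n
        loss = -window[window < 0].sum() / n
        out[t] = 100.0 if loss == 0 else 100 - 100 / (1 + gain / loss)
    return out

close = [10.0, 10.4, 10.1, 10.6, 10.9]      # hypothetical close prices
ma2 = moving_average(close, 2)
```

Each indicator produces one column of the feature vector $T^i_t$ for the corresponding transaction date.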

Model design
In this section, we will introduce our proposed model named Recurrent Convolutional Neural Kernel model (RCNK) to predict stock price movement. It is a hybrid model that combines the advantages of the kernel method and deep learning.
The architecture of the model is shown in Fig 2. From the perspective of the task, it can be divided into three parts: data representation, sentiment embedding training and classification. From the perspective of the model structure, it can be divided into five layers: an input layer, a convolutional neural network (CNN) layer, a long short-term memory (LSTM) neural network layer, an explicit kernel mapping layer and the output layer. These layers are described in detail below.

CNN layer
The convolutional neural network has long been used in computer vision and achieved remarkable results. In recent years, it has also proven useful in natural language processing, especially in text classification [24,25]. Hence, in this study, a CNN layer was used to extract daily sentiment features from the post matrix $S^i_t$. As shown in Fig 3, the filter strides along the matrix to produce a feature map whose elements are denoted by $a_{i,j}$. Specifically, $x_{i,j}$ denotes the element in the $i$-th row and $j$-th column of $S^i_t$, and $w_{m,n}$ denotes the weight in the $m$-th row and $n$-th column of the filter. The feature $a_{i,j}$ is computed according to Eq 6:

$$a_{i,j} = f\left(\sum_{m=1}^{M}\sum_{n=1}^{N} w_{m,n}\, x_{i+m-1,\, j+n-1} + w_b\right) \qquad (6)$$

where $M$ and $N$ represent the length and width of the filter respectively, $w_b \in \mathbb{R}$ is a bias term and $f$ is a non-linear activation function such as the hyperbolic tangent, rectified linear unit or sigmoid. Because each row of $S^i_t$ describes the abstract semantic features of one post, it is reasonable to set $N$ equal to the width of the input matrix. As for $M$ (also called the window size), it usually takes multiple values so that combinatorial features can be extracted. It is noteworthy that the sentiment embedding is trained not for one post but for all the selected posts of one day. In other words, the feature map $a_{i,j}$ represents the comprehensive emotion of investors on day $t$. Therefore, multiple values of $M$ can help to extract more combinatorial features. Next, a max-pooling operation was applied to the feature map. It takes the maximum value of the feature map to form a new vector. This operation captures the most important (highest-valued) feature of each input matrix. It can also reduce the risk of overfitting by cutting down the number of parameters [11], and it naturally handles feature maps of varying length.
To sum up, one filter extracts one feature from one input matrix after the convolution and max-pooling operations. The CNN layer uses multiple filters with varying window sizes to obtain multiple features, which are concatenated at the end of this layer. It is worth mentioning that the filter weights are shared among the input matrices $S^i_t$ in $Input\_2_i(t)$.
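The convolution-and-max-pooling step above can be made concrete with a minimal NumPy sketch (hypothetical dimensions: 20 posts per day, 8-dimensional post embeddings, window sizes 2 and 3; a real implementation would use a deep learning framework with trainable filters):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_max_pool(S, filters):
    """Slide each M x N filter down the post matrix S (N equals the embedding
    width), apply tanh, then max-pool each feature map to a single feature."""
    rows, width = S.shape
    feats = []
    for W, b in filters:
        M, N = W.shape
        assert N == width                  # filter spans the full embedding dimension
        fmap = [np.tanh((W * S[i : i + M]).sum() + b) for i in range(rows - M + 1)]
        feats.append(max(fmap))            # max-pooling keeps only the strongest response
    return np.array(feats)

S = rng.normal(size=(20, 8))               # 20 posts/day, 8-dim post embeddings
filters = [(rng.normal(size=(M, 8)), 0.0) for M in (2, 3)]  # window sizes M = 2 and 3
features = conv_max_pool(S, filters)       # one feature per filter, concatenated
```

Note how, because each filter yields a single pooled value, the output length depends only on the number of filters, not on the number of posts.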

LSTM layer
As mentioned above, $Input\_2_i(t)$ is a sequence of text data which describes investors' sentiment about the stock within several days before transaction date $t$. This sentiment information does not exist independently: it has an impact on subsequent sentiment and weakens over time. Therefore, the purpose of this layer is to learn the continuous impact of previous sentiment on the stock price on transaction date $t$. This impact can be called a temporal feature.
As shown in Fig 4, the LSTM unit uses a memory cell $C_t$ and three gates to control the update of the hidden state. The forget gate $f_t$ decides which values of the previous memory cell $C_{t-1}$ should be retained in the current memory cell $C_t$. The input gate $I_t$ decides which values of the input should be passed to the current memory cell $C_t$. Finally, the output gate $O_t$ controls which values of the cell state $C_t$ will have a further impact on the next time step. Thus, the long short-term memory neural network is effective in capturing temporal features in sequential data. The LSTM transition equations are as follows:

$$f_t = \mathrm{sigmoid}(W_f x_t + U_f h_{t-1} + b_f)$$
$$I_t = \mathrm{sigmoid}(W_I x_t + U_I h_{t-1} + b_I)$$
$$O_t = \mathrm{sigmoid}(W_O x_t + U_O h_{t-1} + b_O)$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C)$$
$$C_t = f_t \odot C_{t-1} + I_t \odot \tilde{C}_t$$
$$h_t = O_t \odot \tanh(C_t)$$

where $x_t$ is the input at the current time step, sigmoid and tanh are non-linear activation functions, $\odot$ denotes element-wise multiplication, the $W$ and $U$ matrices are weight parameters, the $b$ vectors are bias terms, and $h_t$ is the hidden state.
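The LSTM transition above can be mirrored directly in a NumPy sketch (weights are randomly initialized for illustration only; in practice the LSTM layer of a deep learning framework would be used and trained):

```python
import numpy as np

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM transition: the forget, input and output gates control
    how the memory cell c and hidden state h are updated."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    Wf, Uf, bf, Wi, Ui, bi, Wo, Uo, bo, Wc, Uc, bc = params
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev + bf)      # forget gate
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev + bi)      # input gate
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev + bo)      # output gate
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev + bc)  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde              # element-wise cell update
    h_t = o_t * np.tanh(c_t)                        # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                                    # hypothetical sizes
params = []
for _ in range(4):                                  # one (W, U, b) triple per gate/candidate
    params += [rng.normal(scale=0.1, size=(d_h, d_in)),
               rng.normal(scale=0.1, size=(d_h, d_h)),
               np.zeros(d_h)]
h, c = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(5, d_in)):              # run a 5-step sequence
    h, c = lstm_step(x_t, h, c, params)
```

The final hidden state $h$ summarizes the whole sequence; this is the representation the model passes on to the kernel mapping layer.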

Explicit kernel mapping layer
In general, several fully-connected layers are stacked over the feature extraction layers to obtain the probability distribution over classes. This operation brings a large number of parameters and hence increases the computational complexity and the risk of overfitting. It is necessary to find a way to reduce the number of parameters while maintaining the accuracy of the prediction model. Inspired by [37], an explicit kernel mapping layer was constructed to replace the fully-connected layers. This is an approximate computation of the kernel method. Due to the random projection process of constructing the explicit mapping function Z(x), this layer does not bring any trainable parameters. Besides, it bridges the deep learning model and the kernel method, which enables us to stack a mapping layer over the neural network to achieve the best of both methodologies. The common kernel method, the support vector machine (SVM), applies a kernel function that implicitly maps the input features into a high-dimensional Hilbert space where the mapped data can be separated linearly. The appeal of this method is that it is unnecessary to compute the coordinates of the mapped data; it suffices to compute the inner products of all pairs of data in the original space. Unfortunately, as the number of training examples increases, the training speed of this method decreases significantly. Rahimi and Recht [37] aimed to construct an explicit mapping function Z(x) that maps points in the original space to a relatively low-dimensional randomized feature space (low-dimensional compared with the Hilbert space). This makes the inner product between a pair of transformed points approximate their kernel evaluation. They proved that their method competes favourably in speed and accuracy with kernel-based algorithms. The detailed process of constructing Z(x) is described below.
This approach is inspired by Bochner's theorem: if the kernel $k(\delta)$ is a positive definite shift-invariant kernel and is properly scaled, its Fourier transform $p(\omega)$ is a proper probability distribution. If we define $z_\omega(x) = e^{j\omega^\top x}$, we get

$$k(x - y) = \int p(\omega)\, e^{j\omega^\top (x - y)}\, d\omega = E_\omega\big[z_\omega(x)\, z_\omega(y)^*\big] \qquad (13)$$

After transforming two points $x$ and $y$ in this way, their inner product $z_\omega(x) z_\omega(y)$ is an unbiased estimator of $k(x,y)$. It can be shown that $z_\omega(x) = \sqrt{2}\cos(\omega^\top x + b)$ satisfies Eq (13), where $\omega$ is a random projection direction drawn from $p(\omega)$ and $b$ is a bias term drawn uniformly from $[0, 2\pi]$. Thus, $Z(x)$ can be defined as

$$Z(x) = \frac{1}{\sqrt{D}}\big[z_{\omega_1}(x), \ldots, z_{\omega_D}(x)\big]^\top \qquad (14)$$

where $D$ is the number of random projection directions. Combining Eq (13) and Eq (14), we obtain Eq (15), the approximate calculation of the kernel evaluation $k(x,y)$:

$$k(x, y) \approx Z(x)^\top Z(y) \qquad (15)$$

In this paper, the random Fourier features were used to construct an explicit feature map $Z(x)$ in order to transform the extracted features into a linearly separable space. After the kernel mapping layer, a new mapped feature vector is obtained that can be separated by a linear model; thus the output layer is just a fully-connected layer with softmax as the activation function, whose output is the probability distribution over the classes.
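A minimal NumPy sketch of this construction, assuming an RBF kernel $k(x,y) = \exp(-\gamma\|x-y\|^2)$ whose sampling distribution $p(\omega)$ is Gaussian with standard deviation $\sqrt{2\gamma}$:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_map(X, D, gamma=1.0):
    """Random Fourier feature map Z(x) approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2); for this kernel p(omega) is
    Gaussian with standard deviation sqrt(2 * gamma)."""
    d = X.shape[1]
    omega = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))  # random directions
    b = rng.uniform(0, 2 * np.pi, size=D)                      # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)            # Eq 14

X = rng.normal(size=(2, 5))
Z = rff_map(X, D=5000)
approx = Z[0] @ Z[1]                             # inner product in feature space (Eq 15)
exact = np.exp(-np.sum((X[0] - X[1]) ** 2))      # true kernel value k(x, y)
```

The approximation error shrinks as $D$ grows; since $\omega$ and $b$ are sampled once and then frozen, the layer indeed adds no trainable parameters.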

Data distribution
Fourteen stocks were randomly selected from the Chinese A-share market, covering the period from 2016-11-01 to 2020-03-31. The training data cover the period from 2016-11-01 to 2019-10-31 and the test data cover the period from 2019-11-01 to 2020-03-31. The label distribution of the training and test data is shown in Table 2. The label distribution is balanced in both sets, which indicates that the trained model will not be biased towards either label.

Comparative models
Firstly, in order to assess the usefulness of integrating technical analysis and sentiment analysis, the results of the proposed RCNK model were compared with models adopting only time series data or only sentiment data as input. Secondly, a comparative experiment with the convolutional neural kernel (CNK) model [36] was carried out to evaluate the impact of treating the sentiment data as time series data. The CNK model, which contains no LSTM layer, does not consider the temporal features of the sentiment data, but it has the same explicit kernel mapping layer as the RCNK model. Finally, the performance of the RCNK model and the recurrent convolutional neural network (RCNN) model [28,29] were compared to test the effectiveness of the explicit kernel mapping layer. Compared with the RCNK model, the RCNN model has no explicit kernel mapping layer; it uses several fully-connected layers in the classification section, like traditional neural networks. For brevity, the abbreviations in Table 3 identify the models mentioned. The same training data were used to train the different models.

Model performance metrics
To evaluate the performance of the different models, three metrics were used: accuracy, the Matthews correlation coefficient (MCC) and the accumulated return. The accumulated return is the difference between the final account value and the initial account value after a period of trading simulation. The other two metrics measure the predictive power of the model. Accuracy describes the ability of the model to correctly predict all classes of labelled data. MCC $\in [-1, 1]$ is usually used for measuring the quality of binary classification, even if the label distribution in the dataset is unbalanced [38]. It is essentially a coefficient describing the correlation between the real labels and the predicted labels; a value of 1 means perfect prediction. These two metrics are computed as:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (16)$$

$$MCC = \frac{TP \times TN - FN \times FP}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (17)$$

where TP is the number of correctly predicted positive samples; TN is the number of correctly predicted negative samples; FN is the number of positive samples predicted to be negative; and FP is the number of negative samples predicted to be positive.
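Both metrics can be computed directly from the entries of the confusion matrix, as in this short Python sketch (the counts are hypothetical):

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples predicted correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient (Eq 17); taken as 0 when a marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fn * fp) / denom

# hypothetical confusion-matrix counts
acc = accuracy(40, 35, 15, 10)   # 0.75
corr = mcc(40, 35, 15, 10)
perfect = mcc(50, 50, 0, 0)      # 1.0
```

Unlike accuracy, MCC penalizes a model that does well on one class by sacrificing the other, which is why it remains informative under unbalanced labels.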

Hyper parameter selection
Before the experiment, two main issues must be settled. The first is the selection of the time frame parameter, also called the input window length, for the text data. As assumed before, the influence of sentiment on the stock price lasts for a period and gradually recedes over time; this hyper-parameter determines how long the emotional impact lasts. The second is the selection of the time slot for the text data. The stock market is open from 9:30 to 11:30 in the morning and from 13:00 to 15:00 in the afternoon. In previous studies, most researchers chose the posts of the m whole days before transaction date t to predict the stock price movement of transaction date t; this is called strategy A. But Kim and Kim [39] found that investor sentiment was positively affected by prior stock price performance and doubted its predictive power. This finding reminds us to pay attention to the immediacy of posts in the message board. It was observed that many topics predicting the current direction of the stock price trend were posted during current trading time, whereas more topics predicting the trend of the next trading day were posted after the stock market closed. Therefore, another strategy, strategy B, selects the posts in the current trading date t together with the posts after trading time on date t−1. The two strategies for selecting the time slot are shown in Fig 5; the shaded area is the time slot during which the posts were collected.
The real stock trading was simulated by simplifying the trading strategy proposed in [40]. The initial account value was set to RMB 10,000 and transaction costs were ignored. For strategy A, if the model predicts that the stock close price of the next trading date will go up, the trader holds or buys the stock at the open price of the current date; otherwise, the trader sells the stock at the open price of the current date. For strategy B, if the model predicts that the stock close price of the current date will go up, the trader holds or buys the stock at the open price of the current date; otherwise, the trader sells the stock at the open price of the current date.
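The simplified long/flat simulation described above can be sketched as follows (prices and predictions are hypothetical; transaction costs are ignored, as in the paper):

```python
def simulate(open_prices, predictions, initial_cash=10000.0):
    """Simplified long/flat trading simulation: be fully invested on dates
    where a rise is predicted, otherwise hold cash; costs are ignored."""
    cash, shares = initial_cash, 0.0
    for price, up in zip(open_prices, predictions):
        if up and shares == 0.0:        # predicted rise: buy at the open
            shares, cash = cash / price, 0.0
        elif not up and shares > 0.0:   # predicted fall: sell at the open
            cash, shares = shares * price, 0.0
    final_value = cash + shares * open_prices[-1]
    return final_value - initial_cash   # accumulated return

# hypothetical open prices and model predictions (1 = rise expected)
ret = simulate([10.0, 11.0, 10.5, 11.5], [1, 1, 0, 1])
```

Here any final position is valued at the last open price, a simplification made for this sketch.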
It is equally important to predict the positive and the negative samples correctly, because the results directly affect whether we buy or sell stocks. Thus, accuracy was chosen to evaluate the model with different input window lengths. The average prediction accuracies of the 14 stocks with different input window lengths are shown in Fig 6. The prediction accuracy of strategy B is generally higher than that of strategy A, which confirms that posts in message boards do have an immediacy feature. Adding part of the posts of the current trading date to the inputs and adopting an appropriate trading strategy can help improve prediction accuracy and accumulated returns. Though the optimal input window length differs between the two strategies, the prediction accuracy peaks within three days before the transaction date, which further illustrates that the sentiment extracted from the message board should be treated as sequential data and is appropriate for predicting short-term stock price movement.
After determining the optimal input window length, a trading simulation was conducted with the two strategies. The results are shown in Table 4. The average accumulated returns of strategy B are higher than those of strategy A. Both metrics indicate that the model with strategy B performs better; hence it was chosen for the experiments in the next section.

Results and discussion
We tested our proposed model and the comparative models. The performance in terms of MCC was consistent with that of accuracy. The detailed experimental results are shown in Table 5. For the trading simulation, we compared these models with a simple Buy & Hold strategy, which reflects the price movement of the individual stocks and can be treated as the baseline. The results are shown in Table 6. From the results, we can learn the following: 1. The RCNK model performed better than the RCNK-T and RCNK-S models. This indicates that the two data sources (financial time series data and posts in the message board) complement each other, and that the model with both data sources as inputs achieves better results in stock price movement prediction. It is noteworthy that the RCNK-S model performed much better than the RCNK-T model. This is likely because the inputs of the two models contain different amounts of information. The public posts contain much explicit and predictive sentiment data which directly indicate whether users think the stock price will rise or fall; it is much easier to model the correlation between such explicit sentiment data and the trend of stock price movement. The historical price data and technical indicators, by contrast, contain no explicit information for prediction, so a more sophisticated model is needed to extract implicit features. Our proposed RCNK model involves only one CNN layer and one LSTM layer, which means it is a shallow network and may not work well when the input contains too many implicit and abstract features.
2. The RCNK model was superior to the CNK model. The difference between them is that the former treats the sentiment data of several trading days as sequential data while the latter simply accumulates them together. This indicates that the temporal feature of sentiment information does exist and can effectively improve the performance of the model. 3. The RCNK model outperformed the RCNN model, which has no kernel mapping layer; this proves the effectiveness of the explicit kernel mapping technique.
4. Though not every stock made a profit in the trading simulation, our proposed RCNK model achieved the highest average accumulated return. It performed much better than the baseline (Buy & Hold strategy), mainly because even in a long-term falling market it can still make short-term profits through good predictions.
In order to demonstrate the effectiveness of the explicit kernel mapping layer more intuitively, the outputs of the LSTM layer and the explicit kernel mapping layer were fetched separately and projected into two-dimensional space. The explicit kernel mapping layer utilizes random Fourier features to construct an explicit mapping function Z(x) that maps the outputs of the LSTM layer into a higher-dimensional feature space. The t-SNE visualizations of the fetched outputs are shown in Fig 7. Points with different colours refer to samples with different labels. It can be seen that the classification boundary of the outputs from the explicit kernel mapping layer is clearer and more linear than that from the LSTM layer. The experimental results also showed that the explicit kernel mapping technique helps to improve prediction accuracy.

Conclusions
In this paper, a model named the Recurrent Convolutional Neural Kernel (RCNK) model was proposed for stock price movement prediction. In order to improve the prediction accuracy of the model and the accumulated returns of the trading simulation, the RCNK model was optimized in three aspects: data collection, text data processing and the classifier of the model. The main contributions of this study can be summarized as follows: 1. Financial time series data and posts in the stock message board were used as two data sources for extracting complementary features. The proposed model combined the advantages of technical analysis and sentiment analysis: it learned sentiment embeddings with temporal features from the posts as well as financial embeddings from the time series data. The word embedding technique, which considers the semantic as well as structural features of text data, was utilized as the data representation method. The combined model performed better than models using a single type of analysis.
2. It was assumed that the impact of the public mood on the stock price lasts for a period and gradually recedes over time. Thus, the posts in the stock message board were treated as sequential text data and a long short-term memory (LSTM) neural network was used to extract temporal features. This technique does help improve the accuracy of stock price movement prediction.
3. The explicit kernel mapping layer was used to replace several fully-connected layers of traditional deep learning models. Random Fourier features were used to construct an explicit mapping function which projects the input into a high-dimensional space. This approach is an approximate computation of the kernel method. It reduced the number of parameters of the model and the risk of overfitting. In this way, the proposed model bridged deep learning and kernel methods. It showed better prediction accuracy and accumulated returns than the other deep learning models on the same dataset.
When selecting hyper-parameters for the model, it was found that the accumulated return and the accuracy of the model were inconsistent: high accuracy did not imply high profit. Thus, how to evaluate the performance of the model comprehensively is an interesting direction for future work. From the results of the trading simulation, we can see that some individual stocks fall in the long term. To obtain good returns in the real stock market, we need not only good predictions of stock price movement but also portfolio optimization and better trading strategies; both are worthy of further study. Besides, beyond the financial series data and sentiment data, there are still many factors influencing the movement of the stock price. Integrating more sources of data and extracting various features for stock price movement prediction is also worth further study.