Deep reinforcement learning stock market trading, utilizing a CNN with candlestick images

doi:10.1371/journal.pone.0263181

Fig 1.

Candlestick images are generated from the open, close, high, and low prices for each day, for each stock.

The CNN in the RL system receives candlestick images as input and outputs actions of long, short, or no position.

Fig 2.

S&P 500 Index, January 2013 to June 2020.

Training data consists of stock prices from January 2013 through December 2019. Testing data consists of stock prices from January 2020 through June 2020. The coronavirus stock market crash from Feb 20, 2020 to Mar 23, 2020 was a 33% decline, larger than any event in the training data. The closest event in the training data was the 2018 China tariffs dispute stock market crash, a 20.9% decline over 91 days.
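The decline figures quoted above can be read as peak-to-trough drops. The sketch below shows one way to compute such a drop, assuming the decline is measured on a series of index levels; the function name and the sample levels are hypothetical and are not taken from the paper or from real S&P 500 data.

```python
def peak_to_trough_decline(prices):
    """Return the largest peak-to-trough decline as a positive fraction."""
    peak = prices[0]
    worst = 0.0
    for price in prices:
        peak = max(peak, price)                      # highest level seen so far
        worst = max(worst, (peak - price) / peak)    # drop from that peak
    return worst


# Hypothetical index levels, chosen only to illustrate the calculation.
levels = [100.0, 104.0, 98.0, 90.0, 69.68, 80.0]
decline = peak_to_trough_decline(levels)
print(f"{decline:.1%}")
```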

Fig 3.

Feature map visualizations are generated from fully connected layers in the DDQN.

A candlestick image is input, and each neuron in the fully connected layer receives the image. Each neuron is excited by various parts of the image. The excited regions are stored as a 2D array, and shown here as a heatmap.

Fig 4.

Adobe candlestick images representing the day before and the first four days of the coronavirus stock market crash.

Each row represents the day the input image was supplied to the DDQN. The first column represents the input image. The second column displays the regions of the image excited by each neuron. The size of each blue dot represents the level of excitement. The blue is made darker by overlapping dots, indicating multiple neurons were excited by the same region of the image. The bars indicate the total excitement value.

Fig 5.

Daily high, low, open, and close stock prices are converted to a candlestick image.

Gray candles indicate an upward price movement, and black candles indicate a downward price movement.
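A minimal sketch of the conversion, assuming each day is an (open, high, low, close) tuple and the image is a plain grayscale grid. The pixel values, the 3-pixel candle width, the image height, and the choice to draw unchanged days in gray are all illustrative assumptions; the paper's actual rendering settings are not given in this caption.

```python
WHITE, GRAY, BLACK = 255, 128, 0  # assumed grayscale pixel values


def render_candles(ohlc, height=32):
    """Draw (open, high, low, close) tuples into a 2D list of pixel values."""
    lowest = min(day[2] for day in ohlc)
    highest = max(day[1] for day in ohlc)
    span = (highest - lowest) or 1.0  # avoid division by zero on flat data

    def row(price):
        # Row 0 is the top of the image, so higher prices map to smaller rows.
        return round((highest - price) / span * (height - 1))

    width = 4 * len(ohlc)  # 3 pixels per candle plus a 1-pixel gap
    image = [[WHITE] * width for _ in range(height)]
    for i, (open_, high, low, close) in enumerate(ohlc):
        color = GRAY if close >= open_ else BLACK  # gray = up, black = down
        x = 4 * i
        # Wick: one pixel wide, from the high down to the low.
        for r in range(row(high), row(low) + 1):
            image[r][x + 1] = color
        # Body: three pixels wide, between the open and the close.
        top, bottom = sorted((row(open_), row(close)))
        for r in range(top, bottom + 1):
            for dx in range(3):
                image[r][x + dx] = color
    return image


# Two hypothetical days: one upward move, one downward move.
candles = [(10.0, 12.0, 9.0, 11.0), (11.0, 11.5, 8.0, 9.0)]
image = render_candles(candles, height=16)
```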

Fig 6.

DDQN structure and workflow with RL environment.

1. An action is received by the RL environment.
2. The reward is calculated.
3. The next state, a candlestick image, is generated.
4. The reward and next state are returned by the RL environment.
5. The next state is stored in the replay memory.
6. The replay memory stores 1000 previous (state, action, reward, next state) observations. The target network is trained on randomly selected observations from this memory. Every 100 steps, the target network weights are copied to the evaluation network.
7. The next state is also input directly into the evaluation network.
8. The next action is output by the DDQN and sent to the RL environment.
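The replay memory and the 100-step weight copy in this workflow can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the class name, the `should_sync` helper, and the batch size are hypothetical, and the networks themselves are left out.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state) observations."""

    def __init__(self, capacity=1000):
        # A deque with maxlen drops the oldest observation once it is full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Draw observations uniformly at random, without replacement.
        items = list(self.buffer)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self):
        return len(self.buffer)


def should_sync(step, every=100):
    """True on the steps where weights would be copied between networks."""
    return step > 0 and step % every == 0


memory = ReplayMemory(capacity=1000)
sync_count = 0
for step in range(1, 1501):
    # Placeholder integers stand in for candlestick images here.
    memory.store(state=step - 1, action=0, reward=0.0, next_state=step)
    batch = memory.sample(32)  # a training update would use this batch
    if should_sync(step):
        sync_count += 1        # a weight copy would happen here
```

Sampling at random from the memory, rather than training on consecutive days, reduces the correlation between the observations in each training batch.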

Fig 7.

Each DDQN is able to fit a function on the training data.

The training data consists of candlestick images representing stock market prices from Jan 01, 2013 through Dec 31, 2019. Final episode training rewards are shown above each training curve.

Table 1.

DDQN training rewards on final episode, for 30 stocks.

Fig 8.

Feature map visualizations are generated from fully connected layers in the DDQN.

A candlestick image is input, and each neuron in the fully connected layer receives the image. Each neuron is excited by various parts of the image. The excited regions are stored as a 2D array, and shown here as a heatmap. The region in yellow indicates the highest neuron excitement value on the image.

Fig 9.

The highest level of neuron excitement, shown in yellow, in the feature map generated from each neuron is summed, revealing the highest levels of neuron excitement by day for a single candlestick image.

The size of the blue dot corresponds to the neuron excitement value. The darkness of the blue dots is caused by multiple dots overlapping. This indicates multiple neurons have produced neuron excitement on the same region. Dark blue dot clusters indicate high levels of neuron excitement. The level of neuron excitement can be seen by the overlaid blue bar chart.

Fig 10.

Testing geometric returns for 30 stocks.

The average geometric return is 13.2%. All but one (Exxon Mobil, -6%) yield higher geometric daily returns than the S&P 500 Index. S&P 500 Index geometric daily returns are shown by the bold red line and are -4% for the testing data set.
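The caption does not define "geometric returns" precisely. The sketch below shows two common readings, assuming a list of simple daily returns: the compounded return over the whole period, and the geometric mean return per day. Both function names are hypothetical.

```python
import math


def compound_return(daily_returns):
    """Total return from compounding simple daily returns."""
    return math.prod(1.0 + r for r in daily_returns) - 1.0


def geometric_mean_return(daily_returns):
    """Constant daily return that would produce the same total growth."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (1.0 / len(daily_returns)) - 1.0


# A 10% gain followed by a 10% loss compounds to a small loss, not to zero.
total = compound_return([0.10, -0.10])
per_day = geometric_mean_return([0.21, 0.0])
```

Compounding is why a geometric figure can differ from a simple average of daily returns: gains and losses of equal size do not cancel.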

Fig 11.

Cross-sectional 20-day t-tests on daily returns of the DDQN minus daily returns of the S&P 500 Index.

Each black dot is a 20-day t-test p-value. The line of 22 black dots indicates that during the 22-day period following the corona stock market crash, the daily returns of the DDQN are statistically significantly different from the daily returns of the S&P 500 Index.
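One way to picture a rolling 20-day test is a one-sample t statistic on the excess returns in each window, tested against zero. The sketch below assumes that reading and a single return series; the paper's exact cross-sectional construction is not described in this caption, and the function name is hypothetical. It returns only t statistics; a p-value would additionally require the t distribution with `window - 1` degrees of freedom.

```python
import math
from statistics import mean, stdev


def rolling_t_stats(strategy, benchmark, window=20):
    """One-sample t statistic of (strategy - benchmark) per rolling window."""
    excess = [s - b for s, b in zip(strategy, benchmark)]
    stats = []
    for end in range(window, len(excess) + 1):
        sample = excess[end - window:end]
        sd = stdev(sample)  # sample standard deviation (n - 1 denominator)
        if sd > 0:
            stats.append(mean(sample) / (sd / math.sqrt(window)))
        else:
            stats.append(float("nan"))  # undefined when the window is constant
    return stats


# Hypothetical returns: the strategy alternates 1% and 3%, the benchmark is flat.
strategy = [0.01 if i % 2 == 0 else 0.03 for i in range(25)]
benchmark = [0.0] * 25
t_values = rolling_t_stats(strategy, benchmark)
```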

Fig 12.

Adobe sum of neuron excitement on the seven most recent regions and the other 13 regions.

Following the corona stock market crash, neuron excitement increases on the most recent seven regions and decreases on the other 13 regions.
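The split into recent and other regions can be sketched as a simple partition of per-region excitement values. This assumes the 20 values are ordered from the oldest region to the most recent; the function name and the sample values are hypothetical.

```python
def split_excitement(region_values, n_recent=7):
    """Return (sum over the n_recent newest regions, sum over the rest)."""
    if len(region_values) < n_recent:
        raise ValueError("need at least n_recent regions")
    older = sum(region_values[:-n_recent])   # regions before the recent block
    recent = sum(region_values[-n_recent:])  # the most recent regions
    return recent, older


# Hypothetical excitement values for regions 1 through 20, oldest first.
values = list(range(1, 21))
recent_sum, older_sum = split_excitement(values)
```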

Fig 13.

Adobe neuron excitement 7 recent regions—13 older regions.

Following the corona stock market crash, the sum of neuron excitement increases on the 7 recent regions and decreases on the 13 other regions.

Fig 14.

30 stocks average neuron excitement 7 recent regions—13 other regions.

Following the corona stock market crash, neuron excitement increases on the 7 recent regions and decreases on the 13 other regions, on average for the 30 preliminary stocks.

Fig 15.

The increase in regression coefficients following the corona stock market crash.

Values shown are Column B minus Column A from Table 4. The neuron excitement regression coefficients for the recent regions increase, while those for the older regions decrease. Notably, the coefficient with the largest increase is that of region 19. This region consists of the candles representing yesterday and two days ago.

Table 2.

Regression results for neuron excitement.

Region 20 is the most recent region, and region 2 is the oldest region. Dummy variables are used to isolate the effect of neuron excitement for each region. Region 1 is removed to avoid the dummy variable trap. The largest coefficient is region 19 (x18), the second most recent region, with a value of 0.055.
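The dummy variable trap arises because a full set of 20 region indicators always sums to 1, which duplicates the intercept column. A minimal sketch of the encoding with region 1 dropped as the baseline follows; the function name is hypothetical, and the paper's full regression design is not given in this caption.

```python
def region_dummies(region, n_regions=20):
    """Indicator variables x1..x19 for regions 2..20; region 1 is the baseline."""
    if not 1 <= region <= n_regions:
        raise ValueError("region out of range")
    # Position 0 is x1 (region 2), position 17 is x18 (region 19), and so on.
    return [1 if region == k else 0 for k in range(2, n_regions + 1)]


row_region_19 = region_dummies(19)  # only x18 is set
row_region_1 = region_dummies(1)    # all zeros: absorbed by the intercept
```

With this encoding, each coefficient is read relative to the omitted region 1, which matches the labelling of region 19 as x18.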

Table 3.

Regression results for the second regression, the 22 days following the corona stock market crash.

Coefficients for neuron excitement increase for regions 9 through 20, and decrease for regions 2 through 8. The largest coefficient is region 19 (x18), the second most recent region, with a value of 0.0822, an increase of 0.0272 over its value of 0.055 for the overall testing data.

Table 4.

Summary table of neuron excitement region regression coefficients.

Column A consists of the regression coefficients for neuron excitement for all testing data. Column B consists of the regression coefficients for neuron excitement for the 22 days following the corona stock market crash. The third column shows the increase in regression coefficients following the corona stock market crash.

Table 5.

Logistic regressions are run with the daily returns sign as the dependent variable and change in neuron excitement as the independent variable.

The first and second columns show the most recent regions and other regions. Near-zero p-values and Z statistics greater than 2 or less than -2 indicate the regressions are able to successfully fit a function with daily returns sign as the dependent variable, and change in neuron excitement as the independent variable. The two regressions with the highest coefficients are the seven most recent regions at 1.5082 and the six most recent regions at 1.48. Change in neuron excitement in the six or seven most recent regions of the candlestick image is the best predictor of positive returns among those tested.
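A single-variable logistic regression of this kind can be sketched with plain gradient descent. This is an illustration under assumptions: the data are hypothetical, the fitting routine is a simple stand-in for whatever statistical package the authors used, and it does not compute the Z statistics or p-values reported in the table.

```python
import math


def fit_logistic(x, y, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient descent on log loss."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi          # gradient with respect to the intercept
            g1 += (p - yi) * xi   # gradient with respect to the slope
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Hypothetical data: x is the change in neuron excitement,
# y is 1 for a positive daily return and 0 otherwise.
x = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
intercept, slope = fit_logistic(x, y)
```

A positive slope means that larger increases in excitement are associated with a higher estimated probability of a positive return, which is how the coefficients in the table are read.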
