Innovative deep matching algorithm for stock portfolio selection using deep stock profiles | PLOS One

Fig 1.

Flow diagram of deep stock-profiling method.
Centering on the mechanisms that affect stock price formation in standard and behavioral finance (A), and from the perspective of modern asset pricing theory (B), stock features are extracted from two types of data sources (C) and used to construct deep stock profiles (D), which consist of both traditional and new quantitative factors (E).

Fig 2.

Process of generating weekly stock feature vector and feature matrix.
The lower half of this figure shows the backtracking window, which is generated from the time series. The upper half shows the weekly stock feature vector and feature matrix at trading date t_i, together with the set of all stock feature matrices in the stock universe and the stock selection target matrix generated by Algorithm 1 at trading date t_i.

Fig 3.

Framework of Chinese text sentiment classification model based on CNN.
The end-to-end framework starts with the labeling of financial texts, followed by a custom Chinese financial text segmentation dictionary and the main network structure of a deep CNN algorithm.

Fig 4.

Architecture of deep text matching algorithm.

Fig 5.

TS-Deep-LtM algorithm design framework.
The framework is composed of a model tuning module and a model selection module. A⁽ⁱ⁾ denotes the set of scores for different parameter combinations of algorithm i. B represents the set of different . The final “Best model” refers to the selected algorithm and the optimal feature combination of the training data.

Fig 6.

Distribution of stocks by China Securities Regulatory Commission industries.
Each stacked bar represents the number of stocks in an industry, and the short line on the bar indicates the ratio of that number to the corresponding total number of constituent stocks.

Table 1.

Feature combination from traditional quantitative factors in deep stock profiling.

Table 2.

Feature combination from new social quantitative factors in deep stock profiling.

Table 3.

Baseline feature combination.

Fig 7.

Weekly stock feature representation and weekly stock selection target representation.
(A) denotes the stock profiles and yield rankings of all stocks on trading date t_i. (B) denotes the feature vectors of stock s_j on weekly trading day series. (C) and (D) represent the feature matrices of stock s_j on weekly trading date series and stock selection target matrices on trading date t_i, respectively. The parameters in this study were set to m = 25, w = 24, k = 10.

Fig 8.

Flowchart of feature extraction and data preprocessing.
From left to right, data preprocessing is performed around three modules including multi-source heterogeneous big data, stock profiles, and training data.

Table 4.

Dividing the training set, validation set, and testing set by year.

Fig 9.

Dividing the training set, validation set, and testing set by samples, throughout the sample period.
Starting from the last sample in 2012, a rolling window produces a new sample set for each additional sample. In each sample set, the last sample is the testing set, the 12 samples preceding it form the validation set, and the remaining samples form the training set.
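
The sample-based split described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `first_end` index, and the optional fixed-length `window` are assumptions introduced here, and the sketch assumes the validation set is the 12 samples immediately before the test sample.

```python
from typing import Iterator, Optional, Sequence, Tuple


def split_sample_set(samples: Sequence, n_val: int = 12) -> Tuple[list, list, list]:
    """Split one chronologically ordered sample set into (train, val, test).

    Assumption: the test set is the last sample and the validation set is
    the n_val samples immediately before it.
    """
    if len(samples) < n_val + 2:
        raise ValueError("need at least n_val + 2 samples")
    test = list(samples[-1:])
    val = list(samples[-1 - n_val:-1])
    train = list(samples[:-1 - n_val])
    return train, val, test


def rolling_sample_sets(
    samples: Sequence,
    first_end: int,
    n_val: int = 12,
    window: Optional[int] = None,
) -> Iterator[Tuple[list, list, list]]:
    """Yield one (train, val, test) triple per additional sample.

    first_end: index of the last sample in the first sample set
               (hypothetical parameter, e.g. the last sample of 2012).
    window:    None keeps all earlier samples (expanding window);
               an integer keeps only the most recent `window` samples.
    """
    for end in range(first_end, len(samples)):
        start = 0 if window is None else max(0, end + 1 - window)
        yield split_sample_set(samples[start:end + 1], n_val)
```

Each yielded triple corresponds to one sample set, so a model can be trained, tuned, and tested once per trading date without the test sample leaking into training.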

Fig 10.

Performances of different ranking models trained on different feature combinations.
(A)-(D) show the performances of models trained with nine classical LtR algorithms on four different feature combinations. The feature combinations corresponding to (A)-(D) are traditional factors, social factors, Song et al. factors, and profile factors, respectively. (E)-(I) show the performances of models trained on the four feature combinations using five algorithms. The algorithms corresponding to (E)-(I) are RFRanker, Coordinate Ascent, MART, RankNet, and RankBoost, respectively.

Fig 11.

Performances of models trained on four feature combinations using five classical LtR algorithms.
The colors in the figure represent the corresponding NDCG@10 values. The right side of the figure lists the algorithms and feature combinations used in model training. The left side shows the corresponding clustering of model performances; models with similar performances are grouped in the same cluster.
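
For readers unfamiliar with the metric, NDCG@10 can be computed as in the sketch below. The caption does not state which gain function was used, so the exponential gain (2^rel - 1) and the log2 position discount are assumptions based on the common learning-to-rank convention; the function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k items.

    Assumption: exponential gain (2**rel - 1) and log2(position + 1) discount.
    """
    return sum(
        (2.0 ** rel - 1.0) / math.log2(i + 2)  # i is 0-based, position is i + 1
        for i, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances_in_predicted_order: Sequence[float], k: int = 10) -> float:
    """NDCG@k: DCG of the predicted order divided by DCG of the ideal order."""
    ideal = sorted(relevances_in_predicted_order, reverse=True)
    idcg = dcg_at_k(ideal, k)
    if idcg == 0:
        return 0.0  # no relevant items, so the score is defined as 0 here
    return dcg_at_k(relevances_in_predicted_order, k) / idcg
```

The input is the list of true relevance labels arranged in the order the model ranked the stocks; a perfect ranking scores 1.0.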

Fig 12.

Performances of different trading strategies by annualized indicators.
(A)-(B) and (C)-(D) show the annualized return and Sharpe ratio of trading strategies constructed from models trained on rolling-window training datasets divided by year and by samples, respectively. The legend in each chart lists the algorithms used to train the ranking models and construct the trading strategies.
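
The two annualized indicators named in this caption can be sketched as below. This is a generic illustration under stated assumptions rather than the authors' backtesting code: it assumes weekly strategy returns (52 periods per year), a per-period risk-free rate defaulting to zero, and the sample standard deviation.

```python
import math
import statistics
from typing import Sequence


def annualized_return(returns: Sequence[float], periods_per_year: int = 52) -> float:
    """Geometric annualized return from per-period simple returns."""
    if len(returns) == 0:
        raise ValueError("returns must not be empty")
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound each period
    return growth ** (periods_per_year / len(returns)) - 1.0


def annualized_sharpe(
    returns: Sequence[float],
    risk_free_per_period: float = 0.0,
    periods_per_year: int = 52,
) -> float:
    """Annualized Sharpe ratio; needs at least two returns.

    Returns NaN when the excess returns have zero dispersion.
    """
    excess = [r - risk_free_per_period for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation
    if sd == 0:
        return float("nan")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)
```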

Fig 13.

Performances of different trading strategies in cumulative returns (A), drawdown (B), monthly returns (C), and Sharpe ratio, volatility, information ratio, alpha, and beta (D) during the entire backtesting period.
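
As a reference for panels (A) and (B), cumulative return and drawdown series can be derived from per-period returns as sketched below. The function names are illustrative, and the sketch assumes simple (not log) returns with the peak initialized at the starting value of 1.

```python
from typing import List, Sequence


def cumulative_returns(returns: Sequence[float]) -> List[float]:
    """Cumulative simple return after each period."""
    curve, growth = [], 1.0
    for r in returns:
        growth *= 1.0 + r
        curve.append(growth - 1.0)
    return curve


def drawdowns(returns: Sequence[float]) -> List[float]:
    """Drawdown after each period: decline from the running peak (<= 0)."""
    peak, growth, out = 1.0, 1.0, []
    for r in returns:
        growth *= 1.0 + r
        peak = max(peak, growth)  # highest portfolio value seen so far
        out.append(growth / peak - 1.0)
    return out
```

The maximum drawdown over the backtesting period is then `min(drawdowns(returns))`.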

Fig 14.

Performances of TS-Deep-LtM models.
(A) evaluates the generalization ability of TS-Deep-LtM models. Each trading date in (A) corresponds to a group of training data (training set, validation set, and testing set). We select the first and last sets of training data to illustrate the training loss and NDCG@10 during model training. The average loss represents the mean of the training losses on all training sets. (B) shows the performances of models trained on 246 training data sets with the TS-Deep-LtM and RFRanker algorithms during the entire sample period. The standard deviation σ is used to evaluate the dispersion of the 246 NDCG@10 values for each type of model. Among the models trained with the TS-Deep-LtM algorithm, DRMM-tks, MV-LSTM, and CONV-KNRM are selected 44, 114, and 88 times, respectively.

Table 5.

Evaluation of long-only trading strategies based on different ranking prediction models.

Fig 15.

Overall performance of trading strategies based on TS-Deep-LtM and RFRanker on the eight evaluation metrics.

Fig 16.

Intelligent system framework for stock selection.
The framework has a 5-layer architecture, which is developed from left to right and from bottom to top.