Entropy-Based Financial Asset Pricing

We investigate entropy as a financial risk measure. Entropy explains the equity premium of securities and portfolios in a simpler way and, at the same time, with higher explanatory power than the beta parameter of the capital asset pricing model. For asset pricing we define the continuous entropy as an alternative measure of risk. Our results show that entropy decreases as a function of the number of securities in a portfolio, in a similar way to the standard deviation, and that efficient portfolios are situated on a hyperbola in the expected return – entropy system. For empirical investigation we use daily returns of 150 randomly selected securities for a period of 27 years. Our regression results show that entropy has a higher explanatory power for the expected return than the capital asset pricing model beta. Furthermore, we show the time-varying behavior of the beta along with entropy.


Introduction
We build an equilibrium capital asset pricing model by applying a novel risk measure, the entropy. Entropy characterizes the uncertainty or measures the dispersion of a random variable. In our particular case, it characterizes the uncertainty of stock and portfolio returns. In modern Markowitz [1] portfolio theory and equilibrium asset pricing models [2] we apply linear regressions. This methodology supposes that the returns are stationary and normally distributed; however, this is not actually the case [3]. Entropy, on the other hand, does not have this kind of boundary condition. The main goal of this paper is to apply entropy as a novel risk measure. As a starting point even the density function itself has to be estimated. In the traditional asset pricing model there is an equilibrium between the expected return and the beta parameter, which is the covariance-variance ratio between the market portfolio and the investigated investment opportunity. If the random variable is normally distributed then the entropy follows its standard deviation; thus in the ideal case there is no difference between the two risk measures. However, our results show that there is a significant difference between the standard deviation, or beta, and the entropy of a given security or portfolio. In this paper we show that entropy offers an ideal alternative for capturing the risk of an investment opportunity. If we explain the return of a wide sample of securities and portfolios with different risk measures, then in an ordinary least squares (OLS) regression setting the explanatory power is much higher for the entropy measure of risk than for the traditional measures, both in-sample and out-of-sample. We show that entropy reduction in line with diversification behaves similarly to standard deviation; however, at the same time it captures a beta-like systematic risk of single securities or non-efficient portfolios as well.
For well-diversified portfolios the explanatory power of entropy is 1.5 times higher than that of the capital asset pricing model (CAPM) beta.
We also test and compare entropy with standard risk measures for market circumstances that are increasing and decreasing, and find that the explanatory power of entropy is significantly higher in a bullish market, but lower for a bearish market. Our results for bullish and bearish regimes show that the different risk measures behave similarly in terms of the positive and negative relationship between risk and return. This behavior underlines the fact that the entropy-based risk measure can give contradictory results in the same way as traditional risk estimations in upward and downward regimes.
We also compare the entropy-based risk measures with the CAPM beta in and out of sample, which gives information on the predictive power of the different methods. The CAPM beta measures the systematic risk only, while the entropy-based risk measures and the standard deviation capture the total risk of the investment. Our results are striking: entropy gives almost twice as high an average explanatory power as the beta, with on average 40% less standard deviation. A further contribution of the paper is that we introduce a simple method to estimate the entropy of a security or portfolio return.

Data
In our empirical analysis we apply daily returns from the Center for Research in Security Prices (CRSP) database for the period from 1985 to the end of 2011. We randomly select 150 securities from the S&P500 index components that are available for the full period. The market return is the CRSP value-weighted index return premium above the risk-free rate. The index tracks the return of the New York Stock Exchange (NYSE), the American Stock Exchange (AMEX) and NASDAQ stocks. The risk-free rate is the return of the one-month Treasury bill from the CRSP. We use daily returns because they are not normally distributed (see S1 Table). Erdős and Ormos (2009) [3] and Erdős et al. (2011) [4] describe the main difficulties of modeling asset prices with non-normal returns. The daily return calculation enables us to compare different risk measures.

Methodology
Entropy is a mathematically-defined quantity that is generally used for characterizing the probability of outcomes in a system that is undergoing a process. It was originally introduced in thermodynamics by Rudolf Clausius [5] to measure the ratio of transferred heat through a reversible process in an isolated system. In statistical mechanics the interpretation of entropy is the measure of uncertainty about the system that remains after observing its macroscopic properties (pressure, temperature or volume). The application of entropy in this perspective was introduced by Ludwig Boltzmann [6]. He defined the configuration entropy as the diversity of specific ways in which the components of the system may be arranged. He found a strong relationship between the thermodynamic and the statistical aspects of entropy: the formulae for thermodynamic entropy and configuration entropy only differ in the so-called Boltzmann constant. There is an important application of entropy in information theory as well, and this is often called Shannon [7] entropy. The information provider system operates as a stochastic cybernetic system, in which the message can be considered as a random variable. The entropy quantifies the expected value of the information in a message or, in other words, the amount of information that is missing before the message is received. The more unpredictable (uncertain) the message that is provided by the system, the greater the expected value of the information contained in the message. Consequently, greater uncertainty in the messages of the system means higher entropy. Because the entropy equals the amount of expected information in a message, it measures the maximum compression ratio that can be applied without losing information.
In financial applications, Philippatos and Wilson [8] find that entropy is more general and has some advantages over standard deviation; in their paper they compare the behaviors of standard deviation and entropy in portfolio management. Kirchner and Zunckel [9] argue that in financial economics entropy is a better tool for capturing the reduction of risk by diversification; however, in their study they suppose that the assets are Gaussian. Dionisio et al. [10] argue that entropy observes the effect of diversification and is a more general measure of uncertainty than variance, since it uses more information about the probability distribution. The mutual information and the conditional entropy perform well when compared with the systematic risk and the specific risk estimated through the linear equilibrium model. Regarding the predictability of stock market returns, Maasoumi and Racine [11] point out that entropy has several desirable properties and is capable of efficiently capturing nonlinear dependencies in return time series. Nawrocki and Harding [12] propose applying state-value weighted entropy as a measure of investment risk; however, they are dealing with the discrete case.
All the above academic papers recognize that entropy could be a good measure of risk; however, it seems to be difficult to use this measure. Our main motivation is to show that an entropy-based risk measure is, on the one hand, more precise, and, on the other hand, no more complicated to use than variance equilibrium models.

Discrete entropy function
Entropy functions can be divided into two main types, discrete and differential entropy functions.
Let $X^*$ be a discrete random variable. The possible outcomes of this variable are denoted by $o_1, o_2, \ldots, o_n$, and the corresponding probabilities by $p_i = \Pr(X^* = o_i)$, $p_i \ge 0$ and $\sum_{i=1}^{n} p_i = 1$. The generalized discrete entropy function [13] for the variable $X^*$ is defined as:
$$H_\alpha(X^*) = \frac{1}{1-\alpha}\log_2\left(\sum_{i=1}^{n} p_i^{\alpha}\right) \qquad (1)$$
where $\alpha$ is the order of entropy, $\alpha \ge 0$ and $\alpha \ne 1$, and the base of the logarithm is 2.
The order of entropy expresses the weight taken into account in each outcome; if the order of entropy is lower, the more likely outcomes are underweighted, and vice versa. The most widely used orders are $\alpha = 1$ and $\alpha = 2$. The case $\alpha = 1$ is a special case of generalized entropy; however, substituting $\alpha = 1$ into (1) results in a division by zero. It can be shown, using l'Hôpital's rule for the limit $\alpha \to 1$, that $H_\alpha$ converges to the Shannon entropy:
$$H_1(X^*) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (2)$$
The case $\alpha = 2$ is called collision entropy and, following the literature, we refer to this special case as "Rényi entropy" in the rest of the paper:
$$H_2(X^*) = -\log_2\left(\sum_{i=1}^{n} p_i^{2}\right) \qquad (3)$$
$H_\alpha(X^*)$ is a non-increasing function of $\alpha$, and both entropy measures are non-negative provided that there is a finite number of possible outcomes (they are strictly positive unless one outcome has probability one):
$$0 \le H_2(X^*) \le H_1(X^*) \qquad (4)$$

Differential entropy function
Let $X$ be a continuous random variable taking values from $\mathbb{R}$ with a probability density function $f(x)$. Analogously to (1), the continuous entropy is defined as:
$$H_\alpha(X) = \frac{1}{1-\alpha}\ln\int f(x)^{\alpha}\,dx \qquad (5)$$
One can see that the bases of the logarithms in (1) and (5) are different. Although the entropy depends on the base, it can be shown that the value of the entropy changes only by a constant coefficient for different bases. We use the natural logarithm for all differential entropy functions. The formulas for the special cases ($\alpha = 1$ and $\alpha = 2$) are the following:
$$H_1(X) = -\int f(x)\ln f(x)\,dx \qquad (6)$$
$$H_2(X) = -\ln\int f(x)^{2}\,dx \qquad (7)$$
An important difference between discrete and continuous entropy is that while discrete entropy takes only non-negative values, continuous entropy can also take negative values:
$$H_\alpha(X) \in (-\infty, +\infty) \qquad (8)$$
In practice, standard risk measures like the CAPM beta or standard deviation are calculated on daily or monthly return data. We also follow this practice, and use a formula that is able to capture risk using this kind of data. Since the return on securities can take values from a continuous codomain, we primarily focus on the differential entropy function. However, by grouping return values into bins the discrete entropy function may also be used; this solution is outside the scope of this paper.
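The discrete definitions (1) to (3) can be checked numerically. The sketch below is only an illustration: the function name `generalized_entropy` and the example probabilities are assumptions introduced here, not part of the paper. It treats the order 1 case as the Shannon limit and otherwise evaluates (1) directly.

```python
import math


def generalized_entropy(probs, alpha):
    """Generalized discrete entropy of order alpha, in bits (base-2 logarithm).

    Hypothetical helper for illustration; alpha = 1 is handled as the
    Shannon limit, every other order uses the general formula.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    positive = [p for p in probs if p > 0]  # zero-probability outcomes add nothing
    if math.isclose(alpha, 1.0):
        # Shannon entropy, the limit of the general formula as alpha -> 1
        return -sum(p * math.log2(p) for p in positive)
    return math.log2(sum(p ** alpha for p in positive)) / (1.0 - alpha)


# Illustrative distribution (made up for this example)
p = [0.5, 0.25, 0.25]
shannon = generalized_entropy(p, 1)    # order 1
collision = generalized_entropy(p, 2)  # order 2, called Rényi entropy in the text
```

For a uniform distribution the two orders coincide; for any other distribution the order 2 value is the smaller of the two, in line with the ordering stated in the text.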

Entropy estimation
For the estimation of differential entropy, the probability density function of the return values needs to be estimated. Let $x_1, x_2, \ldots, x_n$ be the observations of the continuous random variable $X$, and $H_{\alpha,n}(X)$ the sample-based estimation of $H_\alpha(X)$. The plug-in estimations of entropy are calculated on the basis of the density function estimation. The probability density function $f(x)$ is estimated by $f_n(x)$, and the integral estimate of entropy is obtained in the following way:
$$H_{\alpha,n}(X) = \frac{1}{1-\alpha}\ln\int_{A_n} f_n(x)^{\alpha}\,dx \qquad (9)$$
where $A_n$ is the range of integration, which may exclude small and tail values of $f_n(x)$. We propose to select $A_n = (\min(x), \max(x))$.

Histogram
One of the simplest methods of density estimation is the histogram-based density estimation. Let $b_n = \max(x) - \min(x)$ be the range of sample values; partition the range into $k$ bins of equal width and denote the cutting points by $t_j$. The width of a bin is constant: $h = b_n / k = t_{j+1} - t_j$. The density function is estimated by using the following formula:
$$f_n(x) = \frac{n_j}{n h} \quad \text{if } x \in (t_j, t_{j+1}) \qquad (10)$$
where $n_j$ is the number of data points falling in the $j$th bin. Based on the properties of the histogram, a simpler non plug-in estimation formula can be deduced for Shannon and Rényi entropy using (6), (7), (9) and (10):
$$H_{1,n}(X) = -\frac{1}{n}\sum_{j=1}^{k} n_j \ln\frac{n_j}{n h} \qquad (11)$$
$$H_{2,n}(X) = -\ln\sum_{j=1}^{k} h\left(\frac{n_j}{n h}\right)^{2} \qquad (12)$$
The parameter of this method is the number of equal-width bins ($k$). There are several methods for choosing this parameter (e.g. the square root choice, Scott's normal reference rule [14], or the Freedman-Diaconis rule [15]); the detailed descriptions of these are outside the scope of this paper.
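The histogram-based estimate can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `histogram_entropy` is hypothetical, the data are simulated normal returns, and empty bins are simply skipped. Each bin contributes its width times a function of the estimated density, which is the integral of the piecewise-constant density estimate over the sample range.

```python
import numpy as np


def histogram_entropy(x, k, alpha=1):
    """Differential entropy estimate from a k-bin equal-width histogram.

    Hypothetical helper: alpha = 1 gives the Shannon estimate, any other
    order the generalized (e.g. alpha = 2, Rényi) estimate. Natural log.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    counts, edges = np.histogram(x, bins=k)  # equal-width bins on [min(x), max(x)]
    h = edges[1] - edges[0]                  # constant bin width
    density = counts[counts > 0] / (n * h)   # f_n on the non-empty bins
    if alpha == 1:
        return -np.sum(h * density * np.log(density))
    return np.log(np.sum(h * density ** alpha)) / (1.0 - alpha)


# Simulated daily returns (assumption: normal with 2% standard deviation)
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.02, 100_000)
shannon_estimate = histogram_entropy(returns, k=175, alpha=1)
renyi_estimate = histogram_entropy(returns, k=50, alpha=2)
```

For a normal distribution the exact values are known, $H_1 = \tfrac{1}{2}\ln(2\pi e\sigma^2)$ and $H_2 = \ln(2\sigma\sqrt{\pi})$, so simulated data of this kind give a quick sanity check. With a daily-return scale of a few percent both values are negative, which illustrates why differential entropy cannot be read as a non-negative quantity.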

Kernel density estimation
The kernel-based density estimation is another commonly used method. It applies the following formula:
$$f_n(x) = \frac{1}{n h}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (13)$$
where $K(\cdot)$ is the kernel function, and $h$ is the bandwidth parameter. There are several kernel functions that can be used (see S2 Table); for practical reasons (computational time), we propose using the indicator-based Epanechnikov kernel function:
$$K(u) = \frac{3}{4}\left(1 - u^{2}\right) I(|u| \le 1) \qquad (14)$$
where $I$ is the indicator function.
Härdle [16] shows that the choice of the kernel function is only of secondary importance, so the focus is rather on the right choice of bandwidth (h). One of the most widely used simple formulas for the estimation of h is Silverman's rule of thumb [17]: where IQR(x) is the interquartile range of x.
As the formula assumes a normal distribution for X it gives an approximation for optimal bandwidth; despite this, Silverman's rule of thumb can be used for a good initial value for more sophisticated optimization methods [18].
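A kernel-based plug-in estimate of the Shannon entropy could be sketched as follows. Several choices here are assumptions rather than details confirmed by the text: the bandwidth uses one commonly quoted form of Silverman's rule, $0.9\,\min(\hat\sigma, \mathrm{IQR}/1.34)\,n^{-1/5}$, whose constants may differ from those used in the paper; the integral is approximated with the trapezoidal rule on an evenly spaced grid over the sample range; and all function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)


def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth (assumed form: 0.9 * min(sd, IQR/1.34) * n^(-1/5))."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def kde_shannon_entropy(sample, grid_size=512):
    """Plug-in Shannon entropy: integrate -f ln f over (min(x), max(x))."""
    sample = np.asarray(sample, dtype=float)
    h = silverman_bandwidth(sample)
    grid = np.linspace(sample.min(), sample.max(), grid_size)
    u = (grid[:, None] - sample[None, :]) / h      # shape (grid_size, n)
    f = epanechnikov(u).sum(axis=1) / (sample.size * h)
    integrand = np.zeros_like(f)
    positive = f > 0
    integrand[positive] = -f[positive] * np.log(f[positive])
    return trapezoid(integrand, grid)


# Simulated standard normal sample (assumption, for illustration only)
rng = np.random.default_rng(0)
estimate = kde_shannon_entropy(rng.normal(0.0, 1.0, 2000))
```

The pairwise evaluation builds a grid-by-sample matrix, so memory grows with the product of the two sizes; for long daily series a smaller grid or chunked evaluation may be preferable.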

Sample spacing estimation
Let $x_{n,1} \le x_{n,2} \le \ldots \le x_{n,n}$ be the order statistics of $x_1, x_2, \ldots, x_n$, assuming that this is a sample of i.i.d. real-valued random variables. $x_{n,i+m} - x_{n,i}$ is called a spacing of order $m$ ($1 \le i < i+m \le n$). The simple sample spacing density estimate is the following [19]:
$$f_n(x) = \frac{m}{n}\,\frac{1}{x_{n,im} - x_{n,(i-1)m}} \quad \text{if } x \in [x_{n,(i-1)m}, x_{n,im}) \qquad (16)$$
Wachowiak et al. [20] introduced another variation of the sample spacing density estimation, called the Correa estimator: … if $x_j$ …, and $1 \le j \le n$.
The parameter for sample spacing methods is the fixed order $m$. For practical reasons (e.g. different sizes of samples) we suggest using $m_n$, which depends on the size of the sample and is calculated by the following formula:
$$m_n = \left\lceil \frac{n}{k} \right\rceil \qquad (18)$$
where $k$ is the number of bins, and the brackets $\lceil\cdot\rceil$ indicate the ceiling function. Beirlant et al. [19] overview several additional entropy estimation methods, such as resubstitution, splitting-data and cross-validation; however, our paper focuses on the applications that are used most often.
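The spacing idea can be illustrated with a simplified variant: cut the sorted sample every $m_n$ observations, treat the density as constant between consecutive cut points, and integrate $-f \ln f$ block by block. The sketch below is an assumption-laden illustration, not the estimator of [19] or [20]: it uses zero-based cut points at the sorted positions 0, m, 2m, ..., appends the sample maximum as the last cut, and assigns each block a probability mass proportional to the number of spacings it spans. The function names are hypothetical.

```python
import math

import numpy as np


def spacing_order(n, k):
    """Sample-size dependent order m_n = ceil(n / k)."""
    return math.ceil(n / k)


def spacing_shannon_entropy(x, k):
    """Shannon entropy from a piecewise-constant m-spacing density (simplified)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    if n < 2:
        raise ValueError("need at least two observations")
    m = spacing_order(n, k)
    idx = np.arange(0, n, m)            # positions of the cut points
    if idx[-1] != n - 1:
        idx = np.append(idx, n - 1)     # close the last block at the maximum
    widths = np.diff(xs[idx])
    if np.any(widths <= 0):
        raise ValueError("tied observations produce a zero-width spacing")
    mass = np.diff(idx) / (n - 1)       # block probabilities, summing to one
    density = mass / widths
    return -np.sum(mass * np.log(density))


# Simulated standard normal sample (assumption, for illustration only)
rng = np.random.default_rng(0)
estimate = spacing_shannon_entropy(rng.normal(0.0, 1.0, 5000), k=50)
```

Unlike the equal-width histogram, the blocks here hold roughly equal numbers of observations, so they are narrow where the data are dense and wide in the tails. Real return series often contain repeated values, which is why the sketch fails loudly on zero-width spacings instead of silently dividing by zero.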

Risk estimation
Let the following be a given set of data:
$$D: \{S, R, R_M, R_F\} \qquad (19)$$
The elements are the set of securities $S: \{S_1, S_2, \ldots, S_l\}$, with the corresponding observations being $R: \{R_1, R_2, \ldots, R_l\}$, where $R_i = (r_{i1}, r_{i2}, \ldots, r_{in})$. The observation for the market return is $R_M = (r_{M1}, r_{M2}, \ldots, r_{Mn})$, and the observation for the risk-free return is $R_F = (r_{F1}, r_{F2}, \ldots, r_{Fn})$, where $l$ is the number of securities and $n$ is the number of samples. Let us recall that the main goal of this paper is to apply entropy as a novel risk measure. In order to handle the risk measure uniformly, we introduce $\kappa$ as a unified property for securities. Let $\kappa(S_i)$ be the risk estimate for the security $i$.
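In the economic literature the most widely used risk measures are the standard deviation and the CAPM beta. Let us denote these by $\kappa_\sigma$ and $\kappa_\beta$, respectively. The estimation of these risk measures for the security $i$ is the following:
$$\kappa_\sigma(S_i) = \sigma(R_i - R_F) \qquad (20)$$
$$\kappa_\beta(S_i) = \beta(S_i) = \frac{\mathrm{cov}(R_i - R_F,\; R_M - R_F)}{\sigma^{2}(R_M - R_F)} \qquad (21)$$
where $\beta$ is the CAPM beta, $\mathrm{cov}(\cdot)$ is the covariance of the arguments and $\sigma$ is the standard deviation.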
Our hypothesis is that uncertainty about the observation values can be interpreted as a risk of the security, and for this reason we apply entropy as a risk measure. Because the differential entropy function can also take negative values, for better interpretability we apply the exponential function to the entropy, and we define the entropy-based risk measure by the following formula:
$$\kappa_H(S_i) = e^{H_\alpha(R_i - R_F)} \qquad (22)$$
One can see that $\kappa_H$ takes values from the non-negative real numbers, $\kappa_H \in [0, +\infty)$.
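The three risk measures can be computed side by side on a return series. The following is a hedged sketch: the function names, the use of risk premiums (returns minus the risk-free rate) in every measure, the sample (ddof=1) variance convention, and the simulated one-factor data are all assumptions made for illustration. The entropy is estimated with an equal-width histogram and then exponentiated.

```python
import numpy as np


def histogram_entropy(x, k, alpha=1):
    """Histogram-based differential entropy estimate (natural logarithm)."""
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogram(x, bins=k)
    h = edges[1] - edges[0]
    density = counts[counts > 0] / (x.size * h)
    if alpha == 1:
        return -np.sum(h * density * np.log(density))
    return np.log(np.sum(h * density ** alpha)) / (1.0 - alpha)


def kappa_sigma(r, rf):
    """Standard deviation of the risk premium."""
    return np.std(r - rf, ddof=1)


def kappa_beta(r, rm, rf):
    """CAPM beta: covariance with the market premium over its variance."""
    c = np.cov(r - rf, rm - rf, ddof=1)
    return c[0, 1] / c[1, 1]


def kappa_entropy(r, rf, k, alpha=1):
    """Entropy-based risk: exponential of the estimated entropy of the premium."""
    return np.exp(histogram_entropy(r - rf, k, alpha))


# Simulated data (assumption): one-factor returns with a true beta of 1.2
rng = np.random.default_rng(0)
n = 20_000
rf = np.full(n, 0.0001)
rm = rng.normal(0.0005, 0.01, n)
r = rf + 1.2 * (rm - rf) + rng.normal(0.0, 0.01, n)

risk_sd = kappa_sigma(r, rf)
risk_beta = kappa_beta(r, rm, rf)
risk_entropy = kappa_entropy(r, rf, k=175, alpha=1)
```

For normally distributed premiums the exponentiated Shannon entropy equals $\sigma\sqrt{2\pi e} \approx 4.13\,\sigma$, so on this simulated series the entropy-based measure is close to a constant multiple of the standard deviation. Differences between the two measures can therefore only arise when returns deviate from normality.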

Explanatory and predictive power
In order to compare the efficiency of the risk estimation methods, we introduce two basic evaluation approaches, the measurement of in-sample explanatory power and the measurement of out-of-sample predictive power.

In-sample
Let $V$ be a target variable, with sample $v = (v_1, v_2, \ldots, v_l)$, and let $U$ be a single explanatory variable with sample $u = (u_1, u_2, \ldots, u_l)$. To estimate the explanatory power of the variable $U$ for the variable $V$, we use the following method. The linear relationship between the two variables can be described using the linear regression model: $V = \alpha_0 + \alpha_1 U + \varepsilon$.
The parameters of the model ($\alpha_0$ and $\alpha_1$) are estimated by ordinary least squares (OLS), and the estimation for the target value is $\hat{v}_i = \hat{\alpha}_0 + \hat{\alpha}_1 u_i$, where $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are the estimations of $\alpha_0$ and $\alpha_1$, respectively. One of the most commonly applied estimations of the explanatory power is the $R^2$ (goodness of fit, or coefficient of determination) of the linear regression:
$$R^2 = 1 - \frac{\sum_{i=1}^{l}(v_i - \hat{v}_i)^2}{\sum_{i=1}^{l}(v_i - \bar{v})^2} \qquad (23)$$
where $\bar{v}$ is the sample mean of $v$. We are interested in how efficiently the different risk measures describe the expected return of a security, and we denote this measure by $\eta(\kappa)$. Let the explanatory variable $U$ be the risk measure of the securities, where the sample is:
$$u = (\kappa(S_1), \kappa(S_2), \ldots, \kappa(S_l)) \qquad (24)$$
and the target variable $V$ is the expected risk premium of the securities, where the sample is:
$$v = (E[R_1 - R_F], E[R_2 - R_F], \ldots, E[R_l - R_F]) \qquad (25)$$
where $\kappa$ is the unified risk measure function, and $E[\cdot]$ is the expected value of the argument. We define the estimation of the in-sample explanatory power (efficiency) as the $R^2$ of the previously defined variables (24) and (25):
$$\hat{\eta}(\kappa) = R^2(u, v) \qquad (26)$$

Out-of-sample
Let us create a split of the samples of a given data set $D: \{S, R, R_M, R_F\}$ (19) into an in-sample part and an out-of-sample part (27), where the corresponding in-sample observations for the securities are $R^I: \{R^I_1, R^I_2, \ldots, R^I_l\}$, and analogously for the out-of-sample part. Based on (26), (28) and (29), the estimation of the out-of-sample explanatory (predictive) power is the $R^2$ obtained when the risk measures are estimated on the in-sample part and the expected risk premiums are measured on the out-of-sample part (30).
Both in-sample and out-of-sample, we test whether the differences between the explanatory power of the investigated risk measures (standard deviation, CAPM beta, Shannon and Rényi entropy) are significant by applying a bootstrapping method. In each bootstrap iteration we remove 25 random stocks from the 150 investigated ones and measure the $R^2$ values of the four different models. We apply 1000 iterations to approximate the distribution of $R^2$ values under random selection, and we test the equality of the means of the $R^2$ values by applying a t-test to the generated samples.
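The efficiency measure and the bootstrap procedure described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the cross-section of 150 risk estimates and expected premiums is simulated, and the final t-test on the bootstrap samples is left out.

```python
import numpy as np


def explanatory_power(risk, premium):
    """R^2 of the OLS regression premium = a0 + a1 * risk + error."""
    risk = np.asarray(risk, dtype=float)
    premium = np.asarray(premium, dtype=float)
    slope, intercept = np.polyfit(risk, premium, 1)  # highest degree first
    fitted = intercept + slope * risk
    ss_res = np.sum((premium - fitted) ** 2)
    ss_tot = np.sum((premium - premium.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def bootstrap_r2(risk, premium, n_drop=25, n_iter=1000, seed=0):
    """R^2 distribution when n_drop random securities are removed per iteration."""
    risk = np.asarray(risk, dtype=float)
    premium = np.asarray(premium, dtype=float)
    rng = np.random.default_rng(seed)
    l = risk.size
    out = np.empty(n_iter)
    for b in range(n_iter):
        keep = rng.choice(l, size=l - n_drop, replace=False)
        out[b] = explanatory_power(risk[keep], premium[keep])
    return out


# Simulated cross-section of 150 securities (assumption, not the paper's data)
rng = np.random.default_rng(0)
risk = rng.uniform(0.01, 0.05, 150)
premium = 0.0002 + 0.01 * risk + rng.normal(0.0, 0.0002, 150)

in_sample_r2 = explanatory_power(risk, premium)
r2_distribution = bootstrap_r2(risk, premium)
```

The out-of-sample variant uses the same function with different inputs: risk estimates from the first period and mean premiums from the following period. Two bootstrap distributions, one per risk measure, could then be compared with a two-sample t-test such as `scipy.stats.ttest_ind`.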

Results and Discussion
We present the empirical results in four parts. First, we show how the entropy behaves as a function of the number of securities involved in the portfolio. Second, we present the long-term explanatory power of the investigated models. Third, we examine and compare the performance of different risk measures in upward and downward market trends. Fourth, we apply the different risk parameters to predict future returns; thus we test the out-of-sample explanatory power of the well-known risk parameters and compare their efficiency to the entropy-based risk measures.

Characterizing the diversification effect
We investigate whether entropy is able to measure the reduction of risk by diversification. We generate 10 million random equally-weighted portfolios with different numbers of securities involved (at most 100,000 for each size), based on the 150 randomly selected securities from the S&P500. The risk of portfolios is estimated by standard deviation, and by the Shannon and Rényi entropies using risk premiums for the full period. Because the CAPM beta measures the systematic risk only, we exclude it from the investigation of risk reduction. Both types of entropy functions are calculated by the histogram-based density function estimation, with 175 bins for the Shannon entropy and 50 bins for the Rényi entropy. (We tested the histogram, sample spacing and kernel density estimation methods, and the histogram-based method proved to be the most efficient in terms of explanatory and predictive power and simplicity. See our results in S3 Table.) Fig. 1 shows the diversification effects that are characterized by the entropic risk measures and by the standard deviation. For 10 random securities involved in the portfolio, approximately 40% of risk reduction can be achieved compared to a single random security, based on all of the three risk estimators under investigation. Fig. 1 suggests that entropy shows behavior that is similar to but not the same as standard deviation, so it can serve as a good measure of risk.
We also investigate how the different portfolios behave in the expected return – risk coordinate system as a function of diversification. We generate 200 random equally-weighted portfolios each with 2, 5 and 10 securities involved, and compare these to single securities using standard deviation, the CAPM beta, the Shannon entropy and the Rényi entropy as risk measures; the results are presented in Fig. 2. Fig. 2 shows the performance of random portfolios by diversification using different risk estimation methods.
One can see that the characteristics of standard deviation and entropy are quite similar, with the portfolios being situated on a hyperbola as in the portfolio theory of Markowitz [1]. Different characteristics can be observed by using the CAPM beta; the more securities that are involved in a portfolio, the closer they are situated to the center of the coordinate system.
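The diversification experiment can be reproduced in miniature on simulated data. The sketch below is illustrative only: the one-factor return generator, the parameter values and the function names are assumptions, and the portfolio counts are far smaller than those used in the study. It draws random equally-weighted portfolios of a given size and averages the standard deviation and the exponentiated Shannon entropy of their premiums.

```python
import numpy as np


def histogram_entropy(x, k, alpha=1):
    """Histogram-based differential entropy estimate (natural logarithm)."""
    x = np.asarray(x, dtype=float)
    counts, edges = np.histogram(x, bins=k)
    h = edges[1] - edges[0]
    density = counts[counts > 0] / (x.size * h)
    if alpha == 1:
        return -np.sum(h * density * np.log(density))
    return np.log(np.sum(h * density ** alpha)) / (1.0 - alpha)


def simulate_premiums(rng, n_securities=150, n_days=2000):
    """Synthetic one-factor daily risk premiums (assumption, not CRSP data)."""
    market = rng.normal(0.0003, 0.01, n_days)
    betas = rng.uniform(0.5, 1.5, n_securities)
    noise = rng.normal(0.0, 0.02, (n_securities, n_days))
    return betas[:, None] * market[None, :] + noise


def average_portfolio_risk(premiums, size, n_portfolios, rng, k=175):
    """Mean standard deviation and mean entropy-based risk of random portfolios."""
    sd = np.empty(n_portfolios)
    kh = np.empty(n_portfolios)
    for p in range(n_portfolios):
        members = rng.choice(premiums.shape[0], size=size, replace=False)
        portfolio = premiums[members].mean(axis=0)   # equal weights
        sd[p] = portfolio.std(ddof=1)
        kh[p] = np.exp(histogram_entropy(portfolio, k))
    return sd.mean(), kh.mean()


rng = np.random.default_rng(1)
premiums = simulate_premiums(rng)
risk_by_size = {size: average_portfolio_risk(premiums, size, 200, rng)
                for size in (1, 2, 5, 10)}
```

On data of this kind both measures fall as securities are added and flatten out once the idiosyncratic component has been averaged away, leaving the part driven by the common factor.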

Long term explanatory power
In order to evaluate how efficiently the risk measures explain the expected risk premium over a long period, we estimate the risk for each security using standard deviation, the CAPM beta, and the Shannon and Rényi entropies based on the full period (denoted by P1). The single explanatory variable is the risk measure; the target variable is the expected risk premium of the security. We apply the explanatory power estimation by calculating $\hat{\eta}(\kappa)$ ($R^2$) for each risk measure. Fig. 3 shows the efficiency of explaining the expected risk premium by the different risk measures; the expected daily risk premium is presented as a function of risk measure. The CAPM beta performs the worst, with 6.17% efficiency. The explanatory power of standard deviation (7.83%) is higher than that of the CAPM beta, and both entropies perform significantly better, with efficiency of 12.98% for the Shannon entropy and 15.71% for the Rényi entropy. Based on the equation of linear regressions, the average unexplained risk premium (intercept on the Y-axis, or Jensen alpha [21]) for the entropy methods (0.0091, 0.0059) is lower than that for the standard methods (0.0170 for standard deviation and 0.0209 for the CAPM beta).

We also measure the explanatory power for different numbers of securities involved in the portfolio, by generating at most 100,000 samples for each; we present these results in Fig. 4. Fig. 4 illustrates how the explanatory power changes with diversification. One can see that the explanatory power of standard deviation and entropy decreases with an increase in the number of securities involved in the portfolio, while the performance of the CAPM beta is nearly constant. While the CAPM beta models the systematic risk only, the standard deviation and entropy are capable of measuring specific risk, which gives additional explanatory power for less-diversified portfolios. Despite the decreased explanatory power of both entropy functions, they perform better than the CAPM beta in all the cases that were investigated. For well-diversified portfolios the explanatory power of the Rényi entropy is 1.5 times higher than that of the CAPM beta.

Explanatory power by primary market trends
We split the original 27-year sample by primary market trend into a "bullish" and a "bearish" sample (denoted by P1+ and P1-), containing returns for upward and downward periods, respectively (for the labels of the periods see S4 Table). For these two sample sets we investigate the explanatory power for standard deviation, the CAPM beta, and the Shannon and Rényi entropies using the same parameter for the histogram-based entropy estimation as for the previous experiments. Fig. 5 and Fig. 6 show the results in the expected risk premium – risk coordinate system.
Our results for the bullish and bearish regimes show that the different risk measures behave similarly in terms of the positive and negative relationships between risk and return. This behavior underlines the fact that an entropy-based risk measure can give contradictory results in a similar way to traditional risk estimations in different regimes. In bullish market circumstances we find a very high explanatory power for all kinds of risk measures: 33.90%, 36.67%, 43.45% and 42.36% with standard deviation, the CAPM beta, the Shannon entropy and the Rényi entropy, respectively. As for the full sample tests, the slopes of the regression lines are positive, meaning that higher risk-taking promises higher returns. In contrast to the bullish market, during downward trends higher risk-taking does not result in higher returns and, indeed, the higher the risk the higher the negative premium achieved by the investor. We have to mention that in this regime the explanatory power of the CAPM beta is higher than that of the entropy-based risk measures. Our entropy results are in line with those for the CAPM beta, and the regime dependency is clear as well. On the other hand, the explanatory power is again much higher for this regime than for the full sample. Altogether, we argue that the test results for the full sample give a better comparison opportunity, as the sample sizes of the bullish and bearish markets are different and at the present moment the investor cannot decide whether there is an upward or a downward trend.

Short term explanatory and predictive power
Although attractive results are achieved within the sample, this does not necessarily mean high efficiency outside the sample. Therefore we took several ten-year periods, shifting the starting year by one year for each, with the first period being 1985 to 1994 and the last 2002 to 2011. As the full data set covers 27 complete years, we used 18 ten-year periods. We split each ten-year period into two shorter five-year periods (P2i and P2o), with the risk measures being estimated based on the first period and the predictive efficiency being measured in the second period. In the previous sections, we have presented the in-sample results for the full sample and for the different regimes; here we summarize these and compare the long-term in-sample results with the short-term in-sample and out-of-sample results. Table 1 summarizes the explanatory power of the investigated risk measures for the different samples. $\hat{\eta}_{P1}$, $\hat{\eta}_{P1+}$ and $\hat{\eta}_{P1-}$ show the results of the long-term analysis for the full period and during the upward and downward trends, respectively; $\hat{\eta}_{P2i}$ and $\hat{\eta}_{P2o}$ stand for the average efficiency measured for short-term in-sample and out-of-sample, respectively; and $\sigma_R(\hat{\eta}_{P2i})$ and $\sigma_R(\hat{\eta}_{P2o})$ measure the relative standard deviation of the efficiency when applying the in-sample and out-of-sample test for short periods (for the detailed results for all periods see Table 2 and Table 3). While the standard deviation risk measure performs almost the same in the long and the short run (7.83% vs. 7.94%), its predictive efficiency is surprisingly good (9.70%). The explanatory power of the CAPM beta in the long period is low (6.17%), while the average efficiency in the short periods is more than twice as high (13.31%). We use arithmetic averages [22]. Comparing the in-sample and out-of-sample results, the predictive power of the beta is relatively low (6.45%), which suggests that the model may be over-fitted for the training sample.
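The rolling-window design can be written down explicitly. The helper names below are hypothetical, and the relative standard deviation is taken here as the sample standard deviation divided by the mean, which is an assumption about the exact convention used.

```python
import statistics


def rolling_periods(first_year=1985, last_year=2011, window=10):
    """Ten-year windows shifted by one year, each split into two five-year halves.

    Returns a list of ((in_start, in_end), (out_start, out_end)) year pairs.
    """
    periods = []
    for start in range(first_year, last_year - window + 2):
        mid = start + window // 2
        end = start + window - 1
        periods.append(((start, mid - 1), (mid, end)))
    return periods


def relative_std(values):
    """Relative standard deviation: sample standard deviation over the mean."""
    return statistics.stdev(values) / statistics.mean(values)


periods = rolling_periods()
first_in_sample, first_out_of_sample = periods[0]
```

With the default arguments this yields 18 windows, the first estimating risk on 1985 to 1989 and evaluating it on 1990 to 1994, and the last estimating on 2002 to 2006 and evaluating on 2007 to 2011. Applying `relative_std` to the 18 in-sample or out-of-sample efficiencies gives the dispersion figures reported in the summary table.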
The Shannon entropy performs better than the standard deviation and the CAPM beta in each sample. The Rényi entropy shows the highest explanatory power in the long run; however, in short periods the Rényi entropy performs worse than the Shannon entropy. Comparing the reliability of the risk estimators, the standard deviation of the in-sample and out-of-sample results is the lowest for the entropy risk measures, and the highest for the CAPM beta. Summarizing our results, we state that the beta can beat the entropy only in the case of bearish market circumstances. In any other situation, entropy seems to be a better and more reliable risk measure.

Conclusions
Entropy as a novel risk measure combines the advantages of the CAPM's risk parameter (beta) and the standard deviation. It captures risk without using any information about the market, and it is capable of measuring the risk reduction effect of diversification. The explanatory power for the expected return within the sample is better than the beta, especially in the long run covering bullish and bearish periods; the predictive power for the expected return is higher than for standard deviation. Both the Shannon and the Rényi entropies give more reliable risk estimation; their explanatory power exhibits significantly lower variance compared to the beta or the standard deviation. If upward and downward trends are distinguished, the regime dependency of entropy can be recognized: this result is similar to that for the beta.
Among the entropy estimation methods reviewed, the histogram-based method proved to be the most efficient in terms of explanatory and predictive power; we propose a simple estimation formula for the Shannon and the Rényi entropy functions, which facilitates the application of an entropy-based risk measure.

Table 3. … period (1985-1994) to period (2002-2011). We estimate risk measures of 150 randomly selected securities from the S&P500 index using standard deviation ($\sigma$), CAPM beta ($\beta$), Shannon entropy ($H_1$) and Rényi entropy ($H_2$) risk estimation methods by daily risk premiums in the first 5 years (P2i) and measure the predicting power on the next 5 years (P2o) by estimating the goodness of fit of linear regression ($R^2$). Both types of entropy functions are calculated by histogram-based density function estimation, with 175 bins for Shannon entropy and 50 bins for Rényi entropy. We apply t-statistics by bootstrapping method to measure whether differences in $R^2$ values are significant. We use *s to designate that the entropy-based risk measure is significantly higher than the standard deviation and CAPM beta; ***, ** and * stand for the 1%, 5% and 10% significance levels, respectively. doi:10.1371/journal.pone.0115742.t003
Supporting Information