Curve Fitting of the Corporate Recovery Rates: The Comparison of Beta Distribution Estimation and Kernel Density Estimation

The recovery rate is essential to estimating a portfolio's loss and economic capital. Neglecting the randomness of the recovery rate distribution may lead to underestimating the risk. This study introduces two kinds of distribution models, Beta distribution estimation and kernel density estimation, to simulate the distribution of recovery rates of corporate loans and bonds. Models based on the Beta distribution are in common use, for example CreditMetrics by J.P. Morgan, Portfolio Manager by KMV and Losscalc by Moody's. However, the Beta distribution has a serious defect: it cannot fit bimodal or multimodal distributions such as the recovery rates of corporate loans and bonds that Moody's new data show. To overcome this flaw, kernel density estimation is introduced, and we compare the results of the histogram, Beta distribution estimation and kernel density estimation. We conclude that the Gaussian kernel density estimate better imitates the distribution of the bimodal or multimodal samples of corporate loans and bonds. Finally, a Chi-square test of the Gaussian kernel density estimate does not reject the hypothesis that it fits the recovery rates of loans and bonds. Using kernel density estimation to delineate the bimodal recovery rates of bonds is therefore preferable in credit risk management.


Introduction
Credit risk is the distribution of financial losses caused by unexpected changes in compliance with financial agreements. The recovery rate is an important measure of how much can be retrieved from bad debts, so it is crucial to determine what kind of distribution the recovery rates of the sample follow. A key element in building a credit risk model is the recovery rate in default, or its complement the loss given default (LGD), with recovery expressed as a ratio (dollar recovery/amount invested) [1].
Recently, research on the recovery rate has mainly focused on the factors that affect it, the correlation between recovery rate and default rate, its distribution, etc. [2]. Compared with the default rate, the factors influencing the recovery rate are more complex. The most representative model is the Losscalc model of Moody's. The correlation between recovery rate and default rate turns out to be positive. Hu & Perraudin [3] and Rosch & Scheule [4] find that the portfolio's loss is underestimated if the correlation is neglected.
The histogram is widely used to depict a distribution, but it does not give a smooth curve [5]. Beta distribution estimation is widely used for simulating the curve of the recovery rates [6]. In addition, Frye [7] assumes that the recovery rate follows a normal distribution, Pykhtin [8] establishes a log-normal distribution, and Andersen & Sidenius [9] discuss the probit-normal distribution. Dullmann & Trapp [10] use a logit-normal distribution and analyse the recovery rates empirically. Comparing their results with the two other models, Frye [7] and Pykhtin [8], they find that the log-normal distribution does not perform as well as the other two distributions.
Depicting the randomness of the recovery rate precisely is essential to estimating portfolio loss and economic capital. In this paper, we focus on the distribution of the recovery rate. The Beta distribution is widely used in credit risk models such as CreditMetrics by J.P. Morgan, Portfolio Manager by KMV and Losscalc by Moody's. However, it ceases to be effective when the recovery rate curve has two peaks, because Beta distribution estimation can only reproduce one peak. So in this paper we adopt Gaussian kernel density estimation, a nonparametric method, to solve this problem and test our hypothesis.
Kernel density estimation, put forward by Murray Rosenblatt (1956) and Emanuel Parzen (1962) and also known as the Parzen-Rosenblatt window method, is a nonparametric way to estimate the probability density function of a random variable. Ruppert and Cline proposed a revision of the kernel density estimation method based upon the data density function and a clustering algorithm. A prediction model of value at risk can be built through the kernel density estimate of a random variable. Moreover, weighted processing of the estimated variation coefficient can help build different risk prediction models. To avoid the model specification error caused by presetting the distribution of stock returns, Liu & He [11] used kernel density estimation to fit stock returns and then tested the result by Monte Carlo simulation. Their study shows that the kernel density estimate is a good approximation of the real distribution of stock returns. Zhen & Li [12] study the distribution of Hang Seng index returns by nonparametric kernel density estimation and find that it exhibits fat tails. Shi & Huang [13] use kernel density estimation to analyze dynamic economic growth at the provincial level in China from 1978 to 2007. Their research indicates that the growth distribution at the provincial level tends toward a bimodal distribution, which is especially obvious in 1999. In this paper, we use kernel density estimation to fit the distribution of the bimodal or multimodal samples of recovery rates of corporate loans and bonds.
This paper is organized as follows. Part 2.1 introduces the recovery rate and the sources of data. Part 2.2 introduces the Beta distribution estimation method [6] with five different shapes. Part 2.3 introduces the properties of kernel density estimation and several ways to determine its bandwidth. Part 2.4 introduces the Chi-square test [14]. Part 3 shows the results: curves are fitted to the recovery rates of corporate loans and bonds (from Moody's Investors Service) for the years 2009-2011 by histogram, Beta distribution estimation and kernel density estimation. We favor Beta distribution estimation over the histogram in our study. Kernel density estimation depicts the bimodal or multimodal distributions of recovery rates, while Beta distribution estimation cannot. In addition, we test whether the kernel density estimate fits the distribution of the recovery rate. Part 4 concludes.

Data of Recovery Rate
Recovery is usually measured by two indicators: the ultimate recovery rate, that is, the value of assets that creditors eventually retrieve, and the price of the debt after default. Prices of defaulted assets are publicly available only in the market for defaultable securities, although some tradable bonds also provide post-default prices privately. In addition, post-default debt prices should be discounted; however, as defaulted bonds generally lack liquidity, it is very difficult to calculate the ultimate recovery rates. In this paper we collect recovery rate data for corporate loans from 2010 to 2011 [15], [16] and for corporate bonds from 2009 to 2011 for analysis.

Beta Distribution Estimation
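The common Beta density function is defined as follows:
$$f(x; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1-x)^{b-1}, \qquad 0 < x < 1,\; a > 0,\; b > 0, \tag{1}$$
where $\Gamma(\cdot)$ means a Gamma function. The shape of the Beta distribution depends on the values of the parameters a, b. 5) if a = 1, b = 1, the distribution is a uniform distribution. Figure 1 shows the different shapes of the Beta distribution, simulated with Matlab 7.6 software.
The relationships between statistics such as the mean $\mu$ and variance $\sigma^2$ and the parameters a, b are stated as follows:
$$\mu = \frac{a}{a+b}, \tag{2}$$
$$\sigma^2 = \frac{ab}{(a+b)^2\,(a+b+1)}. \tag{3}$$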
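The mean and variance relations for the Beta distribution can be inverted to obtain a and b from sample moments, using a + b = μ(1 − μ)/σ² − 1. The following is a minimal sketch of that inversion; the function names and the input values are hypothetical and chosen only for illustration, and it is not a statement of how the parameters reported later in the paper were computed.

```python
def beta_params_from_moments(mean, var):
    """Invert the Beta mean/variance relations: return (a, b)."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    total = mean * (1.0 - mean) / var - 1.0  # this quantity equals a + b
    return mean * total, (1.0 - mean) * total


def beta_moments(a, b):
    """Forward relations: mean and variance of a Beta(a, b) variable."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


# Hypothetical illustrative moments, not values from the data set.
a, b = beta_params_from_moments(0.5, 0.05)
print(a, b)
print(beta_moments(a, b))  # round trip back to the input moments
```

The variance check reflects the fact that a Beta variable on (0, 1) cannot have variance above μ(1 − μ); inputs outside that range have no matching a, b.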

Kernel Density Estimation
3.1 Definition of kernel density estimation. Kernel density estimation is a particular nonparametric technique to estimate the underlying density as a weighted average of local functions centered at each sample point. It can asymptotically converge to any density function. Suppose $X_1, X_2, \ldots, X_n$ are samples from population X, where n is the number of samples. If there is a bounded function $K(y)$ that integrates to one, $\int K(y)\,dy = 1$, then
$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) \tag{4}$$
is the kernel density estimation, where K is a kernel function, $h > 0$ is a smoothing parameter called the bandwidth and n is the number of samples. There are many common types of kernel function in Table 1.
In this paper, we choose the Gaussian kernel density, and its expression is
$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right). \tag{5}$$
Putting Equation (5) into Equation (4), we have
$$\hat f(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} \exp\!\left(-\frac{(x - X_i)^2}{2h^2}\right). \tag{6}$$
Equation (6) is the final Gaussian kernel density function.
3.2 Bandwidth estimation. The bandwidth h plays a paramount role in the estimation compared with the form of K.
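The Gaussian kernel density function of Equation (6) can be evaluated directly from a sample and a bandwidth. The sketch below uses only the Python standard library; the function name, the sample values and the bandwidth 0.08 are hypothetical and serve only to illustrate the formula, not to reproduce the paper's Matlab computation.

```python
import math


def gaussian_kde(samples, h):
    """Return f_hat, the Gaussian kernel density estimate for the sample."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def f_hat(x):
        # Sum of Gaussian bumps centered at each sample point.
        return norm * sum(math.exp(-((x - xi) ** 2) / (2.0 * h * h)) for xi in samples)

    return f_hat


# Hypothetical recovery rates (illustrative only, not Moody's data).
sample = [0.05, 0.12, 0.15, 0.18, 0.55, 0.90, 0.93, 0.95, 0.97, 1.00]
f_hat = gaussian_kde(sample, h=0.08)

# Riemann-sum check that the estimate integrates to about 1 on [-1, 2].
step = 0.001
area = sum(f_hat(-1.0 + i * step) for i in range(3001)) * step
print(round(area, 3))
print(f_hat(0.15), f_hat(0.55))  # density near a cluster vs. near an isolated point
```

Because the Gaussian kernel has unbounded support, part of the estimated mass falls slightly outside the interval [0, 1] when observations sit near 0 or 1, which is why the check integrates over a wider range.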
Martin [17] notes that if h is large the estimate will be oversmoothed, and if h is small it will be undersmoothed. There are three main criteria for deciding the bandwidth:
1) Mean square error, MSE.
2) Integrated mean square error, MISE, adopted when the density is continuous.
3) Asymptotic integrated mean square error, AMISE.
There are also many other methods to choose the bandwidth, most of them based on the idea of minimizing the MSE or the MISE. The following methods [18] are often used to select the bandwidth:
4) Least-squares cross-validation.
5) Biased cross-validation.
6) Plug-in bandwidth selection.
7) Smoothed cross-validation.
8) Root-n bandwidth selection.
9) The contrast method.
As n becomes larger, h becomes smaller. However, if h is too small, the estimate becomes too rough and loses accuracy; on the other hand, if h is too large, the estimated curve will be too smooth. In most cases, MSE and MISE are the criteria most commonly used.
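For reference, the MSE, MISE and AMISE criteria are usually written as follows. These are the standard textbook forms, given here as an assumption about notation rather than taken from the text; $R(g) = \int g(u)^2\,du$ and $\mu_2(K) = \int u^2 K(u)\,du$ are shorthand introduced only for this sketch, and $f$ denotes the unknown true density.

```latex
\begin{aligned}
\mathrm{MSE}\{\hat f(x)\} &= E\big[(\hat f(x) - f(x))^2\big], \\
\mathrm{MISE}(h) &= E\int \big(\hat f(x) - f(x)\big)^2\,dx, \\
\mathrm{AMISE}(h) &= \frac{R(K)}{nh} + \frac{h^4}{4}\,\mu_2(K)^2\,R(f''), \\
h_{\mathrm{AMISE}} &= \left[\frac{R(K)}{\mu_2(K)^2\,R(f'')\,n}\right]^{1/5}.
\end{aligned}
```

The first AMISE term (variance) grows as h shrinks and the second (squared bias) grows as h increases, which is the trade-off between a rough and an oversmoothed curve. Setting the derivative with respect to h to zero gives the last line, whose minimizer decreases with n at the rate $n^{-1/5}$.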

Chi-square Goodness-of-fit Test
In order to assess whether the sample data fit the kernel density distribution, we adopt the Chi-square goodness-of-fit test. The test is performed by grouping the data into bins, calculating the observed and expected counts for the bins, and computing the chi-square test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$
where k is the number of bins or intervals into which the sample is divided, $O_i$ are the observed counts and $E_i$ are the expected counts in bin i.
The null hypothesis is that the frequency in each interval equals that of the corresponding kernel distribution. If the p-value is below the significance level (we often choose 5% or 10%), the null hypothesis can be rejected and the kernel density estimate does not fit the curve of the recovery rates. If the p-value is above the significance level, we have no reason to reject the null hypothesis. The expected counts are
$$E_i = n\,\big[F(b_i) - F(b_{i-1})\big],$$
where $b_0 < b_1 < \cdots < b_k$ stand for the values of the bin edges, n is the number of observations in the sample and $F(\cdot)$ is the cumulative distribution function of the kernel distribution.
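The binning procedure just described can be sketched as follows. For a Gaussian kernel, the cumulative distribution function of the kernel estimate is the average of normal distribution functions centered at the sample points, which is what the hypothetical `kde_cdf` helper computes. The sample, the bandwidth and the choice of open outer bins (so that the expected counts sum to n) are assumptions made for illustration; the binning actually used in the study is not specified here.

```python
import math


def kde_cdf(samples, h):
    """Return F, the CDF of a Gaussian KDE: the average of normal CDFs."""
    n = len(samples)
    scale = h * math.sqrt(2.0)

    def cdf(x):
        return sum(0.5 * (1.0 + math.erf((x - xi) / scale)) for xi in samples) / n

    return cdf


def chi_square_statistic(samples, edges, cdf):
    """Return (statistic, observed, expected) for bins [b_{i-1}, b_i)."""
    n = len(samples)
    k = len(edges) - 1
    observed = [sum(1 for x in samples if edges[i] <= x < edges[i + 1]) for i in range(k)]
    expected = [n * (cdf(edges[i + 1]) - cdf(edges[i])) for i in range(k)]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, observed, expected


# Hypothetical recovery rates and bandwidth (illustrative only).
sample = [0.05, 0.12, 0.15, 0.18, 0.55, 0.90, 0.93, 0.95, 0.97, 1.00]
inf = float("inf")
edges = [-inf, 0.2, 0.4, 0.6, 0.8, inf]  # assumption: open outer bins
stat, observed, expected = chi_square_statistic(sample, edges, kde_cdf(sample, h=0.08))
print(observed, [round(e, 2) for e in expected], round(stat, 3))
```

The p-value is the upper tail probability of a chi-square distribution at `stat`; if SciPy is available, `scipy.stats.chi2.sf(stat, df)` returns it for the chosen degrees of freedom.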

Histograms of Recovery Rates
By sorting the data from the Moody's analysis report, we get 58 effective observations of recovery rates of corporate loans from 2010 to 2011 and 282 observations for corporate bonds from 2009 to 2011. Table 2 exhibits the main characteristics of the sample of recovery rates. Their histograms are given in Fig. 2 and Fig. 3.
The histogram of defaulted loans' recovery rates (Fig. 2) shows a bimodal pattern with two peaks: both near-full recovery rates between 0.9 and 1 and low recovery rates between 0.1 and 0.2 occur with high frequency. Figure 3 shows four peaks, in the intervals 0 to 0.1, 0.2 to 0.3, 0.4 to 0.5 and 0.8 to 0.9. The number of default events has increased since the financial crisis, and borrowers differ in their ability to repay their loans and bonds. Before the crisis, it was rare to have so many default events.

Beta Distribution Estimation of the Recovery Rate
A common method to estimate the distribution of recovery rates is the Beta distribution, which gives a smooth curve in contrast to the histogram. From the recovery rate data of defaulted corporate loans, we obtain a mean of 0.5574 and a variance of 0.1003. Inputting these values into Equations (2) and (3) leads to values of the parameters a, b of 0.9797 and 0.7712, respectively. With Matlab 7.6 software [19], we get the simulated distribution of the real recovery rate (Figure 4). The Beta distribution estimation cannot fit the bimodal distribution of defaulted loans' recovery rates. Similarly, we obtain the values of the parameters a, b for defaulted bonds' recovery rates, which are 0.9797 and 0.7712 respectively, and the simulated distribution of recovery rates in Figure 5. It also illustrates that Beta distribution estimation can partly describe the distribution of recovery rates but cannot fit its multiple peaks.
As shown in Figure 4, the distribution of corporate loans' recovery rates has two peaks, but none of the five Beta shapes (Figure 1) fits a bimodal distribution. As for corporate bonds' recovery rates, the multiple peaks in Figure 5 are clearly beyond the ability of Beta estimation, which can only simulate one peak. So it cannot be a perfect tool to depict bimodal and multimodal distributions of the defaulted corporate recovery rates. The Gaussian kernel density estimation is introduced to solve this problem.
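The limitation can be checked numerically: a Beta density has at most one peak strictly inside (0, 1), and when both parameters are below 1 it has none, rising instead toward the boundaries. The sketch below counts interior local maxima on a grid. The helper names and the grid size are hypothetical; the first parameter pair is the one reported above for the loan sample, and the second is an arbitrary unimodal case added for contrast.

```python
import math


def beta_pdf(x, a, b):
    """Beta density on (0, 1), evaluated through log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def interior_peaks(a, b, points=999):
    """Count strict local maxima of the Beta(a, b) density on an interior grid."""
    xs = [(i + 1) / (points + 1) for i in range(points)]
    ys = [beta_pdf(x, a, b) for x in xs]
    return sum(1 for i in range(1, points - 1) if ys[i - 1] < ys[i] > ys[i + 1])


loan_peaks = interior_peaks(0.9797, 0.7712)  # parameters reported for the loan sample
contrast_peaks = interior_peaks(2.0, 5.0)    # hypothetical unimodal case
print(loan_peaks, contrast_peaks)
```

With these inputs the first call finds no interior peak and the second finds exactly one, so neither case can reproduce two interior peaks of the kind seen in the histograms.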

Kernel Density Estimation of the Recovery Rate
The rule of thumb has been used in the analysis of recovery rates of bank loans; see Servigny [1] for example. It gives the bandwidth h in terms of $\hat\sigma$, the sample standard deviation, and n, the number of observations. In this paper, the rule of thumb is adopted to estimate the bandwidth, and the resulting value of h is then put into Equation (6). For defaulted corporate loans' recovery rates, the number of observations is 58 and the standard deviation is 0.3168, so the bandwidth h of this sample, derived through the rule of thumb, is 0.0624.
The bandwidth of the kernel is a free parameter that strongly influences the resulting estimator. To illustrate its effect, three Gaussian kernel density estimates are shown in Figure 6 with bandwidths of 0.0208, 0.0624 and 0.1873, that is, h/3, h and 3h. Figure 6 shows that the larger the bandwidth is, the smoother the curve is. The green curve with the largest bandwidth is much flatter than the other two with h/3 and h. The curve with bandwidth h/3 is much steeper and exhibits multiple peaks, while the one with 3h is too flat to depict the bimodal characteristics of corporate loans' recovery rates. So the curve with bandwidth h best shows the two-peaked distribution of the recovery rate, which supports the bandwidth selection method we choose.
As for defaulted corporate bonds, the number of observations is 282 and the standard deviation is 0.2997, so the bandwidth h is 0.0314. With Matlab 7.6 software, we get the fitted curve (Figure 7). The bonds' recovery rates exhibit several peaks according to their density histogram. Overlaying the kernel density estimate on the histogram, we find that the kernel density curve fits the distribution of the bonds' recovery rates closely. Figure 8 and Figure 9 illustrate the differences between the two methods in fitting the curve to the recovery rates of corporate loans and bonds.
In Figure 8, the Gaussian kernel density estimate exhibits two peaks, whereas the Beta distribution estimate only shows a J-shaped curve, which contradicts the original recovery rate data. Likewise in Figure 9, the Beta distribution estimate is an inverse-J curve, which cannot depict the multimodal distribution of recovery rates. So kernel density estimation fits the bimodal or multimodal samples of recovery rates of corporate loans and bonds better.
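The bandwidths reported for the two samples can be checked numerically. As an assumption inferred from the reported numbers, not taken from the text, the sketch below uses h = σ̂ · n^(−2/5), which reproduces 0.0624 and 0.0314 from the stated standard deviations and sample sizes. The widely used normal reference rule h = 1.06 · σ̂ · n^(−1/5) is included only for comparison and gives a noticeably larger value for the loan sample. Both function names are hypothetical.

```python
def bandwidth_assumed(sd, n):
    """Assumed rule h = sd * n**(-2/5), inferred from the reported values."""
    return sd * n ** (-0.4)


def bandwidth_normal_reference(sd, n):
    """Normal reference rule h = 1.06 * sd * n**(-1/5), for comparison only."""
    return 1.06 * sd * n ** (-0.2)


h_loans = bandwidth_assumed(0.3168, 58)    # loans: 58 observations
h_bonds = bandwidth_assumed(0.2997, 282)   # bonds: 282 observations
print(round(h_loans, 4), round(h_bonds, 4))
print(round(h_loans / 3, 4), round(3 * h_loans, 4))  # the h/3 and 3h curves
print(round(bandwidth_normal_reference(0.3168, 58), 4))
```

Under this assumption the three loan bandwidths come out as 0.0208, 0.0624 and 0.1873, matching the values used for the comparison of smoothing levels, which suggests that the unrounded h was scaled by 1/3 and 3.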

Chi-square Goodness-of-fit Test Result
To assess whether the sample data fit the kernel density distribution, a Chi-square goodness-of-fit test is used. For the recovery rates of corporate loans and bonds, the numbers of effective observations in the two samples are 58 and 282. We choose 5 bins and 10 bins for each sample. Because one parameter, h, is estimated, the degrees of freedom equal the number of bins minus the number of estimated parameters minus one, that is, $5 - 1 - 1 = 3$ and $10 - 1 - 1 = 8$ respectively. We get the results in Table 3 and Table 4 with R.
As Tables 3 and 4 show, the p-values are larger than the 5% significance level, so we cannot reject the null hypothesis. This means that the kernel density estimator is consistent with the recovery rates of corporate loans and bonds.

Discussion
Our results show that the distribution of recovery rates displays a new characteristic in the recent data, and we offer a new method, kernel density estimation, to estimate this distribution. The recovery rate is essential for managers and government officials in estimating the loss of their portfolios and the amount of insurance reserves. Ignoring the randomness of the recovery rate by assuming it to be constant may understate the risk. Financial agents have recently started paying more attention to the management of recovery rates.
We find that corporate loans' recovery rates follow a bimodal distribution and corporate bonds' recovery rates have multiple peaks. In past years, especially before 2008, economies were growing well and there were few default events, so the default probability was low and recovery rates were quite high. Since the outbreak of the financial crisis, however, default events have increased, and borrowers differ in their ability to repay their debts. So the data we study show bimodal and multimodal characteristics.
We find that the two common ways of estimating the distribution of recovery rates no longer work well because of this new characteristic, so we introduce kernel density estimation to overcome the problem. In extant research, the distribution has mostly been treated as a purely statistical matter and has received relatively little attention. In our paper we test the two common approaches and then propose the new method, Gaussian kernel density estimation. One approach is to use statistical methods such as the histogram to estimate the distribution. The other is to estimate parametrically a density function whose form is set in advance. The Beta distribution is the most common distributional assumption in models such as CreditMetrics by J.P. Morgan, Portfolio Manager by KMV and Losscalc by Moody's. Using the data from Moody's Investors Service, we find that the histogram is a rough depiction and that Beta distribution estimation has a major defect: it fails for bimodal and multimodal curves. Gaussian kernel density estimation, a nonparametric methodology, remedies this imperfection. Finally, we support this view by testing the hypothesis with the Chi-square test.
So, by statistical simulation of the distribution of recovery rates with the data from Moody's investor report, we compare kernel density estimation with the Beta distribution. We find that kernel density estimation is better than Beta distribution estimation when the data show bimodal or multiple-peaked characteristics, especially when the economy is in distress. As default events increase, kernel density estimation will help supervisors to evaluate portfolio loss precisely and allocate economic capital.