Evaluating borrowers’ default risk with a spatial probit model reflecting the distance in their relational network

  • Jong Wook Lee,

    Roles Conceptualization, Data curation, Formal analysis, Methodology, Software, Visualization, Writing – original draft

    Affiliation Department of Information and Industrial Engineering, Yonsei University, Seoul, Republic of Korea

  • So Young Sohn

    Roles Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

    sohns@yonsei.ac.kr

    Affiliation Department of Information and Industrial Engineering, Yonsei University, Seoul, Republic of Korea

Abstract

Potential relationships among loan applicants can provide valuable information for evaluating default risk. However, most existing credit scoring models either ignore these relationships or consider only simple connection information. This study assesses the relation among applicants in terms of the distance between them, estimated from their characteristics. This information is then used in a proposed spatial probit model to reflect the differing degrees of relation among borrowers in the default prediction for a loan applicant. We apply this method to peer-to-peer Lending Club Loan data. Empirical results show that considering the spatial autocorrelation among loan applicants can provide high predictive power for defaults.

1. Introduction

Credit risk management is very important for service firms in the lending business. To predict a loan applicant's probability of default, which is essential for credit risk management, machine learning models use two types of borrower information: standard “hard” information and nonstandard “soft” information [1]. The former directly reflects the loan applicant's financial status or creditworthiness, while the latter covers information that has no direct relationship to the applicant's financial status or creditworthiness, such as age or residence. Existing studies have shown that not only hard information but also soft information, which is less relevant to the applicant's financial condition, is helpful in predicting default risk [1–5]. While both hard and soft information have been used in most credit scoring models, what is missing is the potential relation among loan applicants. Relationships among loan applicants who are at high risk of default can also provide valuable information for evaluating default risk [6–8].

In this study, we use a borrower relationship network based on the borrowers’ information provided for loan applications. This network is utilized as a spatial weight matrix for a spatial probit model that reflects different degrees of borrowers’ relation for the prediction of a loan default. Our proposed approach is applied to peer-to-peer (P2P) lending.

Online P2P lending allows individuals to lend money to other individuals through online platforms without the intervention of a financial institution. These online P2P lending platforms are gaining popularity due to their low operating costs compared with traditional lending programs [9]. However, online P2P lending faces a significant problem: information asymmetry between borrowers and lenders, that is, the reliability of a borrower’s credit is unknown to the lender [10]. Therefore, the use of relationship information among borrowers beyond that provided on the P2P platform is necessary. As it is difficult to discover realistic relationship information between borrowers on a P2P lending platform, this study defines data-driven latent relationships between borrowers in terms of the similarity of their hard and soft information. We expect that this data-driven latent relationship information can improve default risk prediction.

This paper is organized as follows. Section 2 reviews prior studies on default prediction in online P2P lending. Section 3 explains the methodologies employed, and Section 4 explores the Lending Club Loan (LCL) dataset used for this study. Finally, Section 5 presents the results, and Section 6 discusses the results, limitations, and suggestions for improvement.

2. Literature review

Models for default risk prediction in P2P lending services are divided into three categories: the probability of default (PD), exposure at default (EAD), and loss given default (LGD). Among them, PD models have been explored steadily [11]. The PD model predicts a borrower’s default using classification models based on statistical or machine learning approaches. Statistical methods have the advantage of being able to quantitatively show the effect of each factor on the borrower’s default [12]. Emekter et al. [13] used a logistic regression model to predict the default probability of borrowers and found that Fair, Isaac and Company scores are a very important factor. However, statistical methods have the disadvantage of requiring strong assumptions about the observed data [14]. Meanwhile, machine learning methods have strong default prediction performance without requiring any statistical assumptions. These models include neural networks [15], support vector machines [16, 17], and random forests [18]. However, these models have a serious drawback: they do not directly show the effect of individual factors on the borrower’s default.

It is also important to choose the optimal features used to predict default risk. Generally, hard information can reflect borrowers’ repayment ability [19], while soft information can reflect borrowers’ repayment willingness [20]. Hard information plays an important role in explaining default risk because it directly represents the borrowers’ financial status. However, online P2P lending platforms have difficulty collecting sufficient hard information. To overcome this limitation, the importance of soft information that is not related to the borrowers’ financial status is increasingly emphasized. Lin et al. [21] discovered that information on gender, age, educational level, and marital status plays a significant role in predicting default. Recently, unstructured data, such as text and image information, as well as structured data, have been used as soft information. Dorfleitner et al. [22] used textual soft information derived from the description of the loan purpose, such as text length, spelling errors, and the presence of positive emotion-evoking keywords. Jiang et al. [23] used a topic model to extract representative features from descriptive text concerning loans.

However, few studies have used information on the relationship among individual borrowers in online P2P lending services. Calabrese et al. [24] defined bank networks by estimating interbank relationships as aggregate claims to predict bank contagion. Agosto et al. [6] defined business networks by estimating inter-company relationships as aggregate trade volumes to predict business default from P2P platforms that specialize in business lending. Unlike for banks and companies, obtaining quantitative indicators of relationships among individuals is difficult. In this study, we propose a network definition among individual borrowers and use this relationship information as independent information.

3. Methodology

3.1 Spatial probit model

Generally, the latent response model is the method used to fit the binary response variable Y as a regression model [25]. The model used in this study is a spatial probit model, which has a spatial autoregressive structure and can be used with a binary response variable. Taking the latent underlying quantity as being represented by a continuous variable $y_i^*$, we consider the observation mechanism as

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{1}$$

with i = 1, 2, ⋯, n, where n is the number of observations. We implement the spatial structure with an autoregressive model specification, such that

$$Y^* = \rho W Y^* + X\beta + \varepsilon \tag{2}$$

where Y* is a continuous latent vector; X represents an n × k matrix of explanatory variables with related coefficient vector β; W is a spatial lag weights matrix with ρ as the associated coefficient; and ε is the error term.

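This spatial probit model implies heteroskedastic errors e as follows:

$$Y^* = (I - \rho W)^{-1} X\beta + e \tag{3}$$

where $e = (I - \rho W)^{-1}\varepsilon$ with variance, for unit-variance ε, $\mathrm{Var}(e) = [(I - \rho W)'(I - \rho W)]^{-1}$.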
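To make the reduced form and its heteroskedastic errors concrete, the following is a minimal NumPy sketch that simulates from the latent model. It assumes unit-variance normal errors (the usual probit normalization); the chain-shaped neighbour matrix, the coefficient values, and the names `simulate_spatial_probit` and `chain_weights` are illustrative assumptions, not part of the study.

```python
import numpy as np

def simulate_spatial_probit(X, W, beta, rho, rng):
    """Draw binary outcomes from the latent spatial autoregressive model."""
    n = X.shape[0]
    A = np.eye(n) - rho * W                      # (I - rho W)
    eps = rng.standard_normal(n)                 # assumed unit-variance errors
    # Reduced form: Y* = (I - rho W)^{-1} (X beta + eps)
    y_star = np.linalg.solve(A, X @ beta + eps)
    y = (y_star > 0).astype(int)                 # observation mechanism
    # Covariance of e = (I - rho W)^{-1} eps
    cov_e = np.linalg.inv(A.T @ A)
    return y, y_star, cov_e

def chain_weights(n):
    """Hypothetical W: units linked to their neighbours on a line, rows sum to 1."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n = 6
W = chain_weights(n)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y, y_star, cov_e = simulate_spatial_probit(X, W, np.array([0.2, 1.0]), 0.5, rng)
print(np.diag(cov_e))  # error variances differ across units
```

The diagonal of `cov_e` is not constant, which is the heteroskedasticity that the estimation procedure has to account for.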

Calabrese and Elkink [26] reviewed various methods for estimating the parameters ρ and β in Eq (3). Among them, we performed parameter estimation using the generalized method of moments (GMM) proposed by Pinkse and Slade [27], who derive the GMM equations from the likelihood function. This method was extended by Klier and McMillen [28] to the logit model. It is more robust than maximum likelihood estimation because it does not depend on the assumption that the error term follows a normal distribution [27].

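A GMM estimator is defined as follows:

$$\hat{\theta} = \arg\min_{\theta} \; u' Z M Z' u \tag{4}$$

where θ = [ρ, β]; $u_i = y_i - p_i$, with $p_i = \Phi\left(\left[(I - \rho W)^{-1} X\beta\right]_i / \sigma_i\right)$ and Φ the standard normal distribution function; $\sigma_i^2$ is the i-th diagonal element of the covariance matrix $[(I - \rho W)'(I - \rho W)]^{-1}$; Z is a matrix of instruments; and M is a positive definite matrix that is generally initialized to an identity matrix. We define the instrument matrix $Z = \{X, WX, W^2X, W^3X\}$, as proposed by Kelejian and Prucha [29].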

To estimate the parameter, θ, we use a two-step estimation procedure:

  1. First, fix ρ = ρ0 and estimate β0 with GMM.
  2. Then find the optimal value of θ through GMM, using θ0 = [ρ0, β0] found in step 1 as the initial value.
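The two steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a probit link with the heteroskedasticity correction described in the text, keeps M as the identity matrix, and uses SciPy's Nelder-Mead minimizer as a stand-in for whatever optimizer the study used. The function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def gmm_objective(theta, y, X, W, M=None):
    """u' Z M Z' u for theta = [rho, beta]."""
    rho, beta = theta[0], np.asarray(theta[1:])
    if not -1.0 < rho < 1.0:
        return 1e6  # guard: keeps (I - rho W) invertible for row-normalized W
    n = len(y)
    A = np.eye(n) - rho * W
    sigma = np.sqrt(np.diag(np.linalg.inv(A.T @ A)))     # per-unit error scale
    p = norm.cdf(np.linalg.solve(A, X @ beta) / sigma)   # predicted default prob.
    u = y - p
    Z = np.column_stack([X, W @ X, W @ W @ X, W @ W @ W @ X])  # instruments
    M = np.eye(Z.shape[1]) if M is None else M
    g = Z.T @ u
    return float(g @ M @ g)

def two_step_gmm(y, X, W, rho0=0.5):
    k = X.shape[1]
    # Step 1: hold rho at rho0 and minimize over beta only
    step1 = minimize(lambda b: gmm_objective(np.r_[rho0, b], y, X, W),
                     x0=np.zeros(k), method="Nelder-Mead")
    theta0 = np.r_[rho0, step1.x]
    # Step 2: minimize over theta = [rho, beta], starting from theta0
    step2 = minimize(gmm_objective, x0=theta0, args=(y, X, W),
                     method="Nelder-Mead")
    return theta0, step2.x

# Simulated example with a hypothetical chain-shaped, row-normalized W
rng = np.random.default_rng(1)
n = 60
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y_star = np.linalg.solve(np.eye(n) - 0.5 * W,
                         X @ np.array([0.0, 1.0]) + rng.standard_normal(n))
y = (y_star > 0).astype(float)
theta0, theta_hat = two_step_gmm(y, X, W, rho0=0.5)
print(theta0, theta_hat)
```

Each objective evaluation inverts an n × n matrix, which is why the cost of this procedure grows quickly with the number of borrowers.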

The estimated spatial lag is used to test the statistical significance of ρ by the Lagrange Multiplier (LM) test proposed by Anselin [30]. The LM statistic for spatial lag is defined as: (5) where with , e0 = y − X(X′X)−1X′y, and eL = y − X(X′X)−1X′Wy.

The spatial lag weights matrix between borrowers on the P2P platform, W, is defined in Section 3.2.

3.2 Borrowers’ relation network

In this study, we construct a network with each borrower as a node and the distance between them as an edge to represent the relationship between the borrowers. The distance between them is defined as the degree of similarity in terms of their hard and soft information. Similarity between numeric information is easily defined by Euclidean distance, but defining similarity between categorical information is a challenge. We use a method proposed by Ahmad and Dey [31] to calculate the distance between borrowers with mixed numeric and categorical information.

Let us assume Bi and Bj are two borrowers with m hard and soft information attributes: X1, …, Xm. The two borrowers may be represented as Bi = {Xi1, Xi2, …, Xim} and Bj = {Xj1, Xj2, …, Xjm}, where the first mr attributes are numeric, the next mc attributes are categorical, and mr + mc = m. The distance between Bi and Bj, denoted by Dist(Bi, Bj), is computed as follows:

$$\mathrm{Dist}(B_i, B_j) = \sum_{t=1}^{m_r} \bigl(s_t (X_{it} - X_{jt})\bigr)^2 + \sum_{t=m_r+1}^{m} \bigl(\delta(X_{it}, X_{jt})\bigr)^2 \tag{6}$$

where st is the significance of the t-th numeric attribute, and δ(Xit, Xjt) is a distance function between the t-th categorical attributes in Bi and Bj. The distance between two distinct values, c1 and c2, of any categorical attribute Xt is given by:

$$\delta(c_1, c_2) = \frac{1}{m-1} \sum_{t'=1,\; t' \neq t}^{m} \delta^{tt'}(c_1, c_2) \tag{7}$$

where $\delta^{tt'}(c_1, c_2) = P_t(c'|c_1) + P_t(\sim c'|c_2) - 1$; c′ denotes the subset of values of Xt′ that maximizes the quantity Pt(c′|c1) + Pt(~c′|c2); ~c′ denotes the complementary set of values occurring for attribute Xt′; and Pt(c′|c1) denotes the conditional probability that an element having value c1 for Xt has a value belonging to c′ for Xt′. To compute the significance of normalized numeric attributes, we discretize them into L equal intervals: u[1], u[2], ⋯, u[L]. The significance of the t-th numeric attribute, st, is computed as:

$$s_t = \frac{2}{L(L-1)} \sum_{r=1}^{L-1} \sum_{q=r+1}^{L} \delta(u[r], u[q]) \tag{8}$$
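The categorical distance of Eq (7) can be illustrated with a short sketch based on co-occurrence counts. It relies on one observation: the subset c′ that maximizes Pt(c′|c1) + Pt(~c′|c2) contains exactly the values that are at least as likely under c1 as under c2, so the maximum equals the sum, over all values, of the larger of the two conditional probabilities. The function names and the four toy records are hypothetical.

```python
from collections import Counter

def cond_dist(rows, t, t2, value):
    """Estimate P(X_t2 = v | X_t = value) from co-occurrence counts."""
    counts = Counter(r[t2] for r in rows if r[t] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def delta_pair(rows, t, t2, c1, c2):
    """Distance between values c1 and c2 of attribute t with respect to attribute t2."""
    p1 = cond_dist(rows, t, t2, c1)
    p2 = cond_dist(rows, t, t2, c2)
    # Put each value v in c' when P(v|c1) >= P(v|c2); otherwise in its complement
    best = sum(max(p1.get(v, 0.0), p2.get(v, 0.0)) for v in set(p1) | set(p2))
    return best - 1.0

def delta(rows, t, c1, c2):
    """Average delta_pair over all attributes other than t."""
    others = [t2 for t2 in range(len(rows[0])) if t2 != t]
    return sum(delta_pair(rows, t, t2, c1, c2) for t2 in others) / len(others)

# Toy records: (grade, home ownership, loan purpose)
rows = [
    ("A", "own", "card"),
    ("A", "own", "debt"),
    ("B", "rent", "debt"),
    ("B", "rent", "debt"),
]
print(delta(rows, 0, "A", "B"))  # 0.75
```

In the toy data, grades A and B are perfectly separated by home ownership (distance 1) and half separated by loan purpose (distance 0.5), which averages to 0.75.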

The relationship between two borrowers (Bi and Bj) is mapped so that the closer the distance is, the stronger the relationship. We use double-power distance weights, and the degree of relationship between Bi and Bj is evaluated as follows: (9) where d denotes the maximum radius of influence (bandwidth). To use Wij as a spatial weight matrix, row normalization is performed.
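As a hedged illustration of this weighting step, the sketch below uses a common form of double-power distance weights, w = (1 − (dist/d)^k)^k inside the bandwidth and 0 outside. The exponent k = 2, the zero diagonal, and the toy distance matrix are assumptions made for the example; the text does not state them.

```python
import numpy as np

def double_power_weights(dist, bandwidth, k=2):
    """Row-normalized double-power weights; k = 2 is an assumption."""
    dist = np.asarray(dist, dtype=float)
    inside = dist <= bandwidth
    # Weight decays from 1 at distance 0 to 0 at the bandwidth
    w = np.where(inside, (1.0 - (dist / bandwidth) ** k) ** k, 0.0)
    np.fill_diagonal(w, 0.0)                       # assumed: no self-influence
    row_sums = w.sum(axis=1, keepdims=True)
    # Rows with no neighbour inside the bandwidth stay all-zero
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

# Hypothetical pairwise distances between three borrowers
dist = np.array([
    [0.00, 0.02, 0.10],
    [0.02, 0.00, 0.05],
    [0.10, 0.05, 0.00],
])
W = double_power_weights(dist, bandwidth=0.06)
print(W)
```

Borrowers 1 and 3 are farther apart than the bandwidth, so they do not influence each other directly; borrower 2 is influenced by both, with more weight on the closer one.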

3.3 Evaluation metric

To measure the performance of the proposed spatial probit model, we used the following evaluation metrics: accuracy, precision, recall, F1 score, and area under the receiver operating characteristic (ROC) curve. These five indicators are among the most widely used for performance evaluation of binary classification tasks such as default prediction. The accuracy is the most intuitive performance indicator of a classification model and is defined as the ratio of correct to total predictions. The precision is the percentage of borrowers that actually defaulted out of those who were predicted to default. The recall is the percentage of borrowers predicted to default out of those who actually defaulted. The F1 score is the harmonic mean of the precision and recall. Precision, recall, and F1 score are important indicators in a credit scoring task, where borrowers who default are far fewer than borrowers who fully repay [32]. The ROC curve for a binary classification problem represents the true positive proportion as a function of the false positive proportion.
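The definitions above translate directly into code. The following sketch computes the five metrics in plain Python, with default coded as 1; the AUC is computed as the share of (default, non-default) pairs in which the defaulting borrower receives the higher score, which equals the area under the ROC curve. The labels and scores are made-up values for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with default coded as 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def auc(y_true, scores):
    """Probability that a random defaulter outscores a random non-defaulter."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
m = classification_metrics(y_true, y_pred)
print(m, auc(y_true, scores))
```

Here accuracy is 0.75, while precision, recall, and F1 are all 2/3, showing how accuracy can look better than the metrics that focus on the minority class.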

4. Data

We used LCL data from Lending Club, the largest online credit marketplace offering P2P lending worldwide. These data are open to the public and provide 2.26 million loan records from June 2007 to December 2018. The LCL data contain 36-month and 60-month loans. Consequently, many borrowers who received loans after 2013 still belong to the “Current” category, and their default outcome is unknown. Because of this, we only used loans issued in 2012. In the 2012 loan records, the statuses Fully Paid, Default, and Charged Off exist; in this study, Fully Paid was defined as a good result and the other two were defined as bad results.

In sum, our dataset consists of 51,314 issued loans, including 8,241 defaults. The LCL dataset describes 145 attributes of borrowers but, like previous studies, we selected only the important attributes based on several references [18, 33, 34]. Brief descriptions of the seven numeric and five categorical attributes used in this study are presented in Table 1. Employment length and home ownership are soft information not directly representing borrowers’ financial status. We removed records with missing values in the 12 variables and obtained 37,012 borrowers with fully paid loans and 7,080 borrowers with defaulted loans.

We performed preprocessing, taking into account the dispersion of each attribute. “Annual income,” “Loan amount,” and “Revolving balance” are log-transformed to reduce variance. Since 77% of all borrowers are classified as A, B, or C in the "Grade" attribute, classifications D to G are combined together as D or less. Since 78% of all borrowers are also concentrated under the categories debt consolidation and credit card in the "Loan purpose" attribute, we combined the remaining categories into the category other. The "Employment length" attribute is newly categorized as short, representing less than five years; middle, five to nine years; and long, 10 years or more. Thus, the categorical variables increased to nine, and their distribution is shown in Fig 1.
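The preprocessing rules in this paragraph can be sketched as a single function. The field names, the category labels such as `debt_consolidation`, and the use of the natural logarithm are assumptions made for illustration; the text also does not say how zero amounts were handled, so the sketch assumes strictly positive values.

```python
import math

MAIN_GRADES = {"A", "B", "C"}
MAIN_PURPOSES = {"debt_consolidation", "credit_card"}  # hypothetical labels

def bin_employment_length(years):
    """short: under 5 years; middle: 5 to 9 years; long: 10 years or more."""
    if years < 5:
        return "short"
    if years < 10:
        return "middle"
    return "long"

def preprocess(record):
    """Apply the log transforms and category merges to one loan record."""
    grade = record["grade"]
    purpose = record["purpose"]
    return {
        # Assumes strictly positive amounts (log of zero is undefined)
        "log_annual_income": math.log(record["annual_income"]),
        "log_loan_amount": math.log(record["loan_amount"]),
        "log_revolving_balance": math.log(record["revolving_balance"]),
        "grade": grade if grade in MAIN_GRADES else "D or less",
        "purpose": purpose if purpose in MAIN_PURPOSES else "other",
        "employment_length": bin_employment_length(record["emp_length_years"]),
    }

example = preprocess({
    "annual_income": 60000.0,
    "loan_amount": 10000.0,
    "revolving_balance": 5000.0,
    "grade": "E",
    "purpose": "car",
    "emp_length_years": 7,
})
print(example)
```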

Fig 1. The distribution of categories for each categorical attribute.

https://doi.org/10.1371/journal.pone.0261737.g001

We performed Welch’s t-test on the difference between borrowers with fully paid loans and borrowers with defaulted loans for numeric attributes, as shown in Table 2. There were no statistically significant differences in the "Revolving balance" attribute at the significance level of 0.05. However, for attributes related to income, borrowers with fully paid loans are observed to be more stable than borrowers with defaulted loans.
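For reference, Welch’s test compares two group means without assuming equal variances. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom in plain Python; obtaining a p-value additionally requires the Student t distribution, which is omitted here. The two small samples are made-up numbers, not values from the LCL data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va = variance(a) / na          # squared standard error of mean(a)
    vb = variance(b) / nb          # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

fully_paid = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical attribute values
defaulted = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(fully_paid, defaulted)
print(round(t_stat, 4), round(dof, 4))
```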

Table 2. Result of Welch’s t-test for numeric attributes.

https://doi.org/10.1371/journal.pone.0261737.t002

We performed a chi-square test to check whether default is independent of the categories of each categorical attribute. Table 3 shows, for each category, the number of borrowers with fully paid loans and those with defaulted loans, the ratio of borrowers with defaulted loans to borrowers with fully paid loans, and the chi-square statistic with the corresponding p-value. The default-to-fully-paid ratio differed considerably across the categories of “Grade” and “Loan length.” The “Employment length” attribute did not show a statistically significant difference at the significance level of 0.05.
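The chi-square statistic for such a table compares the observed counts with the counts expected if default were independent of the category. A minimal sketch follows; the 2 × 2 table holds hypothetical counts, not the values reported in Table 3.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Rows: two categories of one attribute; columns: (fully paid, defaulted)
table = [
    [30, 10],
    [20, 40],
]
stat, df = chi_square(table)
ratios = [defaulted / paid for paid, defaulted in table]
print(round(stat, 4), df, ratios)
```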

Table 3. Result of the chi-squared test for categorical attributes.

https://doi.org/10.1371/journal.pone.0261737.t003

5. Experiment

In our dataset, borrowers with defaulted loans account for 16% of the total; thus, there is a class imbalance problem. This biases the trained classification model toward predicting the majority class and significantly reduces prediction performance for the minority class [35]. To alleviate this problem, we utilized the under-sampling method [36]. We sampled 5,000 borrowers with fully paid loans and 5,000 borrowers with defaulted loans. We limited the range of some numeric attributes to control the dispersion of their min-max normalization. Values greater than 3 for "Inquiries in the last 6 months" and 26 for "Open accounts" were excluded from the sampling process. The spatial weight matrix, W, was built from the sampled dataset, as described in Section 3.2. Numeric variables were divided into three sections of equal length (L = 3). The bandwidth (d) was set to 0.06059, which was the third quantile value of distances between borrowers.
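Under-sampling to equal class sizes can be sketched as follows. The function name, the seed, and the toy label vector are assumptions for illustration; the study sampled 5,000 borrowers per class, while the example draws 10 per class from 100 records with a 16% default share.

```python
import random

def undersample(labels, n_per_class, seed=0):
    """Return sorted indices of a random sample with n_per_class items per label."""
    rng = random.Random(seed)
    chosen = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        chosen.extend(rng.sample(idx, n_per_class))  # sampling without replacement
    return sorted(chosen)

labels = [0] * 84 + [1] * 16      # 0 = fully paid, 1 = defaulted
picked = undersample(labels, n_per_class=10, seed=42)
balanced = [labels[i] for i in picked]
print(balanced.count(0), balanced.count(1))  # 10 10
```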

To keep the computation time for parameter estimation manageable, we sampled 2,000 borrowers from the sampled dataset and divided them into a training set of 1,500 borrowers and a test set of 500 borrowers. Using the training set, the parameters θ = [ρ, β] were estimated by GMM. To find the initial ρ0, we observed the change in the “area under the curve” (AUC) for the test set while increasing ρ0 from 0 to 1 at intervals of 0.1. As shown in Fig 2, with an initial ρ0 of 0.5, the test AUC was the highest, at 0.6855. This shows that borrowers are not independent in the borrowers’ relation network, and that there is sufficient spatial autocorrelation between borrowers with defaulted loans.

Table 4 compares the baseline model, a logistic regression model without a spatial component, with the model presented in this study. In the baseline model, ten attributes were statistically significant at the significance level of 0.1. The default probability of the borrower has a strong negative correlation with the “log(Annual income)” and “log(Revolving balance)” attributes. However, it has a positive correlation with “Debt to income,” “Revolving utilization rate,” “Grade,” “Loan length,” and “Loan purpose.” In the spatial probit model proposed in this study, seven attributes were statistically significant at the significance level of 0.1. The “log(Annual income)” and “log(Revolving balance)” attributes had smaller estimates than in the baseline model and were not statistically significant. Instead, “log(Loan amount)” and “Revolving utilization rate” have negative coefficients. In addition, the spatial autocorrelation coefficient between borrowers with defaulted loans was 0.505, which was significant at the significance level of 0.05. Compared to the baseline model, there was an increase in accuracy and AUC. In particular, the proposed model has remarkably higher recall and F1-score, which is consistent with significant spatial autocorrelation between borrowers with defaulted loans. The additional consideration of spatial autocorrelation in the borrower relation network significantly improved on the performance of logistic regression.

Table 4. Result of the estimation of the baseline and SAR models.

https://doi.org/10.1371/journal.pone.0261737.t004

We sampled the training and test datasets 500 times from the entire dataset and observed the differences in test performance between the baseline and spatial probit models. To observe the effect of the strength of autocorrelation between borrowers with defaulted loans, the initial ρ0 was set to 0.2, 0.5, and 0.8. The results are shown in Table 5. The larger the initial ρ0, the higher the recall, which means higher predictability of borrowers with defaulted loans. However, too large an initial value creates the risk of reduced accuracy and AUC. In our experiment, when the initial ρ0 is 0.5, the AUC is slightly higher, and the F1-score is significantly higher, than in the baseline model. Therefore, considering an appropriate level of spatial autocorrelation is expected to contribute significantly to the prediction of the default risk of a borrower.

Table 5. Result of the estimation of the SAR model with 500 repetitions.

https://doi.org/10.1371/journal.pone.0261737.t005

6. Conclusion

This study proposed a spatial probit model to improve default prediction by reflecting the relationship between borrowers, which is defined by the similarity of their characteristics.

We applied this method to 2012 LCL data. We found evidence of a high level of spatial autocorrelation between borrowers with defaulted loans. Reflecting the spatial autocorrelation among loan applicants did not result in an overall improvement in the accuracy of the default prediction but did yield a significant improvement in the F1-score. An increase in the F1 score is a very significant contribution, since finding borrowers with high default risk is a more important issue than finding normal borrowers. This study showed that the additional information of spatial autocorrelation between borrowers with high default risk can alleviate the class imbalance problem in the loan dataset and provide high predictive power for high default risk borrowers.

However, this study has some limitations. Since the spatial weighting matrix increases enormously in proportion to the square of the number of observations, there are time and memory difficulties in using all the data. In addition, the calculation of the inverse of (I − ρW) in the parameter estimation process using GMM requires a large amount of computation. Because of these constraints on the spatial weighting matrix, we sampled a small number instead of the entire dataset. If the computing power is complemented and the constraints on the spatial weighting matrix are relaxed, then more robust default predictive modeling can be expected.
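A quick calculation illustrates the quadratic growth in memory. The sketch assumes a dense matrix of 8-byte floating-point entries, which is an assumption about storage and not something the study states; 44,092 is the number of borrowers left after removing missing values (37,012 + 7,080).

```python
def dense_matrix_gigabytes(n, bytes_per_entry=8):
    """Memory for a dense n x n matrix, in gigabytes (1 GB = 1e9 bytes)."""
    return n * n * bytes_per_entry / 1e9

for n in (2_000, 10_000, 44_092):
    print(n, round(dense_matrix_gigabytes(n), 3))
```

Under these assumptions, the 2,000-borrower sample needs about 0.032 GB for W, whereas all 44,092 borrowers would need roughly 15.6 GB before any matrix inversion is attempted.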

References

  1. Angilella S., & Mazzù S. (2015). The financing of innovative SMEs: A multicriteria credit rating model. European Journal of Operational Research, 244(2), 540–554.
  2. Kim Y., & Sohn S. Y. (2007). Technology scoring model considering rejected applicants and effect of reject inference. Journal of the Operational Research Society, 58(10), 1341–1347.
  3. Jeon H., & Sohn S. Y. (2008). The risk management for technology credit guarantee fund. Journal of the Operational Research Society, 59(12), 1624–1632.
  4. Sohn S. Y., Doo M. K., & Ju Y. H. (2012). Pattern recognition for evaluator errors in a credit scoring model for technology-based SMEs. Journal of the Operational Research Society, 63(8), 1051–1064.
  5. Ju Y. H., & Sohn S. Y. (2015). Stress test for a technology credit guarantee fund based on survival analysis. Journal of the Operational Research Society, 66(3), 463–475.
  6. Agosto A., Giudici P., & Leach T. (2019). Spatial regression models to improve P2P credit risk management. Frontiers in Artificial Intelligence, 2, 6. pmid:33733095
  7. Wei Y., Yildirim P., Van den Bulte C., & Dellarocas C. (2016). Credit scoring with social network data. Marketing Science, 35(2), 234–258.
  8. Óskarsdóttir M., Bravo C., Sarraute C., Vanthienen J., & Baesens B. (2019). The value of big data for credit scoring: Enhancing financial inclusion using mobile phone data and social network analytics. Applied Soft Computing, 74, 26–39.
  9. Zeng X., Liu L., Leung S., Du J., Wang X., & Li T. (2017). A decision support model for investment on P2P lending platform. PLoS ONE, 12(9), e0184242. pmid:28877234
  10. Serrano-Cinca C., Gutiérrez-Nieto B., & López-Palacios L. (2015). Determinants of default in P2P lending. PLoS ONE, 10(10), e0139427. pmid:26425854
  11. Lessmann S., Baesens B., Seow H. V., & Thomas L. C. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247(1), 124–136.
  12. Crook J. N., Edelman D. B., & Thomas L. C. (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research, 183(3), 1447–1465.
  13. Emekter R., Tu Y., Jirasakuldech B., & Lu M. (2015). Evaluating credit risk and loan performance in online Peer-to-Peer (P2P) lending. Applied Economics, 47(1), 54–70.
  14. Kruppa J., Ziegler A., & König I. R. (2012). Risk estimation and risk prediction using machine-learning methods. Human Genetics, 131(10), 1639–1654. pmid:22752090
  15. Ma Z., Hou W., & Zhang D. (2021). A credit risk assessment model of borrowers in P2P lending based on BP neural network. PLoS ONE, 16(8), e0255216. pmid:34343180
  16. Harris T. (2013). Quantitative credit risk assessment using support vector machines: Broad versus Narrow default definitions. Expert Systems with Applications, 40(11), 4404–4413.
  17. Yao X., Crook J., & Andreeva G. (2015). Support vector regression for loss given default modelling. European Journal of Operational Research, 240(2), 528–538.
  18. Malekipirbazari M., & Aksakalli V. (2015). Risk assessment in social lending via random forests. Expert Systems with Applications, 42(10), 4621–4631.
  19. Paul S. (2014). Creditworthiness of a borrower and the selection process in micro-finance: A case study from the urban slums of India. Margin: The Journal of Applied Economic Research, 8(1), 59–75.
  20. Abdou H. A., & Pointon J. (2011). Credit scoring, statistical techniques and evaluation criteria: a review of the literature. Intelligent Systems in Accounting, Finance and Management, 18(2–3), 59–88.
  21. Lin X., Li X., & Zheng Z. (2017). Evaluating borrower’s default risk in peer-to-peer lending: evidence from a lending platform in China. Applied Economics, 49(35), 3538–3545.
  22. Dorfleitner G., Priberny C., Schuster S., Stoiber J., Weber M., de Castro I., et al. (2016). Description-text related soft information in peer-to-peer lending–Evidence from two leading European platforms. Journal of Banking & Finance, 64, 169–187.
  23. Jiang C., Wang Z., Wang R., & Ding Y. (2018). Loan default prediction by combining soft information extracted from descriptive text in online peer-to-peer lending. Annals of Operations Research, 266(1–2), 511–529.
  24. Calabrese R., Elkink J. A., & Giudici P. S. (2017). Measuring bank contagion in Europe using binary spatial regression models. Journal of the Operational Research Society, 68(12), 1503–1511.
  25. Verbeek M. (2008). A guide to modern econometrics. John Wiley & Sons.
  26. Calabrese R., & Elkink J. A. (2014). Estimators of binary spatial autoregressive models: A Monte Carlo study. Journal of Regional Science, 54(4), 664–687.
  27. Pinkse J., & Slade M. E. (1998). Contracting in space: An application of spatial statistics to discrete-choice models. Journal of Econometrics, 85(1), 125–154.
  28. Klier T., & McMillen D. P. (2008). Clustering of auto supplier plants in the United States: generalized method of moments spatial logit for large samples. Journal of Business & Economic Statistics, 26(4), 460–471.
  29. Kelejian H. H., & Prucha I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. The Journal of Real Estate Finance and Economics, 17(1), 99–121.
  30. Anselin L. (1988). Lagrange multiplier test diagnostics for spatial dependence and spatial heterogeneity. Geographical Analysis, 20(1), 1–17.
  31. Ahmad A., & Dey L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data. Data & Knowledge Engineering, 63(2), 503–527.
  32. Li W., Ding S., Chen Y., & Yang S. (2018). Heterogeneous ensemble for default prediction of peer-to-peer lending in China. IEEE Access, 6, 54396–54406.
  33. Li Z., Tian Y., Li K., Zhou F., & Yang W. (2017). Reject inference in credit scoring using semi-supervised support vector machines. Expert Systems with Applications, 74, 105–114.
  34. Szwabe A., & Misiorek P. (2018, September). Decision Trees as Interpretable Bank Credit Scoring Models. In International Conference: Beyond Databases, Architectures and Structures (pp. 207–219). Springer, Cham.
  35. Longadge R., & Dongre S. (2013). Class imbalance problem in data mining review. arXiv preprint arXiv:1305.1707.
  36. Kotsiantis S. B., & Pintelas P. E. (2003). Mixture of expert agents for handling imbalanced data sets. Annals of Mathematics, Computing & Teleinformatics, 1(1), 46–55.