Credit Risk Prediction for Peer-To-Peer Lending Platforms: An Explainable Machine Learning Approach

  • Chong Pei Swee, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
  • Farid Meziane, School of Computing and Engineering, University of Derby, UK
  • Jane Labadin, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia
Keywords: Credit Risk Evaluation; Peer-to-Peer Lending; Logistic Regression; Explainable Machine Learning; Explainable AI


Small and medium enterprises struggle to obtain start-up funding because of the strict rules and conditions set by banks and financial institutions. This difficulty has fuelled the popularity of online peer-to-peer lending platforms, which offer an easier way to obtain loans because their rules are less rigid. However, the flexibility of loan funding in peer-to-peer lending comes with a high probability of default on loans funded to high-risk start-ups. An efficient model for evaluating the credit risk of borrowers on peer-to-peer lending platforms is important both to encourage investors to fund loans and to justify the rejection of unsuccessful applications, which satisfies financial regulators and increases transparency. This paper presents a supervised machine learning model based on logistic regression that addresses this issue by predicting the probability of default of a loan funded to a borrower through a peer-to-peer lending platform. In addition, factors that affect the credit levels of borrowers are identified and discussed. The research shows that the features that most strongly affect the probability of default are the debt-to-income ratio, the number of mortgage accounts, and the Fair, Isaac and Company (FICO) score.
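
To make the idea of an explainable probability-of-default model concrete, the following minimal Python sketch shows how a logistic regression turns the three features named above into a probability, and how each coefficient can be read as an odds ratio. The coefficient values, their signs, the intercept, the feature names, and the two example borrowers are all hypothetical and chosen only for illustration; they are not the fitted values or the data reported in the paper.

```python
import math

# Hypothetical parameters for illustration only. They are NOT the values
# fitted in the paper, and the signs are assumptions.
INTERCEPT = 2.0
COEFFICIENTS = {
    "debt_to_income": 0.04,      # assumed: higher ratio raises default risk
    "mortgage_accounts": -0.10,  # assumed: more accounts lowers default risk
    "fico_score": -0.005,        # assumed: higher score lowers default risk
}


def probability_of_default(borrower):
    """Apply the logistic function to a linear score of the borrower's features."""
    z = INTERCEPT + sum(
        COEFFICIENTS[name] * borrower[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(feature):
    """Multiplicative change in the odds of default per one-unit feature increase."""
    return math.exp(COEFFICIENTS[feature])


# Two made-up borrowers used only to exercise the functions.
low_risk = {"debt_to_income": 15.0, "mortgage_accounts": 2, "fico_score": 700}
high_risk = {"debt_to_income": 35.0, "mortgage_accounts": 0, "fico_score": 620}

if __name__ == "__main__":
    for label, borrower in (("low risk", low_risk), ("high risk", high_risk)):
        print(label, round(probability_of_default(borrower), 3))
    for feature in COEFFICIENTS:
        print(feature, "odds ratio:", round(odds_ratio(feature), 3))
```

Because the model is linear in the log-odds, an odds ratio above 1 marks a feature that raises the odds of default and a ratio below 1 marks one that lowers them, which is what makes this kind of model straightforward to explain to investors and regulators.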



How to Cite
Pei Swee, C., Meziane, F., & Labadin, J. (2022). Credit Risk Prediction for Peer-To-Peer Lending Platforms: An Explainable Machine Learning Approach. Journal of Computing and Social Informatics, 1(2), 1-16.