Banks and loan institutions have been facing increasingly strong competitive pressure. At the same time, a multitude of factors have been reducing their profitability: falling interest rates lead to lower margins, the bank tax takes its toll, and the pandemic has reduced the creditworthiness of a large group of customers (lower wages, job losses). Under such conditions, every fraction of the profit margin is at stake.
Loan products are a very important source of profits in the financial industry, especially when low interest rates significantly limit the possibility of earning on accounts and deposits. At the same time, it seems that financial institutions have exhausted almost all traditional methods of competing to increase or maintain profit margins on credit products. Precisely adjusting the price of a credit or loan to the customer's risk level is quite common practice these days; now the time has come to define credit risk even more precisely. The ability to accurately predict whether a given applicant will repay a loan has been at the heart of lending since time immemorial, and it may be the decisive factor between grand failure and success. Improving accuracy in this respect by even 1% may have a significant impact on the profit and loss account of a financial institution. So-called credit scoring has been used to assess credit risk for almost 70 years. Scoring is an assessment of the creditworthiness of a customer applying for a loan or credit. It is made on the basis of data about the customer (mainly from the loan application and accompanying documents such as a credit report and personal account history), using statistical methods that identify how significant individual features are in assessing the likelihood of repayment, by examining the characteristics of customers who did or did not repay their loans. Originally, credit scoring was applied only to individual customers; it proved so helpful that it was extended to business customers as well. Building credit scoring has traditionally relied on statistical models such as logistic regression, linear discriminant analysis, and decision trees.
Best practice additionally applies feature-engineering methods (the WOE transformation) and feature-selection methods (choosing a single representative among collinear variables, stepwise general-to-specific selection, marginal information value). The development of statistical modelling keeps producing increasingly advanced techniques, which give ever more accurate scores, yet at the expense of ever more complex algorithms. In recent years, many industries have increasingly replaced traditional statistical methods with so-called machine learning, or artificial intelligence. In machine learning, the algorithm is not predefined but improves automatically through experience: the mathematical model is built from sample data (training data) without direct human involvement. Machine learning can bring extremely effective outcomes when dealing with a large volume of data and variables.
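The WOE (Weight of Evidence) transformation mentioned above recodes each bin of a feature as the logarithm of the ratio between the share of good customers and the share of bad customers falling into that bin. A minimal sketch, with purely illustrative bins and toy data (the actual features and binning used in scorecards like BIK's are confidential):

```python
# Weight of Evidence (WOE) encoding sketch with illustrative toy data.
import math

def woe_table(values, defaults, bins):
    """For each [lo, hi) bin, compute WOE = ln(%good in bin / %bad in bin)."""
    total_good = sum(1 for d in defaults if d == 0)
    total_bad = sum(1 for d in defaults if d == 1)
    table = {}
    for lo, hi in bins:
        good = sum(1 for v, d in zip(values, defaults) if lo <= v < hi and d == 0)
        bad = sum(1 for v, d in zip(values, defaults) if lo <= v < hi and d == 1)
        # A small offset avoids taking log of zero when a bin has no goods or bads.
        dist_good = (good or 0.5) / total_good
        dist_bad = (bad or 0.5) / total_bad
        table[(lo, hi)] = math.log(dist_good / dist_bad)
    return table

# Toy data: hypothetical applicant ages and default flags (1 = did not repay).
ages = [22, 25, 31, 38, 45, 52, 29, 61, 34, 48]
flags = [1, 1, 0, 0, 0, 0, 1, 0, 0, 1]
woe = woe_table(ages, flags, [(18, 30), (30, 45), (45, 100)])
```

A positive WOE marks a bin dominated by good payers, a negative one a risky bin; replacing raw values with their bin's WOE gives logistic regression a monotone, comparable input scale.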
Artificial intelligence methods are not yet widely used in credit risk modelling, for two reasons. First, it has not yet been broadly tested whether these methods assess credit risk better than traditional ones: there are many theoretical studies, yet little data is available on the results of applying them in the operations of financial institutions. Second, they are considered "black box" methods. It is commonly believed that it is impossible to clearly trace how an artificial intelligence algorithm works and why it accepts or rejects a given loan/credit application. This is not entirely true, as XAI (eXplainable Artificial Intelligence) methods allow for looking inside the model. Nevertheless, the conviction that the operation of such a model is unclear to the user creates a barrier for managers in the risk departments of financial institutions, on whom supervisory and personal data protection regulations impose the obligation to explain to each applicant why the model has assessed their credit risk as high or low.
Biuro Informacji Kredytowej (BIK) decided to verify both of the above barriers, using the huge potential of the credit and loan customer base at its disposal. For this purpose, a study was conducted with Przemysław Biecek, PhD (specialising in eXplainable Artificial Intelligence, Human-Oriented ML and Evidence-Based Machine Learning), and with Marcin Chlebus, PhD, and Dominik Ogonowski – founders of Data Juice Lab sp. z o.o., a company specialising in the construction of machine learning models and business models, mainly for the financial sector.
The task involved testing the predictive effectiveness of credit scoring built based on the machine learning method and comparing it to the effects obtained with traditional statistical methods that have been used for years. Additionally, XAI (eXplainable Artificial Intelligence) methods were used to “look” into the model created with the machine learning method and to examine what customer characteristics/parameters in this case have an impact on either the approval or rejection of a loan/credit application.
Currently, BIK maintains a database containing over 159 million account history entries (data on loans/credits and borrowers) for 24 million individual customers and over 1 million small- and medium-sized enterprises. The BIK database is fed with data on borrowers by all banks in Poland and by the majority of loan institutions. Such a huge database makes it possible to obtain a high-quality statistical model that estimates the likelihood of a customer repaying a loan.
The experiment was carried out on BIK data originating from loan institutions, for which the banking sector regulations do not limit the use of machine learning in the assessment of credit risk (as opposed to banks, on which such a limitation is imposed). A total of 5 million observations (loan accounts) and 1,729 variables characterising them were used for the period from 11/10/2018 to 20/05/2019. It was assumed that problems with repayment most often surface in the sixth month after a loan is granted (the so-called credit peak), hence the observations were limited to loans granted by 30/11/2018. The data was fully anonymised.
The Gini coefficient (a measure of prediction correctness) was used to assess the models, with a minimum acceptable value of 0.6 adopted on the basis of expert knowledge. The scores obtained with all models met this criterion. The random forest method (one of the machine learning methods) produced the highest value of this coefficient of all the models: for the training set the value was close to 1 (the maximum possible), whereas for the test set it was 0.76. The score of nearly 1 for the training set results from the extreme depth of the trees in the random forest, a sign of overfitting to the training data. Another machine learning method, the Gradient Boosting Machine technique, also produced a very high Gini coefficient, reaching 0.68 for both datasets. Among the classic statistical methods, the best result was obtained by logistic regression with the WOE transformation – a coefficient of 0.65 for both datasets. Other standard statistical models were also tested; their results were weaker than the above.
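For a scoring model, the Gini coefficient is related to the area under the ROC curve by Gini = 2·AUC − 1, so 1 means the score ranks every bad account above every good one and 0 means no discrimination at all. A small sketch with hypothetical scores and default labels (not the study's data):

```python
# Gini coefficient of a score, computed as 2*AUC - 1 via the
# Mann-Whitney formulation of AUC, on illustrative toy data.
def auc(scores, labels):
    """Probability that a randomly chosen bad account (label 1)
    receives a higher risk score than a randomly chosen good one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(scores, labels):
    return 2 * auc(scores, labels) - 1

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # predicted default risk
labels = [1, 1, 0, 1, 0, 0, 1, 0]                  # 1 = account defaulted
g = gini(scores, labels)  # 0.5 on this toy sample
```

On this toy sample the score ranks most, but not all, defaulters above non-defaulters, giving a Gini of 0.5; a production threshold such as the study's 0.6 would demand stronger separation.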
The second goal of the experiment involved "looking" into the model created with the machine learning method and examining which customer features affect the approval or rejection of a loan/credit application. XAI (eXplainable Artificial Intelligence) methods serve this purpose: they allow the most important features to be identified, "what-if" analyses to be performed, and the decision-making process of the algorithm to be better understood. Among the various methods and tools, a few deserve closer attention. The simplest is permutation feature importance (PFI): each feature is repeatedly and randomly shuffled, the model is re-scored on the shuffled data, and the average decrease in the model's accuracy is recorded. The features with the highest average decrease are the most significant, i.e. they have the greatest impact on whether a customer is classified as bearing high or low credit risk. The customer features used in the model were kept confidential for security reasons, but it can be said that the three key ones turned out to exert an impact of 7%, 3% and 2% on the final credit decision, respectively.
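The shuffling procedure described above can be sketched as follows; the toy model, data and feature roles here are purely illustrative, since the real features are confidential:

```python
# Permutation feature importance (PFI) sketch: shuffle one feature at a
# time and measure the average drop in accuracy on illustrative data.
import random

def accuracy(model, X, y):
    return sum(model(row) == yi for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=20, seed=0):
    rng = random.Random(seed)
    base = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy "model": flags high risk when feature 0 exceeds a threshold;
# feature 1 is pure noise, so its importance should be zero.
model = lambda row: 1 if row[0] > 0.5 else 0
X = [[0.9, 5], [0.8, 1], [0.2, 7], [0.1, 3], [0.7, 2], [0.3, 9]]
y = [1, 1, 0, 0, 1, 0]
imps = permutation_importance(model, X, y)
```

Shuffling the feature the toy model actually uses degrades its accuracy, while shuffling the noise feature changes nothing, which is exactly how PFI separates the key risk factors from irrelevant inputs.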
In conclusion, the study showed that machine learning methods can significantly improve the accuracy of credit risk assessment, and thus increase the profits generated by a financial institution.
An improvement in the predictive quality of the model by as little as 1% is of immense importance at the scale of the business – it may bring financial benefits of PLN 3.5 million per year (assuming PLN 10 billion of loans granted per year). The usefulness of machine learning models grows especially when the number of variables is very large: for classic models this may pose a challenge, whereas it poses no problem whatsoever for machine learning methods.
At the same time, adding XAI makes it possible to understand how the model works: to identify the key risk factors, the direction and monotonicity of their impact on the assessment, and to carry out a positive qualitative verification of the model for the purposes of validation, audits and regulators (including the ability to explain the reasons for a credit decision, required by Article 70a of the Banking Law, which implements the GDPR in national regulations).
We are on the eve of a revolution in credit risk in the financial industry. A highly competitive environment, combined with the growing unpredictability of the economic situation, makes it necessary to further increase the effectiveness of already "stretched" credit risk assessment models. Classic methods may no longer allow for additional gains, so the winners will be the institutions that are first to use innovative methods such as machine learning. Such solutions will speed up processes and increase their accuracy; nonetheless, the final analysis of the scores and the decision should rest with experts supervising the models, who can properly interpret the results in unpredictable situations such as, for example, the outbreak of a global pandemic.
Piotr Wojewnik, PhD – Director of Scoring Development at BIK S.A., responsible for the development of models and credit risk analysis in the banking and non-banking sectors. He has a dozen or so years of experience in building forecasting models in the areas of credit risk, sales and treasury.
Sławomir Grzybek – Director of the Business Intelligence Department at BIK S.A., responsible for managing portfolio analyses and scoring models. He has over 20 years of experience in designing, implementing and monitoring credit policies and scoring and rating models in the banking and e-commerce industries.
Przemysław Biecek, PhD, Eng. – professor at the Warsaw University of Technology and the University of Warsaw, dean for the development of the Faculty of Mathematics and Information Sciences of the Warsaw University of Technology. Research interests: responsible artificial intelligence.
Dominik Ogonowski – founder and president of Data Juice Lab sp. z o.o., specialising in building business and statistical models for the financial sector. He has over 20 years of experience in leading banks in the areas of credit risk, product management, process optimisation and marketing.
Marcin Chlebus, PhD – founder and vice-president of Data Juice Lab sp. z o.o., specialising in building business and statistical models for the financial sector. Data Science programme manager at the University of Warsaw, lecturer specialising in econometrics, statistics and machine learning.