Loan Default Prediction
- juliamtw20
- Feb 8, 2024
In this project, my goal is to predict whether customers will default on their loans.
Why is default prediction important?
Loan defaults are instances where borrowers are unable to meet their obligations as specified in the lending agreement. Such defaults affect not only the borrowers but also the originating financial institution, and both parties lose the intended benefits of the loan. Borrowers risk losing the asset tied to the loan, incurring late fees, facing litigation, and damaging their credit scores, and they may find it challenging to secure future loans. Financial institutions, on the other hand, see a direct hit to their profitability: loan defaults disrupt the expected cash flows that are vital for maintaining liquidity.
Data Description
The dataset consists of 255,347 records, each corresponding to a specific loan, and can be found on Kaggle. It includes the following variables: loan ID, age, income, loan amount, credit score, months employed, number of credit lines, interest rate, loan term, debt-to-income ratio, education level, employment type, marital status, mortgage, dependents, loan purpose, co-signer, and default, which is the target variable.
Modeling
I built four models for comparison: a naïve baseline model, logistic regression, random forest, and Lasso-regularized logistic regression.
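To make the role of a naïve baseline concrete, here is a minimal sketch of one common choice: a model that always predicts the most frequent class. This is an assumption for illustration, since the post does not specify how the naïve model was defined. The labels are toy values, not taken from the Kaggle dataset, and the helper names `fit_majority_baseline` and `accuracy` are hypothetical.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most common label seen in the training labels."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (1 = default, 0 = no default); illustrative only.
y_train = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_test = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0]

# The "model" is just the majority class, predicted for every loan.
majority_label = fit_majority_baseline(y_train)
y_pred = [majority_label] * len(y_test)

baseline_accuracy = accuracy(y_test, y_pred)
print(f"Baseline always predicts {majority_label}; accuracy = {baseline_accuracy:.2f}")
```

When defaults are rare, a baseline like this can score well on accuracy while never flagging a single default, which is why the other models need to be judged against it on more than one metric.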
The Lasso model had a true positive rate of 59.3%. If this were the only metric I used, the model would appear highly effective compared to the others.
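For reference, the true positive rate is the share of actual defaults that the model correctly flags, TP / (TP + FN). The sketch below computes it alongside the false positive rate and precision, to show why a single metric can flatter a model. The labels and predictions are made-up toy values, not results from this project, and `classification_rates` is a hypothetical helper.

```python
def classification_rates(y_true, y_pred):
    """Compute TPR, FPR, and precision for binary labels (1 = default)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        # Share of actual defaults that were caught.
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of non-defaults that were wrongly flagged.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of flagged loans that actually defaulted.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }


# Toy example: 4 actual defaults, 6 non-defaults.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 1, 0, 0, 0]

rates = classification_rates(y_true, y_pred)
print(rates)
```

In this toy case the model catches three of the four defaults, but it also flags half of the good loans, so only half of its default predictions are correct. A high true positive rate on its own does not reveal that trade-off.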