NATE: A New Approach for Explainable Credit Scoring
Author Information
Author(s): Han Seongil, Jung Haemin
Primary Institution: School of Computing & Mathematical Sciences, University of London, Birkbeck College, London, United Kingdom
Hypothesis
The proposed NATE models will enhance classification performance by capturing non-linearity in imbalanced datasets while providing clear reasons for credit scoring predictions.
Conclusion
NATE significantly outperforms traditional logistic regression in credit risk classification, improving predictive performance and interpretability.
Supporting Evidence
- NATE improves AUC by 19.33%, MCC by 71.56%, and F1 Score by 85.33% compared to logistic regression.
- NATE enhances interpretability by providing insights into feature contributions.
- SMOTE oversampling yields better classification performance than NearMiss undersampling.
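For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how AUC, MCC, and F1 Score can be computed from binary labels, using only the Python standard library. All function names are illustrative and do not come from the paper. The `relative_improvement` helper assumes the reported percentages are relative gains over the logistic regression baseline, which this summary does not state explicitly.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is the O(n_pos * n_neg) pairwise form,
    fine for illustration but slow on large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline` (assumed interpretation)."""
    return 100.0 * (new - baseline) / baseline

# Toy data, invented purely for illustration (not from the study).
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.3, 0.2, 0.1, 0.5]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("F1:", f1_score(tp, fp, fn))
print("MCC:", mcc(tp, tn, fp, fn))
print("AUC:", auc(y_true, scores))
```

MCC and F1 are commonly preferred over plain accuracy on imbalanced data, because a classifier that always predicts the majority class can score high accuracy while having an MCC and F1 of zero.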
Takeaway
This study created a new method to help banks better understand who might pay back loans by using smart computer models that are easier to explain.
Methodology
The study used a dataset of 150,000 samples. Classes were balanced with oversampling (SMOTE) and undersampling (NearMiss), and tree-based ensemble models were used for classification.
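The two resampling ideas can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the authors' pipeline and not the implementation found in any particular library: `smote_like_oversample` interpolates between a minority sample and one of its nearest minority neighbours, and `nearmiss_like_undersample` follows the NearMiss-1 rule of keeping the majority samples closest on average to their nearest minority samples. The function names, the parameter `k`, and the choice of the NearMiss-1 variant are assumptions. The classifier step is omitted.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by linear interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def nearmiss_like_undersample(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points with the smallest mean distance
    to their k nearest minority points (the NearMiss-1 selection rule)."""
    def mean_dist_to_nearest_minority(p):
        nearest = sorted(sq_dist(p, m) for m in minority)[:k]
        return sum(math.sqrt(d) for d in nearest) / len(nearest)
    return sorted(majority, key=mean_dist_to_nearest_minority)[:n_keep]

# Toy 2-D data, invented for illustration only.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
majority = [(5.0, 5.0), (0.5, 0.5), (10.0, 10.0), (1.0, 1.0)]

new_points = smote_like_oversample(minority, n_new=5)
kept = nearmiss_like_undersample(majority, minority, n_keep=2)
print(len(new_points), "synthetic minority points")
print("kept majority points:", kept)
```

Because every synthetic point sits between two existing minority samples, oversampling adds no information from outside the minority region. Undersampling instead discards majority samples, which shrinks the training set.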
Potential Biases
The use of SMOTE may introduce overlapping data points, potentially leading to overfitting.
Limitations
The study is limited to a single dataset, which may affect the generalizability of the findings.
Participant Demographics
The dataset includes demographic information, payment behavior, and delinquency data for credit applicants.
Digital Object Identifier (DOI)