NATE: Non-pArameTric approach for Explainable credit scoring on imbalanced class
2024

NATE: A New Approach for Explainable Credit Scoring

Sample size: 150,000
Evidence: high

Author Information

Author(s): Han Seongil, Jung Haemin

Primary Institution: School of Computing & Mathematical Sciences, University of London, Birkbeck College, London, United Kingdom

Hypothesis

The proposed NATE models will enhance classification performance by capturing non-linearity in imbalanced datasets while providing clear reasons for credit scoring predictions.

Conclusion

NATE significantly outperforms traditional logistic regression in credit risk classification, improving predictive performance and interpretability.

Supporting Evidence

  • NATE improves AUC by 19.33%, MCC by 71.56%, and F1 Score by 85.33% compared to logistic regression.
  • NATE enhances interpretability by providing insights into feature contributions.
  • SMOTE oversampling outperforms NearMiss undersampling in improving classification performance.
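The metrics named above can be made concrete with a small sketch. The code below computes F1 and the Matthews correlation coefficient (MCC) from confusion-matrix counts using the standard textbook definitions. The toy labels and predictions are invented for illustration and are not taken from the study; the function names are likewise hypothetical.

```python
from math import sqrt

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, where 1 marks the positive (default) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (illustrative only): 10 applicants, 2 of whom default.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_good = [0] * 10                        # predicts "no default" for everyone
some_model = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # catches one of the two defaults

for name, y_pred in [("always_good", always_good), ("some_model", some_model)]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    print(name, accuracy, f1_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```

In this toy case both classifiers reach 80% accuracy, yet the one that never predicts a default scores zero on F1 and MCC. That gap is why F1 and MCC are more informative than accuracy on imbalanced data.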

Takeaway

This study created a new method to help banks better understand who might pay back loans by using smart computer models that are easier to explain.

Methodology

The study used a dataset of 150,000 samples, balancing the classes with oversampling (SMOTE) and undersampling (NearMiss) techniques and then classifying with tree-based ensemble models.
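The two resampling ideas can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: `smote_sketch` follows the general SMOTE idea of interpolating between a minority point and one of its nearest minority neighbours, and `nearmiss1_sketch` follows the NearMiss-1 rule of keeping the majority points closest on average to their nearest minority points. The summary does not say which NearMiss variant or which parameters the study used, so the variant, the value of `k`, the function names, and the toy data are all assumptions. The tree-based classification step is omitted.

```python
import random
from math import dist

def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points by linear interpolation
    between a random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

def nearmiss1_sketch(majority, minority, n_keep, k=3):
    """Keep the n_keep majority points with the smallest average distance
    to their k nearest minority points (the NearMiss-1 rule)."""
    def avg_dist_to_nearest_minority(point):
        nearest = sorted(dist(point, m) for m in minority)[:k]
        return sum(nearest) / len(nearest)
    return sorted(majority, key=avg_dist_to_nearest_minority)[:n_keep]

# Toy 2-D data (illustrative only): 4 minority points, 20 majority points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
majority = [(float(i), float(j)) for i in range(5) for j in range(4)]

# Oversampling: grow the minority class to the size of the majority class.
synthetic = smote_sketch(minority, len(majority) - len(minority))
balanced_over = minority + synthetic

# Undersampling: shrink the majority class to the size of the minority class.
balanced_under = nearmiss1_sketch(majority, minority, len(minority))

print(len(balanced_over), len(balanced_under))
```

Because every synthetic point lies on a segment between two existing minority points, oversampling adds no information from outside the observed minority region. Undersampling instead discards most of the majority class.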

Potential Biases

The use of SMOTE may introduce overlapping data points, potentially leading to overfitting.

Limitations

The study is limited to a single dataset, which may affect the generalizability of the findings.

Participant Demographics

The dataset includes demographic information, payment behavior, and delinquency data for credit applicants.

Digital Object Identifier (DOI)

10.1371/journal.pone.0316454
