A comparative study of five classification algorithms on the German Credit Dataset — evaluating predictive power, calibration quality, and business-relevant risk metrics across 1,000 loan applicants.
| Model | Accuracy | AUC-ROC | F1 Score | Precision | Recall | KS Stat | CV AUC |
|---|---|---|---|---|---|---|---|
German Credit Dataset — 1,000 applicants, 20 features covering checking account status, loan duration, credit history, purpose, savings, employment tenure, personal status, and more. Target: binary good/bad credit risk.
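Ordinal encoding for categorical variables. StandardScaler applied within the pipeline for LR, SVM, GBM, and KNN (scaling matters for LR, SVM, and KNN; tree-based GBM is scale-invariant, so the scaler is harmless there but unnecessary). Class imbalance handled via class_weight='balanced' in RF.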
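A minimal sketch of this preprocessing, assuming scikit-learn and a handful of made-up rows. The column layout (checking status, credit history, duration, amount), the values, and the 1 = good / 0 = bad label coding are illustrative assumptions, not taken from the real dataset. Encoding, scaling, and the model sit in one Pipeline, so during cross-validation every step is fit on the training folds only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Hypothetical toy rows: [checking_status, credit_history, duration_months, credit_amount].
# Values and labels are illustrative only.
X = np.array([
    ["<0", "critical", 6, 1169],
    ["0<=X<200", "existing paid", 48, 5951],
    ["no checking", "critical", 12, 2096],
    ["<0", "existing paid", 42, 7882],
    ["<0", "delayed", 24, 4870],
    ["no checking", "existing paid", 36, 9055],
], dtype=object)
y = np.array([1, 0, 1, 1, 0, 1])  # assumed coding: 1 = good, 0 = bad

categorical_cols = [0, 1]
numeric_cols = [2, 3]

# Map each category to an integer code; categories unseen at fit time become -1.
encode = ColumnTransformer([
    ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), categorical_cols),
    ("num", "passthrough", numeric_cols),
])

# Encoding -> scaling -> model, fit together so no test-fold statistics leak into training.
pipe = Pipeline([
    ("encode", encode),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(C=0.1, max_iter=1000)),
])
pipe.fit(X, y)

X_scaled = pipe[:-1].transform(X)  # encoded and standardized features
print(pipe.named_steps["encode"].named_transformers_["cat"].categories_)
print(X_scaled.mean(axis=0).round(6))
```

Fitting the scaler inside the pipeline, rather than once on the full dataset, is what keeps the cross-validated AUC honest.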
80/20 stratified train-test split. 5-fold stratified cross-validation for AUC. KS statistic computed as max(TPR - FPR) across the ROC curve thresholds, a key discrimination metric in credit scoring.
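The evaluation protocol can be sketched as follows, assuming scikit-learn and synthetic data from `make_classification` standing in for the encoded features (1,000 rows, 20 features, roughly the 70/30 good/bad ratio of the German Credit data). Whether the original CV ran on the training split or the full dataset is not stated; this sketch uses the training split. The helper name `ks_statistic` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def ks_statistic(y_true, scores):
    """KS = maximum vertical gap between TPR and FPR over all ROC thresholds."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return float(np.max(tpr - fpr))


# Synthetic stand-in: 1,000 rows, 20 features, ~70/30 class split.
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.7, 0.3], random_state=42)

# 80/20 split, stratified so both parts keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(C=0.1, max_iter=1000))

# 5-fold stratified CV AUC on the training portion (assumption, see above).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_auc = cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")

# Held-out AUC and KS from predicted probabilities of the positive class.
model.fit(X_train, y_train)
p_test = model.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, p_test)
test_ks = ks_statistic(y_test, p_test)
print(f"CV AUC {cv_auc.mean():.3f} ± {cv_auc.std():.3f} | test AUC {test_auc:.3f} | KS {test_ks:.3f}")
```

Because KS is read off a single point of the ROC curve, it describes the one cutoff where the score best separates good and bad applicants, whereas AUC summarizes ranking quality over all cutoffs.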
LR (C=0.1), RF (200 trees, max_depth=8), Gradient Boosting (200 estimators, learning_rate=0.05), SVM (RBF kernel, calibrated), KNN (k=11). All wrapped in sklearn Pipelines.
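One way to express the five configurations as sklearn Pipelines, as a sketch only. How the SVM was calibrated is not specified, so `CalibratedClassifierCV` with its default sigmoid (Platt) calibration and `cv=5` is an assumption (`SVC(probability=True)` would be an alternative). `random_state`, `max_iter=1000`, and the helper name `build_models` are illustrative choices; scaler placement and `class_weight` follow the preprocessing described above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_models(random_state=42):
    """Hypothetical helper returning the five configurations as named Pipelines."""
    return {
        "LR": Pipeline([
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(C=0.1, max_iter=1000)),
        ]),
        # No scaler: tree splits are thresholds, so feature scale does not matter.
        "RF": Pipeline([
            ("clf", RandomForestClassifier(n_estimators=200, max_depth=8,
                                           class_weight="balanced",
                                           random_state=random_state)),
        ]),
        # Scaler kept to mirror the described setup; GBM itself is scale-invariant.
        "GBM": Pipeline([
            ("scale", StandardScaler()),
            ("clf", GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                               random_state=random_state)),
        ]),
        # RBF SVM wrapped in sigmoid calibration to produce probabilities (assumed method).
        "SVM": Pipeline([
            ("scale", StandardScaler()),
            ("clf", CalibratedClassifierCV(SVC(kernel="rbf"), cv=5)),
        ]),
        "KNN": Pipeline([
            ("scale", StandardScaler()),
            ("clf", KNeighborsClassifier(n_neighbors=11)),
        ]),
    }


# Usage on synthetic data standing in for the encoded German Credit features.
X, y = make_classification(n_samples=400, n_features=20, weights=[0.7, 0.3], random_state=0)
models = build_models()
for name, pipe in models.items():
    pipe.fit(X, y)
    print(name, pipe.predict_proba(X[:3]).round(3))
```

Keeping every model in a Pipeline means the same cross-validation and KS code can score all five without special cases.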
Random Forest achieves the best AUC (0.703) and KS statistic (0.378). Tree ensembles capture the non-linear interactions among credit features well without heavy tuning.
Gradient Boosting matches RF on F1 to three decimals (0.664 vs 0.664) but has a slightly lower AUC (0.697 vs 0.703). With hyperparameter tuning, GBM might close the gap or surpass RF.
Logistic Regression is a solid, interpretable baseline with AUC 0.681: competitive, easy to explain to business stakeholders through its coefficients, and well suited to the explainability expected of automated credit decisions under GDPR.
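KS of 0.155 is well below the levels generally considered acceptable in credit scoring. Distance-based methods struggle with the mixed-type features in this dataset: ordinal codes for nominal categories such as loan purpose impose arbitrary distances between categories.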