Predict Loan Defaulter Using Machine Learning

Business Problem

Build a machine learning model that predicts whether a loan applicant is likely to be a defaulter or a non-defaulter.

Data source: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)

Abstract: This dataset classifies people described by a set of attributes as good or bad credit risks. It comes in two formats (one all numeric) and also includes a cost matrix.
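
The UCI description's cost matrix charges 5 for classifying a bad-risk customer as good and 1 for classifying a good customer as bad (rows are actual classes, columns are predicted). Below is a minimal sketch of scoring predictions against that matrix, assuming labels are encoded 0 = good (non-defaulter) and 1 = bad (defaulter); the labels in the example are hypothetical, not results from this project.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# UCI cost matrix, rows = actual, columns = predicted.
# Label order [0 = good, 1 = bad] is an assumed encoding.
COST = np.array([[0, 1],   # actual good: predicted good costs 0, predicted bad costs 1
                 [5, 0]])  # actual bad:  predicted good costs 5, predicted bad costs 0

def total_cost(y_true, y_pred):
    """Sum the misclassification cost of a set of predictions."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return int((cm * COST).sum())

# Hypothetical labels, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1]
print(total_cost(y_true, y_pred))  # one false "bad" (1) + one missed defaulter (5) = 6
```

Because a missed defaulter costs five times as much as a wrongly rejected good customer, two models with the same accuracy can differ noticeably in total cost.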

Data Set Characteristics: Multivariate
Number of Instances: 1000
Area: Financial
Attribute Characteristics: Categorical, Integer
Number of Attributes: 20
Date Donated: 1994-11-17
Associated Tasks: Classification
Missing Values? N/A
Number of Web Hits: 677271

Independent variables

  • checking_balance
  • months_loan_duration
  • credit_history
  • purpose
  • existing_loans_count
  • phone
  • amount
  • savings_balance
  • employment_duration
  • percent_of_income
  • job
  • years_at_residence
  • age
  • other_credit
  • housing
  • dependents
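
Categorical variables such as purpose must be converted to numbers before a scikit-learn model can use them; a feature name like purpose_business in the importance table is what one-hot encoding produces. A minimal sketch with pandas, assuming the target column is called default with values "yes"/"no"; the rows below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Tiny hypothetical sample using a few of the column names listed above.
df = pd.DataFrame({
    "checking_balance": ["< 0 DM", "1 - 200 DM", "unknown"],
    "months_loan_duration": [6, 48, 12],
    "purpose": ["furniture/appliances", "business", "car"],
    "amount": [1169, 5951, 2096],
    "default": ["no", "yes", "no"],  # assumed target column and values
})

# Separate the target from the predictors and encode it as 0/1 (1 = defaulter).
y = (df["default"] == "yes").astype(int)
X = df.drop(columns="default")

# One-hot encode the categorical predictors; numeric columns pass through unchanged.
categorical_cols = ["checking_balance", "purpose"]
X_encoded = pd.get_dummies(X, columns=categorical_cols)
print(X_encoded.columns.tolist())  # includes purpose_business, purpose_car, ...
```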

Data Science Libraries

  • pandas
  • numpy
  • sklearn
  • matplotlib
  • seaborn

Data Science Concepts

  • Exploratory Data Analysis
  • Regularization / reducing overfitting
  • Bagging & Boosting
  • Random Forest Classifier
  • Decision Tree Classifier
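
Bagging, boosting, and random forests all combine many decision trees; they differ in how each tree is trained. A minimal sketch comparing them with scikit-learn, assuming synthetic data from make_classification (1000 rows, about 70/30 class balance) in place of the encoded credit data and a 70/30 train/test split; the hyperparameters are illustrative, not the ones used in this project. XGBoost is omitted because it lives in the separate xgboost package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded credit data (an assumption for illustration).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.7, 0.3], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)

models = {
    # A single, fully grown tree: low bias, high variance.
    "decision_tree": DecisionTreeClassifier(random_state=1),
    # Bagging: trees fit on bootstrap samples, predictions combined by voting.
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=1),
    # Random forest: bagging plus a random subset of features at each split.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=1),
    # AdaBoost: shallow trees fit sequentially, upweighting misclassified rows.
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # test-set accuracy
    print(f"{name}: {scores[name]:.2%}")
```

Averaging many bootstrapped trees mainly reduces variance, which is why bagging and random forests usually beat a single unpruned tree on held-out data; boosting instead reduces bias by focusing later trees on earlier mistakes.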

What I did

  • Inspected the data to judge whether the problem can be answered with machine learning.
  • Performed exploratory data analysis to understand the nature of the data.
  • Built a decision tree model and visualized the tree at http://webgraphviz.com
  • As is typical of an unconstrained decision tree, the model overfit:
    • Train data accuracy: 100%
    • Test data accuracy: 69.33%
  • I then applied regularization to reduce overfitting:
    • Train data accuracy: 75.28%
    • Test data accuracy: 74.33%
  • The top five features by importance:

      Feature                 Importance
      checking_balance        0.492510
      months_loan_duration    0.169806
      credit_history          0.166109
      savings_balance         0.064467
      purpose_business        0.051129
  • Model scores (the confusion matrix of each model was plotted as a heat map):
    • Decision Tree Classifier: 74.33%
    • Random Forest Classifier (with Bagging): 77.33%
    • Random Forest Classifier (with AdaBoost): 74.00%
    • Random Forest Classifier (with XGBoost): 74.00%
    • Random Forest Classifier: 77.66%
  • The plain Random Forest Classifier scored highest.
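
The decision-tree steps above (overfitting, regularization, feature importance, Graphviz export, and a confusion-matrix heat map) can be sketched as follows. This is a minimal sketch on synthetic data from make_classification with hypothetical column names feature_0 to feature_9; the regularization settings max_depth=3 and min_samples_leaf=10 are assumptions, since the page does not say which hyperparameters were tuned, and the 0 = non-defaulter / 1 = defaulter encoding is also assumed.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove to display plots interactively
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Synthetic stand-in for the encoded credit data; column names are hypothetical.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           weights=[0.7, 0.3], random_state=7)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7, stratify=y)

# 1. Unconstrained tree: grows until every leaf is pure, memorizing the training set.
full_tree = DecisionTreeClassifier(random_state=7).fit(X_train, y_train)
print(f"full tree   train={full_tree.score(X_train, y_train):.2%} "
      f"test={full_tree.score(X_test, y_test):.2%}")

# 2. Regularized tree: limiting depth and leaf size trades training fit for generalization.
#    These values are illustrative, not the ones used in this project.
small_tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=7)
small_tree.fit(X_train, y_train)
print(f"small tree  train={small_tree.score(X_train, y_train):.2%} "
      f"test={small_tree.score(X_test, y_test):.2%}")

# 3. Top five features by impurity-based importance.
importances = pd.Series(small_tree.feature_importances_, index=X.columns, name="Imp")
top5 = importances.sort_values(ascending=False).head(5)
print(top5)

# 4. DOT source for the tree; paste it into a Graphviz viewer such as webgraphviz.com.
dot_source = export_graphviz(small_tree, out_file=None, feature_names=list(X.columns),
                             class_names=["non-defaulter", "defaulter"], filled=True)

# 5. Confusion-matrix heat map on the test set (rows = actual, columns = predicted).
labels = ["non-defaulter", "defaulter"]
cm = confusion_matrix(y_test, small_tree.predict(X_test))
ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                 xticklabels=labels, yticklabels=labels)
ax.set_xlabel("Predicted")
ax.set_ylabel("Actual")
ax.set_title(f"Decision tree, test accuracy {small_tree.score(X_test, y_test):.2%}")
plt.savefig("dt_confusion_matrix.png")
```

On real data, a training score of 100% that falls sharply on the test set is the overfitting signal described above; constraining the tree should bring the two scores closer together.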
