Project: Bank Loan Default Prediction
Problem Statement:
Many banks believed that lending to individuals was risk-free, given that individual borrowers are well placed in terms of credit scores and that loans are sometimes backed by collateral. Recently, however, the banking system has witnessed an increase in loan defaults, i.e. cases where the borrower is not able to pay back the instalment on time. These loan defaults directly impact the revenues of a banking system.
Nowadays, banks scrutinize each loan application to identify potential loan default cases, so that they can predict which client is going to default on the loan repayment and at which step.
Based on the given data from a bank, build a model to predict loan default that will help the bank take the required actions.
The first set of project notes must describe the data and align it to the analytical problem the project addresses. It should also include the rationale for the project (why the problem is important and what value it will add to the company). It should be 5-7 pages long, simple and straightforward, and should cover the objective of the project, a description of the data, the EDA report and the insights drawn from the EDA.
Below are the guidelines for the Project Notes-I:
 
1) Introduction
 
a) Defining problem statement
 
b) Need of the study/project
 
c) Understanding business/social opportunity
 
 
2) Data Report

a) Understanding how data was collected in terms of time, frequency and methodology

b) Visual inspection of data (rows, columns, descriptive details)

c) Understanding of attributes (variable info, renaming if required)
 
 
3) Exploratory data analysis
 
 
a) Univariate analysis (distribution and spread for every continuous attribute, distribution of data in categories for categorical ones)

b) Bivariate analysis (relationship between different variables, correlations)

c) Removal of unwanted variables

d) Missing value treatment

e) Outlier treatment

f) Variable transformation (if applicable)

g) Addition of new variables
 
 
4) Insights from EDA
 
a) Is the data unbalanced? If so, what can be done?

b) Any insights using clustering (if applicable)
 
c) Any other Insights
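
As an illustration of the checks listed above (class balance, missing value treatment and outlier treatment), here is a minimal base R sketch. The data frame `loans` and its columns `loan_amount`, `grade` and `default` are hypothetical placeholders, not the fields of the bank's actual dataset, and median imputation and the 1.5 x IQR rule are only two of several reasonable choices.

```r
# Hypothetical toy data: column names and values are placeholders only.
loans <- data.frame(
  loan_amount = c(5500, 7000, 6500, NA, 7200, 6800, 90000, 6100),
  grade       = c("A", "B", "A", "C", "B", "A", "C", "B"),
  default     = c(0, 0, 0, 1, 0, 0, 1, 0)
)

# Class balance: share of non-defaulters (0) and defaulters (1).
class_share <- prop.table(table(loans$default))
print(class_share)

# Missing values: count of NA entries in each column.
missing_counts <- colSums(is.na(loans))
print(missing_counts)

# One simple treatment (an assumption, not a recommendation):
# replace missing numeric values with the column median.
loans$loan_amount[is.na(loans$loan_amount)] <-
  median(loans$loan_amount, na.rm = TRUE)

# Outliers: flag values outside the 1.5 * IQR fences.
q     <- quantile(loans$loan_amount, probs = c(0.25, 0.75), names = FALSE)
iqr   <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
is_outlier <- loans$loan_amount < lower | loans$loan_amount > upper
print(loans[is_outlier, ])
```

If the class shares turn out to be heavily skewed, options such as resampling the training data or using class weights can be discussed in the report; which one suits the bank's data is left to the analysis.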
Below are the guidelines for the Project Notes-II:
 
Review Parameters:

1) Model building and interpretation
 
a. Build various models
 
b. Test various performance metrics (confusion matrix, AUC, RMSE, etc.) as applicable.
 
 
2) Model Tuning

a. Ensemble modelling, wherever applicable

b. Any other model tuning measures (if applicable)

c. Interpretation from the best model
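
To make the performance metrics named above concrete, the following is a minimal base R sketch that fits one candidate model (a logistic regression) and computes a confusion matrix, AUC and RMSE on a hold-out set. The data are simulated, the predictor `credit_score` and the helper `auc_rank` are hypothetical names introduced only for this illustration, and the 0.5 cut-off is an arbitrary assumption.

```r
set.seed(42)  # reproducible simulated data, not the bank's data

# Hypothetical predictor and outcome: lower scores default more often.
n            <- 200
credit_score <- rnorm(n, mean = 650, sd = 60)
p_default    <- plogis(-(credit_score - 620) / 30)
default      <- rbinom(n, size = 1, prob = p_default)
toy          <- data.frame(credit_score, default)

# Simple 70/30 train/test split.
train_idx <- sample(n, size = 140)
train     <- toy[train_idx, ]
test      <- toy[-train_idx, ]

# One candidate model: logistic regression.
fit  <- glm(default ~ credit_score, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
pred <- as.integer(prob >= 0.5)  # assumed cut-off

# Confusion matrix and accuracy.
conf_mat <- table(actual    = factor(test$default, levels = 0:1),
                  predicted = factor(pred, levels = 0:1))
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# AUC via the rank (Mann-Whitney) formulation; hypothetical helper.
auc_rank <- function(scores, labels) {
  r     <- rank(scores)
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
auc <- auc_rank(prob, test$default)

# RMSE of predicted probabilities (its square is the Brier score).
rmse <- sqrt(mean((prob - test$default)^2))

print(conf_mat)
print(c(accuracy = accuracy, auc = auc, rmse = rmse))
```

The same three metrics can be computed for every model that is built, so that the models are compared on a common hold-out set before tuning.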
Please note the following:
You have to submit 2 files:
· Business Report: In this, you should cover all the topics given in the rubric in a sequential manner. It should include a detailed explanation of the approach used, insights, inferences and all code outputs such as graphs, tables, etc. Your report should not be filled with code.
· R code file: This is mandatory and will be used for reference during evaluation.