Building a Credit Card Scoring Model

Background

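The dataset records each borrower's age, revolving utilization of unsecured lines (the total balance on credit cards and personal lines of credit), number of times past due over the past 2 years, monthly income, debt ratio, number of dependents, and similar information. From these we build a risk-control credit scoring model that predicts whether a borrower will become seriously delinquent.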

1. Import data and libraries

Import the required libraries

import pandas as pd
import numpy as np
import seaborn as sns
import re
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')
sns.set(style="darkgrid")
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with this font


Load the data

train = pd.read_csv('/home/kesci/input/kaggle4396/cs-training.csv')
test = pd.read_csv('/home/kesci/input/kaggle4396/cs-test.csv')
train.drop(columns=["Unnamed: 0"], inplace=True)
test.drop(columns=["Unnamed: 0"], inplace=True)

Data dimensions

train.shape
(150000, 11)




Missing values

train.isnull().sum()
SeriousDlqin2yrs                            0
RevolvingUtilizationOfUnsecuredLines        0
age                                         0
NumberOfTime30-59DaysPastDueNotWorse        0
DebtRatio                                   0
MonthlyIncome                           29731
NumberOfOpenCreditLinesAndLoans             0
NumberOfTimes90DaysLate                     0
NumberRealEstateLoansOrLines                0
NumberOfTime60-89DaysPastDueNotWorse        0
NumberOfDependents                       3924
dtype: int64
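Expressed as fractions of the 150,000 rows, roughly 20% of MonthlyIncome values and about 2.6% of NumberOfDependents values are missing, which is worth keeping in mind when a single fill value is chosen later. The arithmetic below simply reuses the counts printed above; on the real frame, `train.isnull().mean()` returns the same fractions directly.

```python
import pandas as pd

# Missing counts and row count copied from the isnull().sum() and shape outputs above
missing_counts = pd.Series({"MonthlyIncome": 29731, "NumberOfDependents": 3924})
n_rows = 150000

missing_rate = (missing_counts / n_rows).round(4)
print(missing_rate)  # share of missing values per column
```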




Duplicate rows

train.duplicated().sum()
609




Column overview (dtypes and non-null counts)

train.info()

RangeIndex: 150000 entries, 0 to 149999
Data columns (total 11 columns):
SeriousDlqin2yrs                        150000 non-null int64
RevolvingUtilizationOfUnsecuredLines    150000 non-null float64
age                                     150000 non-null int64
NumberOfTime30-59DaysPastDueNotWorse    150000 non-null int64
DebtRatio                               150000 non-null float64
MonthlyIncome                           120269 non-null float64
NumberOfOpenCreditLinesAndLoans         150000 non-null int64
NumberOfTimes90DaysLate                 150000 non-null int64
NumberRealEstateLoansOrLines            150000 non-null int64
NumberOfTime60-89DaysPastDueNotWorse    150000 non-null int64
NumberOfDependents                      146076 non-null float64
dtypes: float64(4), int64(7)
memory usage: 12.6 MB

A quick look at the data

train.head()
SeriousDlqin2yrs RevolvingUtilizationOfUnsecuredLines age NumberOfTime30-59DaysPastDueNotWorse DebtRatio MonthlyIncome NumberOfOpenCreditLinesAndLoans NumberOfTimes90DaysLate NumberRealEstateLoansOrLines NumberOfTime60-89DaysPastDueNotWorse NumberOfDependents
0 1 0.766127 45 2 0.802982 9120.0 13 0 6 0 2.0
1 0 0.957151 40 0 0.121876 2600.0 4 0 0 0 1.0
2 0 0.658180 38 1 0.085113 3042.0 2 1 0 0 0.0
3 0 0.233810 30 0 0.036050 3300.0 5 0 0 0 0.0
4 0 0.907239 49 1 0.024926 63588.0 7 0 1 0 0.0
cor=train.corr()
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(cor, xticklabels=cor.columns, yticklabels=cor.columns, annot=True, ax=ax);
[Figure 1: correlation heatmap of the training columns]

2. Data preprocessing

train_clean = train.copy()

Remove duplicate rows

train_clean.drop_duplicates(inplace=True)

Handling missing values

Fill missing values with each column's median

def fill_na(df):
    """Fill every column that has missing values with that column's median, in place."""
    na_cols = df.columns[df.isnull().any()]
    for col in na_cols:
        # Series.median() skips NaN by default
        df[col] = df[col].fillna(df[col].median())
fill_na(train_clean)
train_clean.isnull().sum()
SeriousDlqin2yrs                        0
RevolvingUtilizationOfUnsecuredLines    0
age                                     0
NumberOfTime30-59DaysPastDueNotWorse    0
DebtRatio                               0
MonthlyIncome                           0
NumberOfOpenCreditLinesAndLoans         0
NumberOfTimes90DaysLate                 0
NumberRealEstateLoansOrLines            0
NumberOfTime60-89DaysPastDueNotWorse    0
NumberOfDependents                      0
dtype: int64
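fill_na only touches train_clean. If the test file loaded earlier (cs-test.csv) also has missing values, a common approach is to reuse the medians learned on the training data rather than recomputing them on the test set, so both sets are filled the same way. A minimal sketch, assuming that approach; fit_medians and apply_medians are hypothetical helper names and the toy frames are made up:

```python
import numpy as np
import pandas as pd

def fit_medians(train_df):
    """Median of every column that has missing values in the training frame."""
    na_cols = train_df.columns[train_df.isnull().any()]
    return train_df[na_cols].median()

def apply_medians(df, medians):
    """Return a copy of df with missing values filled from pre-computed medians."""
    return df.fillna(medians)  # a Series is matched to columns by label

# Toy frames (hypothetical values)
train_toy = pd.DataFrame({"MonthlyIncome": [1000.0, np.nan, 3000.0],
                          "NumberOfDependents": [0.0, 2.0, np.nan]})
test_toy = pd.DataFrame({"MonthlyIncome": [np.nan, 5000.0],
                         "NumberOfDependents": [np.nan, 1.0]})

medians = fit_medians(train_toy)           # learned on training data only
filled_test = apply_medians(test_toy, medians)
print(filled_test)
```

On the real data this would be `medians = fit_medians(train)` followed by `apply_medians(test, medians)`.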




Age distribution of borrowers

plt.figure(figsize=(16, 6))
sns.histplot(train_clean["age"], kde=True, color="black");  # distplot is deprecated in recent seaborn
[Figure 2: distribution of borrower age]
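# 10-year age bins (20, 30], (30, 40], ..., (90, 100], used for the plots below.
# Note: age_label is never dropped, so it also ends up as a model feature.
train_clean["age_label"] = pd.cut(train_clean["age"], np.arange(20, 110, 10))
# Regroup ages, merging groups that have too few samples or very similar default rates
bins = [0, 30, 40, 50, 60, 70, 110]
labels = ['0-29', '30-39', '40-49', '50-59', '60-69', '70+']
train_clean['age_grouped'] = pd.cut(train_clean['age'], bins, right=False, labels=labels)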
train_clean.drop(columns="age", inplace=True)
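With right=False each interval includes its left edge and excludes its right edge ([30, 40) and so on), so a 30-year-old is labeled '30-39'; with pandas' default right=True the same person would land in '0-29'. Comparing both settings on a few made-up ages:

```python
import pandas as pd

# Same bins and labels as the age_grouped cell above
bins = [0, 30, 40, 50, 60, 70, 110]
labels = ['0-29', '30-39', '40-49', '50-59', '60-69', '70+']

ages = pd.Series([21, 29, 30, 45, 70, 109])  # made-up ages for illustration
left_closed = pd.cut(ages, bins, right=False, labels=labels)   # [a, b)
right_closed = pd.cut(ages, bins, right=True, labels=labels)   # (a, b]

print(left_closed.tolist())   # ['0-29', '0-29', '30-39', '40-49', '70+', '70+']
print(right_closed.tolist())  # ['0-29', '0-29', '0-29', '40-49', '60-69', '70+']
```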
def plot_age(col, fun):
    """Bar plot of `col` aggregated by age_label: fun="s" for sum, "m" for mean."""
    data = pd.concat([train_clean[col], train_clean["age_label"]], axis=1)
    if fun == "s":
        df = data.groupby("age_label")[col].sum()
    elif fun == "m":
        df = data.groupby("age_label")[col].mean()
    else:
        raise ValueError('fun must be "s" (sum) or "m" (mean)')

    df.plot(kind="bar", figsize=(16, 6))

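Revolving utilization of unsecured lines (total balance on credit cards and personal lines of credit, divided by the sum of credit limits) by age group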

plot_age("RevolvingUtilizationOfUnsecuredLines", "m");
[Figure 3: mean RevolvingUtilizationOfUnsecuredLines by age group]
# Discretize RevolvingUtilizationOfUnsecuredLines
bins = [0, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90, 1.05,
        train_clean['RevolvingUtilizationOfUnsecuredLines'].max()*1.05]
labels = [
    '0-0.15',
    '0.15-0.30',
    '0.30-0.45',
    '0.45-0.60',
    '0.60-0.75',
    '0.75-0.90',
    '0.90-1.05',
    '1.05+']

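train_clean['ru_grouped'] = pd.cut(train_clean['RevolvingUtilizationOfUnsecuredLines'],
                                   bins, right=False, labels=labels)
# Note: this drops the grouped column just created, so the model keeps the raw,
# continuous RevolvingUtilizationOfUnsecuredLines. To model the grouped version
# instead, drop 'RevolvingUtilizationOfUnsecuredLines' here.
train_clean.drop(columns='ru_grouped', inplace=True)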

Checking DebtRatio for outliers

plt.figure(figsize=(16, 6))
sns.histplot(train_clean['DebtRatio'].apply(np.log1p), kde=True, color="r");  # distplot is deprecated
[Figure 4: distribution of log1p(DebtRatio)]
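Besides eyeballing the log-scale histogram, one common numeric check for outliers is Tukey's IQR rule: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR. A minimal sketch on made-up values; iqr_outlier_mask is a hypothetical helper, and on a strongly right-skewed column like DebtRatio it may flag many rows, so applying it to the log1p-transformed values is also worth trying.

```python
import pandas as pd

def iqr_outlier_mask(s, k=1.5):
    """True for values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Made-up debt ratios: six ordinary values and one extreme one
debt_ratio = pd.Series([0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 2500.0])
mask = iqr_outlier_mask(debt_ratio)
print(mask.sum())  # number of flagged values
```

On the real data, `iqr_outlier_mask(train['DebtRatio']).mean()` would give the share of flagged rows.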
train_clean["dr_log"] = train_clean["DebtRatio"].apply(np.log1p)
train_clean.drop(columns="DebtRatio", inplace=True)
plot_age("dr_log", "m")
[Figure 5: mean log debt ratio (dr_log) by age group]
# Group NumberOfOpenCreditLinesAndLoans
bins = [0, 2, 4, 6, 10, 14,
        train_clean['NumberOfOpenCreditLinesAndLoans'].max()*1.05]
labels = ['0-1', '2-3', '4-5', '6-9', '10-13', '14+']
train_clean['num_oc_grouped'] = pd.cut(train_clean['NumberOfOpenCreditLinesAndLoans'],
                                       bins, right=False, labels=labels)
train_clean.drop(columns='NumberOfOpenCreditLinesAndLoans', inplace=True)
# Group NumberOfDependents
bins = [0, 1, 2, 4,
        train_clean['NumberOfDependents'].max()*1.05]
labels = ['0', '1', '2-3', '4+']
train_clean['num_dep_grouped'] = pd.cut(train_clean['NumberOfDependents'],
                                        bins, right=False, labels=labels)
# Note: this drops the new grouped column and keeps the raw NumberOfDependents;
# drop 'NumberOfDependents' instead to model the groups.
train_clean.drop(columns='num_dep_grouped', inplace=True)
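Each bin list above ends at max() * 1.05 rather than max(). Because right=False makes every interval exclude its right edge, a last edge exactly equal to the maximum would leave the largest value(s) outside all bins, and pd.cut returns NaN for them. A small hypothetical helper, unbinned_count, makes this easy to check (the dependent counts below are made up):

```python
import pandas as pd

def unbinned_count(values, bins, right=False):
    """Count non-missing values that pd.cut cannot place in any bin (they become NaN)."""
    binned = pd.cut(values, bins, right=right)
    return int((binned.isna() & values.notna()).sum())

deps = pd.Series([0, 1, 3, 12])                 # made-up counts; max is 12
tight_bins = [0, 1, 2, 4, deps.max()]           # last edge exactly at the max
padded_bins = [0, 1, 2, 4, deps.max() * 1.05]   # last edge padded as in the cells above

print(unbinned_count(deps, tight_bins))   # 12 falls outside [4, 12)
print(unbinned_count(deps, padded_bins))  # every value gets a bin
```

Run against a column of train_clean with the bins from its cell, a nonzero result would mean some rows end up with no group.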

Number of times borrowers were past due over the past 2 years, by age group

PastDueNotWorse = [i for i in train_clean.columns if "NumberOfTime" in i]
plot_age(PastDueNotWorse, fun = "m")
[Figure 6: mean past-due counts (30-59, 60-89 and 90+ days) by age group]
cor = train_clean[PastDueNotWorse].corr()
cor
NumberOfTime30-59DaysPastDueNotWorse NumberOfTimes90DaysLate NumberOfTime60-89DaysPastDueNotWorse
NumberOfTime30-59DaysPastDueNotWorse 1.000000 0.980489 0.984535
NumberOfTimes90DaysLate 0.980489 1.000000 0.991409
NumberOfTime60-89DaysPastDueNotWorse 0.984535 0.991409 1.000000
# The 30-59, 60-89 and 90+ days past-due columns are very highly correlated
# (r > 0.98), so keep only one of them (NumberOfTimes90DaysLate) for modeling
train_clean.drop(columns=["NumberOfTime30-59DaysPastDueNotWorse",
                          "NumberOfTime60-89DaysPastDueNotWorse"], inplace=True)
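Correlations above 0.98 between counts for different past-due windows are unusually high. Before dropping two of the columns, it may be worth checking whether a small number of rows with very large, identical values in all three columns is driving the coefficient, since Pearson correlation is very sensitive to such points. The sketch below uses synthetic data only; the column names, the value 98 and the cutoff of 90 are illustrative assumptions, not facts about this dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
# Synthetic, independent small counts for most rows...
late = pd.DataFrame({
    "late_30_59": rng.poisson(0.3, n),
    "late_60_89": rng.poisson(0.1, n),
    "late_90": rng.poisson(0.1, n),
})
# ...plus a handful of rows where all three columns share the same huge value
late.iloc[:10] = 98

corr_all = late.corr()                            # dominated by the 10 extreme rows
corr_trimmed = late[(late < 90).all(axis=1)].corr()  # same data without them
print(corr_all.round(3))
print(corr_trimmed.round(3))
```

On the real data, the analogous check is to compare `train_clean[PastDueNotWorse].corr()` with the same correlation restricted to rows below some cutoff, before the drop above.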

Delinquency by age group

plt.figure(figsize=(16, 6))
sns.countplot(data=train_clean, x="age_label", hue="SeriousDlqin2yrs");
[Figure 7: number of delinquent (SeriousDlqin2yrs = 1) and non-delinquent borrowers by age group]

Log-transform monthly income

train_clean['income_log'] = (train_clean['MonthlyIncome']/10000).apply(np.log1p)
train_clean.drop(columns=['MonthlyIncome'], inplace=True)

3. Model training

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix, accuracy_score, fbeta_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

Start with logistic regression

attributes = train_clean.columns.drop(['SeriousDlqin2yrs'])
X = pd.get_dummies(train_clean[attributes], drop_first=True)
# Use a Series (not a one-column DataFrame) so scikit-learn does not warn about y's shape
y = train_clean['SeriousDlqin2yrs']

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, shuffle=True)
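With a rare positive class, a purely random split can leave the training and validation sets with slightly different delinquency rates, and without a seed the split changes on every run. train_test_split accepts stratify= to preserve the class proportions and random_state= to make the split repeatable. A sketch on synthetic labels (5% positives, an illustrative assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 2))
y_demo = np.zeros(1000, dtype=int)
y_demo[:50] = 1  # 5% positives, mimicking a rare "delinquent" class (synthetic)

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, shuffle=True,
    stratify=y_demo,    # keep the positive rate the same in both parts
    random_state=42)    # make the split reproducible

print(y_tr.mean(), y_va.mean())  # both close to 0.05
```

For the cell above, that would mean adding `stratify=y, random_state=...` to the existing call.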
def plot_est_score(C_values):
    """Fit logistic regression for each C; print the best C and plot train/valid accuracy."""
    rows = []
    for C in C_values:
        lg = LogisticRegression(C=C, solver='lbfgs', max_iter=1000)
        lg.fit(X_train, y_train)
        rows.append({"C": C,
                     "train_score": lg.score(X_train, y_train),
                     "valid_score": lg.score(X_valid, y_valid)})
    score_list = pd.DataFrame(rows).set_index("C")
    best_C = score_list["valid_score"].idxmax()
    print("best C = {}\nscores at best C:\n{}".format(best_C, score_list.loc[best_C]))
    score_list.plot(figsize=(16, 4), logx=True)
# The output below was printed by the earlier version of this function, which stored
# results at index C-1 (so "nC=-0.99" corresponds to C=0.01) and printed column maxima.
plot_est_score(np.array([0.01, 0.03, 0.1, 0.3, 1, 3, 10]))
nC=-0.99
max =
train_score    0.933534
test_score     0.932660
dtype: float64
[Figure 8: training and validation accuracy for each value of C]

Hyperparameter tuning with grid search

params_LR = {'C': [0.01, 0.03, 0.1, 0.3, 1, 3, 10],
            'solver': ['lbfgs', 'liblinear']}
gs = GridSearchCV(LogisticRegression(max_iter=1000), 
                  param_grid = params_LR,
                  scoring = 'f1',
                  cv=5).fit(X_train, y_train)
gs.best_params_
{'C': 0.01, 'solver': 'lbfgs'}



# Use the same max_iter as in the grid search so lbfgs has room to converge
model_lr = LogisticRegression(C=gs.best_params_['C'], solver=gs.best_params_['solver'],
                              max_iter=1000).fit(X_train, y_train)
print('train Score: %.6f' % model_lr.score(X_train, y_train))
print('valid Score: %.6f' %  model_lr.score(X_valid, y_valid))
train Score: 0.933534
valid Score: 0.932660
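An accuracy of about 0.933 is hard to judge on its own: when delinquent borrowers are a small minority, a model that always predicts "not delinquent" already scores close to the share of negatives. A minimal sketch that reports that majority-class baseline next to the model's accuracy and ROC AUC; compare_to_baseline is a hypothetical helper, and the demo data are synthetic (roughly 10% positives), not the credit data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score

def compare_to_baseline(model, X_valid, y_valid):
    """Return (majority-class accuracy, model accuracy, model ROC AUC) on a validation set."""
    y_valid = np.asarray(y_valid).ravel()
    majority = np.bincount(y_valid).argmax()  # most frequent label (0/1)
    baseline_acc = accuracy_score(y_valid, np.full(len(y_valid), majority))
    model_acc = accuracy_score(y_valid, model.predict(X_valid))
    auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    return baseline_acc, model_acc, auc

# Synthetic, imbalanced demo data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2000, 3))
p = 1 / (1 + np.exp(-(-3 + 1.5 * X_demo[:, 0])))
y_demo = (rng.random(2000) < p).astype(int)

demo_model = LogisticRegression().fit(X_demo[:1500], y_demo[:1500])
baseline_acc, model_acc, auc = compare_to_baseline(demo_model, X_demo[1500:], y_demo[1500:])
print(f"baseline acc={baseline_acc:.3f}  model acc={model_acc:.3f}  AUC={auc:.3f}")
```

On the document's data the call would be `compare_to_baseline(model_lr, X_valid, y_valid)`.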

使用XGBOOST建模预测

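import xgboost as xgb
# XGBoost has no built-in 'f1' eval metric, so 'auc' is used here
params_xgb = {'max_depth': 6,
              'eta': 1,
              'verbosity': 0,
              'objective': 'binary:logistic',
              'eval_metric': 'auc'}

# Parameters borrowed from GaryMulder, with the key names corrected. Misspelled keys
# such as 'eval_matric' or 'minchildweight' are not recognized by XGBoost and have no
# effect; the training log below was produced with those original keys, which is why
# it reports the default 'error' metric instead of 'auc'.
params_xgb2 = {'max_depth': 5,
               'eta': 0.025,
               'verbosity': 0,
               'objective': 'binary:logistic',
               'eval_metric': 'auc',
               'min_child_weight': 10.0,
               'max_delta_step': 1.8,
               'colsample_bytree': 0.4,
               'subsample': 0.8,
               'gamma': 0.65}
# The borrowed round count (391) is not a booster parameter; the number of rounds
# is passed to xgb.train directly (1000 with early stopping below).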
regex = re.compile(r"\[|\]|<", re.IGNORECASE)
feature_name = [regex.sub("_", col) if any(x in str(col) for x in set(('[', ']', '<'))) else col for col in X.columns]
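The comprehension above exists because XGBoost rejects feature names containing '[', ']' or '<', and the dummy columns that pd.get_dummies builds from the interval-valued age_label column (for example "age_label_(20, 30]") contain ']'. The same substitution can be written with a single character class; sanitize_feature_names is a hypothetical helper name:

```python
import re

def sanitize_feature_names(columns):
    """Replace '[', ']' and '<' (not allowed in XGBoost feature names) with '_'."""
    pattern = re.compile(r"[\[\]<]")
    return [pattern.sub("_", str(col)) for col in columns]

names = sanitize_feature_names(["age_label_(20, 30]", "dr_log", "x<1"])
print(names)  # ['age_label_(20, 30_', 'dr_log', 'x_1']
```

Usage on the real data would be `feature_name = sanitize_feature_names(X.columns)`.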
dtrain = xgb.DMatrix(X_train, y_train, feature_names=feature_name)
dvalid = xgb.DMatrix(X_valid, y_valid, feature_names=feature_name)
evals = [(dtrain, 'train'), (dvalid, 'valid')]
model_xgb = xgb.train(params_xgb2, dtrain, 1000, evals, early_stopping_rounds=100);
[0]	train-error:0.066403	valid-error:0.068973
Multiple eval metrics have been passed: 'valid-error' will be used for early stopping.

Will train until valid-error hasn't improved in 100 rounds.
[1]	train-error:0.065118	valid-error:0.066858
...
[447]	train-error:0.061307	valid-error:0.064341
Stopping. Best iteration:
[347]	train-error:0.061762	valid-error:0.064153


Save the model

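model_xgb.save_model('xgb_v1.model')  # reloadable, e.g. xgb.Booster(model_file='xgb_v1.model')
model_xgb.dump_model('xgb_v1.txt')    # human-readable text dump of the trees (cannot be reloaded)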

Feature importance

xgb.plot_importance(model_xgb);
[Figure 9: XGBoost feature importance]

Visualizing an XGBoost tree

xgb.to_graphviz(model_xgb)

Predict whether borrowers will become delinquent

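# Note: this scores the validation split X_valid, not the Kaggle test file `test`;
# scoring `test` requires running it through the same preprocessing as train_clean first.
dtest = xgb.DMatrix(X_valid, feature_names=feature_name)
y_test = model_xgb.predict(dtest)
entry = pd.DataFrame()
entry['ID'] = np.arange(1, len(y_test)+1)
entry['Probability'] = y_test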
entry.to_csv('pred.csv', header=True, index=False)
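The prediction cell above scores X_valid, while the ID/Probability layout presumably targets the test file. Once `test` has gone through the same preprocessing steps as train_clean, its one-hot columns must match X.columns exactly (same names, same order), or the DMatrix will not line up with the trained model. A minimal sketch, assuming reindex-based alignment; align_to_training_columns is a hypothetical helper and the toy frames are made up. It relies on the categorical columns carrying the same categories as in training (true for columns produced by pd.cut with the same bins); for plain string columns, drop_first could drop a different level in the test data.

```python
import pandas as pd

def align_to_training_columns(features, training_columns):
    """One-hot encode `features` and give it exactly the training columns, in order.

    Dummy columns missing from `features` are added as 0; extra ones are dropped.
    """
    dummies = pd.get_dummies(features, drop_first=True)
    return dummies.reindex(columns=training_columns, fill_value=0)

# Toy frames sharing one categorical dtype (hypothetical data)
cat_type = pd.CategoricalDtype(["a", "b", "c"])
train_like = pd.DataFrame({"grp": pd.Series(["a", "b", "c"], dtype=cat_type),
                           "x": [1.0, 2.0, 3.0]})
test_like = pd.DataFrame({"grp": pd.Series(["b", "c"], dtype=cat_type),
                          "x": [4.0, 5.0]})

training_columns = pd.get_dummies(train_like, drop_first=True).columns
aligned = align_to_training_columns(test_like, training_columns)
print(aligned)
```

For the real data the call would be `align_to_training_columns(test_processed, X.columns)`, where `test_processed` is a hypothetical name for the preprocessed test frame.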

4. Summary

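  1. Borrower age is roughly normally distributed, and the 30-40 age group has the largest loan amounts.
  2. Borrowers aged 20-30 are the group most prone to credit card delinquency.
  3. Revolving utilization of unsecured lines (total balance on credit cards and personal lines of credit), the debt ratio, and monthly income are the three most important factors in whether a borrower becomes delinquent.
  4. Because the data are fairly sparse, several features were discretized before modeling, which helps build a more robust model.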
