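Building a Credit Card Scoring Model
Background
The data contain each borrower's age, revolving utilization of unsecured lines (the total balance on credit cards and personal lines of credit relative to their credit limits), the number of times the borrower was past due over the past two years, monthly income, debt ratio, number of dependents, and similar attributes. Using this information we build a risk-control credit scoring model that predicts whether a borrower will become seriously delinquent.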
I. Importing data and libraries
Import the required libraries
import re
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Silence library warnings (e.g. deprecation notices) in the notebook output
warnings.filterwarnings('ignore')
sns.set(style="darkgrid")
# SimHei lets matplotlib render Chinese labels; unicode_minus=False keeps minus signs displayable with that font
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
Load the data
train = pd.read_csv('/home/kesci/input/kaggle4396/cs-training.csv')
test = pd.read_csv('/home/kesci/input/kaggle4396/cs-test.csv')
# The first CSV column is an unnamed row index; drop it
train.drop(columns=["Unnamed: 0"], inplace=True)
test.drop(columns=["Unnamed: 0"], inplace=True)
Dataset shape
train.shape
(150000, 11)
Missing values per column
train.isnull().sum()
SeriousDlqin2yrs 0
RevolvingUtilizationOfUnsecuredLines 0
age 0
NumberOfTime30-59DaysPastDueNotWorse 0
DebtRatio 0
MonthlyIncome 29731
NumberOfOpenCreditLinesAndLoans 0
NumberOfTimes90DaysLate 0
NumberRealEstateLoansOrLines 0
NumberOfTime60-89DaysPastDueNotWorse 0
NumberOfDependents 3924
dtype: int64
Duplicate rows
train.duplicated().sum()
609
Column types and non-null counts
train.info()
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 11 columns):
SeriousDlqin2yrs 150000 non-null int64
RevolvingUtilizationOfUnsecuredLines 150000 non-null float64
age 150000 non-null int64
NumberOfTime30-59DaysPastDueNotWorse 150000 non-null int64
DebtRatio 150000 non-null float64
MonthlyIncome 120269 non-null float64
NumberOfOpenCreditLinesAndLoans 150000 non-null int64
NumberOfTimes90DaysLate 150000 non-null int64
NumberRealEstateLoansOrLines 150000 non-null int64
NumberOfTime60-89DaysPastDueNotWorse 150000 non-null int64
NumberOfDependents 146076 non-null float64
dtypes: float64(4), int64(7)
memory usage: 12.6 MB
A first look at the data
train.head()
| | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 0 | 6 | 0 | 2.0 |
| 1 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 0 | 0 | 1.0 |
| 2 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042.0 | 2 | 1 | 0 | 0 | 0.0 |
| 3 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300.0 | 5 | 0 | 0 | 0 | 0.0 |
| 4 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 0 | 1 | 0 | 0.0 |
cor=train.corr()
fig, ax = plt.subplots(figsize=(10, 10))
sns.heatmap(cor, xticklabels=cor.columns, yticklabels=cor.columns, annot=True, ax=ax);
II. Data preprocessing
train_clean = train.copy()
Remove duplicate rows
train_clean.drop_duplicates(inplace=True)
Handling missing values
Fill missing values with each column's median
def fill_na(df):
    """Fill every column that contains missing values with that column's median, in place."""
    na_cols = df.columns[df.isnull().any()]
    for col in na_cols:
        # median() skips NaN by default
        df[col] = df[col].fillna(df[col].median())
fill_na(train_clean)
train_clean.isnull().sum()
SeriousDlqin2yrs 0
RevolvingUtilizationOfUnsecuredLines 0
age 0
NumberOfTime30-59DaysPastDueNotWorse 0
DebtRatio 0
MonthlyIncome 0
NumberOfOpenCreditLinesAndLoans 0
NumberOfTimes90DaysLate 0
NumberRealEstateLoansOrLines 0
NumberOfTime60-89DaysPastDueNotWorse 0
NumberOfDependents 0
dtype: int64
Age distribution of borrowers
plt.figure(figsize=(16, 6))
sns.distplot(train_clean["age"], color="black");
# Ten-year age buckets (20, 30], (30, 40], ..., (90, 100]; ages outside 21-100 become NaN
train_clean["age_label"] = pd.cut(train_clean["age"], np.arange(20, 110, 10))
bins = [0, 30, 40, 50, 60, 70, 110]
labels = ['0-29', '30-39', '40-49', '50-59', '60-69', '70+']
# right=False makes each interval closed on the left: [0, 30), [30, 40), ...
train_clean['age_grouped'] = pd.cut(train_clean['age'], bins, right=False, labels=labels)
train_clean.drop(columns="age", inplace=True)
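The `right` argument of `pd.cut` decides which bucket a boundary age falls into. The check below runs on hand-picked ages (illustrative values, not rows from the dataset) and compares `right=False` with pandas' default `right=True`:

```python
import pandas as pd

# Ages sitting on or next to the bin edges (illustrative values)
ages = pd.Series([29, 30, 39, 40, 69, 70, 95])

bins = [0, 30, 40, 50, 60, 70, 110]
labels = ['0-29', '30-39', '40-49', '50-59', '60-69', '70+']

# right=False: intervals are [0, 30), [30, 40), ..., so 30 is labeled '30-39'
grouped = pd.cut(ages, bins, right=False, labels=labels)
print(grouped.tolist())
# ['0-29', '30-39', '30-39', '40-49', '60-69', '70+', '70+']

# Default right=True: intervals are (0, 30], (30, 40], ..., so 30 would be labeled '0-29'
default = pd.cut(ages, bins, labels=labels)
print(default.tolist())
# ['0-29', '0-29', '30-39', '30-39', '60-69', '60-69', '70+']
```

With these labels, `right=False` is the setting that makes each label read literally. For example, '30-39' covers ages 30 through 39.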
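def plot_age(col, fun):
    """Bar-plot the sum ("s") or mean ("m") of `col` (one column name or a list of names) per age_label bucket."""
    data = pd.concat([train_clean[col], train_clean["age_label"]], axis=1)
    if fun == "s":
        df = data.groupby("age_label")[col].sum()
    elif fun == "m":
        df = data.groupby("age_label")[col].mean()
    else:
        raise ValueError("fun must be 's' (sum) or 'm' (mean)")
    df.plot(kind="bar", figsize=(16, 6))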
Mean revolving utilization of unsecured lines (credit card and personal credit line balance) by age group
plot_age("RevolvingUtilizationOfUnsecuredLines", "m");
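# Bin revolving utilization; the last edge is 5% above the maximum so every value falls in a bin
bins = [0, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90, 1.05,
        train_clean['RevolvingUtilizationOfUnsecuredLines'].max()*1.05]
labels = ['0-0.15', '0.15-0.30', '0.30-0.45', '0.45-0.60',
          '0.60-0.75', '0.75-0.90', '0.90-1.05', '1.05+']
train_clean['ru_grouped'] = pd.cut(train_clean['RevolvingUtilizationOfUnsecuredLines'],
                                   bins, right=False, labels=labels)
# Note: as written, the binned column is dropped straight away, so the model below
# keeps the raw RevolvingUtilizationOfUnsecuredLines instead of ru_grouped.
train_clean.drop(columns='ru_grouped', inplace=True)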
Checking DebtRatio for outliers: the distribution is extremely right-skewed, so plot it after a log1p transform
plt.figure(figsize=(16, 6))
sns.distplot(train_clean['DebtRatio'].apply(np.log1p), color="r");
train_clean["dr_log"] = train_clean["DebtRatio"].apply(np.log1p)
train_clean.drop(columns="DebtRatio", inplace=True)
plot_age("dr_log", "m")
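The log1p transform used for DebtRatio, log(1 + x), maps 0 to 0 and compresses the long right tail, so extreme ratios no longer dominate the plot or the model. The NumPy check below uses made-up ratios spanning several orders of magnitude:

```python
import numpy as np

# Illustrative debt ratios (made-up values, not rows from the data)
debt_ratio = np.array([0.0, 0.5, 1.0, 10.0, 1000.0, 100000.0])

# log1p(x) equals log(1 + x) but stays accurate when x is very small
dr_log = np.log1p(debt_ratio)
print(np.round(dr_log, 3))  # roughly 0, 0.405, 0.693, 2.398, 6.909, 11.513
```

A ratio range of five orders of magnitude shrinks to roughly 0 to 11.5.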
# Bin the number of open credit lines and loans; the last edge is 5% above the maximum
bins = [0, 2, 4, 6, 10, 14,
        train_clean['NumberOfOpenCreditLinesAndLoans'].max()*1.05]
labels = ['0-1', '2-3', '4-5', '6-9', '10-13', '14+']
train_clean['num_oc_grouped'] = pd.cut(train_clean['NumberOfOpenCreditLinesAndLoans'],
                                       bins, right=False, labels=labels)
train_clean.drop(columns='NumberOfOpenCreditLinesAndLoans', inplace=True)
bins = [0, 1, 2, 4,
        train_clean['NumberOfDependents'].max()*1.05]
labels = ['0', '1', '2-3', '4+']
train_clean['num_dep_grouped'] = pd.cut(train_clean['NumberOfDependents'],
                                        bins, right=False, labels=labels)
# Note: as written, the binned column is dropped straight away, so the model below
# keeps the raw NumberOfDependents instead of num_dep_grouped.
train_clean.drop(columns='num_dep_grouped', inplace=True)
Number of times borrowers were past due over the past 2 years, by age group
PastDueNotWorse = [i for i in train_clean.columns if "NumberOfTime" in i]
plot_age(PastDueNotWorse, fun="m")
cor = train_clean[PastDueNotWorse].corr()
cor
| | NumberOfTime30-59DaysPastDueNotWorse | NumberOfTimes90DaysLate | NumberOfTime60-89DaysPastDueNotWorse |
|---|---|---|---|
| NumberOfTime30-59DaysPastDueNotWorse | 1.000000 | 0.980489 | 0.984535 |
| NumberOfTimes90DaysLate | 0.980489 | 1.000000 | 0.991409 |
| NumberOfTime60-89DaysPastDueNotWorse | 0.984535 | 0.991409 | 1.000000 |
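Correlations this close to 1 between past-due counts of different severities are unusual. A few rows that hold the same very large placeholder value in all three columns (for example, codes such as 96 or 98) could dominate a Pearson correlation, so it is worth checking for them. The helper `count_extreme` below is a hypothetical sketch, and the toy frame is made up to show the effect:

```python
import pandas as pd

def count_extreme(df, cols, threshold=90):
    """Count, per column, the rows whose value is >= threshold."""
    return {c: int((df[c] >= threshold).sum()) for c in cols}

# Toy data: two rows carry the same placeholder code (98) in both columns
toy = pd.DataFrame({"late_30_59": [0, 1, 0, 2, 98, 98],
                    "late_90": [1, 0, 0, 0, 98, 98]})
cols = ["late_30_59", "late_90"]

print(count_extreme(toy, cols))  # {'late_30_59': 2, 'late_90': 2}

# Correlation with and without rows that contain a placeholder value
corr_all = toy[cols].corr().iloc[0, 1]
clean = toy[(toy[cols] < 90).all(axis=1)]
corr_clean = clean[cols].corr().iloc[0, 1]
print(round(corr_all, 3), round(corr_clean, 3))  # near 1 vs. negative
```

On the notebook's data the equivalent call at this point would be `count_extreme(train_clean, PastDueNotWorse)`.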
# The three past-due counts are almost perfectly correlated, so keep only NumberOfTimes90DaysLate
train_clean.drop(columns=["NumberOfTime30-59DaysPastDueNotWorse",
                          "NumberOfTime60-89DaysPastDueNotWorse"], inplace=True)
Delinquency vs. age
plt.figure(figsize=(16, 6))
sns.countplot(data=train_clean, x="age_label", hue="SeriousDlqin2yrs");
Log-transform monthly income (scaled to units of 10,000)
train_clean['income_log'] = (train_clean['MonthlyIncome']/10000).apply(np.log1p)
train_clean.drop(columns=['MonthlyIncome'], inplace=True)
III. Model training
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix, accuracy_score, fbeta_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
Start with a logistic regression model
attributes = train_clean.columns.drop(['SeriousDlqin2yrs'])
# One-hot encode the binned (categorical) features; drop_first avoids perfectly collinear dummies
X = pd.get_dummies(train_clean[attributes], drop_first=True)
# A 1-D target avoids scikit-learn's column-vector warning
y = train_clean['SeriousDlqin2yrs']
# Hold out 25% of the rows for validation
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, shuffle=True)
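Only a small share of borrowers (about 7%) are delinquent, so a purely random split can leave the two parts with slightly different positive rates. scikit-learn's `train_test_split` handles this through its `stratify=y` argument. The NumPy sketch below illustrates the idea by splitting each class separately; `stratified_split` is a hypothetical helper, not part of the notebook:

```python
import numpy as np

def stratified_split(y, test_size=0.25, seed=0):
    """Hypothetical helper: shuffle and split each class separately; return (train_idx, test_idx)."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)  # positions of this class
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_size))
        test_parts.append(idx[:n_test])
        train_parts.append(idx[n_test:])
    return np.concatenate(train_parts), np.concatenate(test_parts)

# Toy labels with 7% positives, similar in spirit to SeriousDlqin2yrs
y_demo = np.array([1] * 70 + [0] * 930)
train_idx, test_idx = stratified_split(y_demo)
print(y_demo[train_idx].mean(), y_demo[test_idx].mean())  # both close to 0.07
```

In the notebook itself, the same effect comes from adding `stratify=y` to the `train_test_split` call.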
def plot_est_score(c_values):
    """Fit one logistic regression per value of C and plot train/validation accuracy."""
    score_list = pd.DataFrame(index=c_values, columns=["train_score", "test_score"], dtype=float)
    for c in c_values:
        lg = LogisticRegression(C=c, solver='lbfgs').fit(X_train, y_train)
        score_list.loc[c, "train_score"] = lg.score(X_train, y_train)  # accuracy
        score_list.loc[c, "test_score"] = lg.score(X_valid, y_valid)
    best_c = score_list["test_score"].idxmax()
    print("best C = {}\nmax =\n{}".format(best_c, score_list.max()))
    score_list.plot(figsize=(16, 4), logx=True)
plot_est_score(np.array([0.01, 0.03, 0.1, 0.3, 1, 3, 10]))
best C = 0.01
max =
train_score 0.933534
test_score 0.932660
dtype: float64
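In scikit-learn's `LogisticRegression`, `C` is the inverse of the regularization strength, so the grid above runs from strong (C = 0.01) to weak (C = 10) regularization. One standard way to write the L2-penalized objective, as in the scikit-learn user guide with labels y_i ∈ {-1, +1}, is:

```latex
\min_{w,\,c}\; \frac{1}{2}\, w^\top w \;+\; C \sum_{i=1}^{n} \log\!\Bigl(1 + \exp\bigl(-y_i\,(x_i^\top w + c)\bigr)\Bigr)
```

A smaller C gives the penalty term relatively more weight and shrinks the coefficients w toward zero.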
Grid-search hyperparameter tuning
params_LR = {'C': [0.01, 0.03, 0.1, 0.3, 1, 3, 10],
'solver': ['lbfgs', 'liblinear']}
gs = GridSearchCV(LogisticRegression(max_iter=1000),
param_grid = params_LR,
scoring = 'f1',
cv=5).fit(X_train, y_train)
gs.best_params_
{'C': 0.01, 'solver': 'lbfgs'}
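The grid search ranks candidates with `scoring='f1'` rather than accuracy. Treat delinquency (SeriousDlqin2yrs = 1) as the positive class, and let TP, FP and FN be the counts of true positives, false positives and false negatives. Then precision P, recall R and F1 are:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

F1 ignores true negatives, unlike accuracy. A model that almost never predicts delinquency therefore gets a low F1 even when its accuracy is high.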
model_lr = LogisticRegression(C=gs.best_params_['C'], solver=gs.best_params_['solver']).fit(X_train, y_train)
print('train Score: %.6f' % model_lr.score(X_train, y_train))
print('valid Score: %.6f' % model_lr.score(X_valid, y_valid))
train Score: 0.933534
valid Score: 0.932660
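`score()` reports accuracy. With only about 7% delinquent borrowers, always predicting "not delinquent" already reaches roughly 93%, close to the scores above. Two checks put the numbers in context: the majority-class baseline, and ROC AUC computed from predicted probabilities. `roc_auc_score` is already imported above, so in the notebook the AUC check would be `roc_auc_score(y_valid, model_lr.predict_proba(X_valid)[:, 1])`. The NumPy sketch below spells out what both numbers measure; the helper names are hypothetical:

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy of always predicting the most frequent class."""
    y = np.asarray(y)
    positive_rate = y.mean()
    return max(positive_rate, 1 - positive_rate)

def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Compares every positive/negative pair (ties count as half), i.e. the
    Mann-Whitney U statistic; fine for small arrays, O(n_pos * n_neg) memory.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example: 4 of 5 labels are 0, so the baseline accuracy is 0.8
print(majority_baseline_accuracy([0, 0, 0, 0, 1]))
# 3 of the 4 positive/negative pairs are ranked correctly -> 0.75
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 means the scores rank delinquent and non-delinquent borrowers no better than chance, whatever the accuracy.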
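Modeling with XGBoost
import xgboost as xgb

# First parameter set; it is not used below.
# NOTE: 'eval_matric' is misspelled, and XGBoost has no built-in 'f1' evaluation metric.
params_xgb = {'max_depth': 6,
              'eta': 1,
              'silent': 1,
              'objective': 'binary:logistic',
              'eval_matric': 'f1'}
# Parameter set passed to xgb.train below.
# NOTE: 'eval_matric', 'minchildweight', 'maxdeltastep' and 'numboostround' are not valid
# XGBoost parameter names (eval_metric, min_child_weight and max_delta_step are the real ones;
# the round count is the num_boost_round argument of xgb.train), so they had no effect here.
# That is why the training log below reports the default 'error' metric instead of AUC.
params_xgb2 = {'max_depth': 5,
               'eta': 0.025,
               'silent': 1,
               'objective': 'binary:logistic',
               'eval_matric': 'auc',
               'minchildweight': 10.0,
               'maxdeltastep': 1.8,
               'colsample_bytree': 0.4,
               'subsample': 0.8,
               'gamma': 0.65,
               'numboostround': 391}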
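For reference, here is `params_xgb2` with the parameter names spelled the way XGBoost expects; the values are copied unchanged. The changes are:

- 'silent' is left out because newer XGBoost releases replaced it with 'verbosity'.
- The round count moves out of the dict into the `num_boost_round` argument of `xgb.train`.

Retraining with these names would produce a different run, so the log shown below would no longer match it:

```python
# params_xgb2 with corrected key names (values copied unchanged)
params_xgb2_fixed = {
    'max_depth': 5,
    'eta': 0.025,
    'objective': 'binary:logistic',
    'eval_metric': 'auc',
    'min_child_weight': 10.0,
    'max_delta_step': 1.8,
    'colsample_bytree': 0.4,
    'subsample': 0.8,
    'gamma': 0.65,
}
# The number of boosting rounds is an argument of xgb.train, not a booster parameter
num_boost_round = 391

# With xgboost imported and dtrain/evals built as in the notebook:
# model = xgb.train(params_xgb2_fixed, dtrain, num_boost_round, evals, early_stopping_rounds=100)
```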
# XGBoost does not accept feature names containing '[', ']' or '<' (pd.cut intervals such as "(20, 30]" produce them)
regex = re.compile(r"[\[\]<]")
feature_name = [regex.sub("_", str(col)) for col in X.columns]
dtrain = xgb.DMatrix(X_train, y_train, feature_names=feature_name)
dvalid = xgb.DMatrix(X_valid, y_valid, feature_names=feature_name)
evals = [(dtrain, 'train'), (dvalid, 'valid')]
model_xgb = xgb.train(params_xgb2, dtrain, 1000, evals, early_stopping_rounds=100);
[0] train-error:0.066403 valid-error:0.068973
Multiple eval metrics have been passed: 'valid-error' will be used for early stopping.
Will train until valid-error hasn't improved in 100 rounds.
[1] train-error:0.065118 valid-error:0.066858
[2] train-error:0.065725 valid-error:0.067259
...
[446] train-error:0.061325 valid-error:0.064314
[447] train-error:0.061307 valid-error:0.064341
Stopping. Best iteration:
[347] train-error:0.061762 valid-error:0.064153
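Save the model
# save_model writes a file that can be reloaded later, e.g. with xgb.Booster(model_file='xgb_v1.model')
model_xgb.save_model('xgb_v1.model')
# dump_model writes a human-readable text dump of the trees (for inspection only, not reloadable)
model_xgb.dump_model('xgb_v1')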
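Feature importance in the credit model
# importance_type defaults to 'weight' (how often a feature is used in a split);
# 'gain' (average loss reduction from splits on the feature) is often a more informative ranking
xgb.plot_importance(model_xgb);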
Visualizing an XGBoost tree
# Renders the first tree (num_trees=0 by default); requires the graphviz package
xgb.to_graphviz(model_xgb)
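Predict whether borrowers will become delinquent
# NOTE: this scores the validation split X_valid, not the Kaggle test set (cs-test.csv) loaded at the start;
# the test set would first need the same preprocessing as train_clean
dtest = xgb.DMatrix(X_valid, feature_names=feature_name)
y_test = model_xgb.predict(dtest)  # predicted probability that SeriousDlqin2yrs = 1
entry = pd.DataFrame()
entry['ID'] = np.arange(1, len(y_test)+1)
entry['Probability'] = y_test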
entry.to_csv('pred.csv', header=True, index=False)
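The prediction above is made on the validation split. To score cs-test.csv instead, the test frame must go through the same preprocessing steps as `train_clean` and end up with exactly the same dummy columns as `X`. The sketch below is a hypothetical re-implementation of those steps under a few assumptions:

- `preprocess` and `align` are our own names.
- The top bin edge for open credit lines is `np.inf`, whereas the notebook uses the training maximum times 1.05.
- Imputation reuses medians learned on the training data.

```python
import numpy as np
import pandas as pd

AGE_BINS = [0, 30, 40, 50, 60, 70, 110]
AGE_LABELS = ['0-29', '30-39', '40-49', '50-59', '60-69', '70+']
# Assumption: np.inf as the top edge (the notebook uses the training max * 1.05),
# so unusually large values in new data still land in the last bin.
OC_BINS = [0, 2, 4, 6, 10, 14, np.inf]
OC_LABELS = ['0-1', '2-3', '4-5', '6-9', '10-13', '14+']
DROP = ["SeriousDlqin2yrs", "age", "DebtRatio", "NumberOfOpenCreditLinesAndLoans",
        "MonthlyIncome", "NumberOfTime30-59DaysPastDueNotWorse",
        "NumberOfTime60-89DaysPastDueNotWorse"]


def preprocess(raw, medians):
    """Re-apply the notebook's feature engineering to a raw frame (hypothetical sketch)."""
    out = raw.fillna(medians)  # medians must come from the training data
    out["age_label"] = pd.cut(out["age"], np.arange(20, 110, 10))
    out["age_grouped"] = pd.cut(out["age"], AGE_BINS, right=False, labels=AGE_LABELS)
    out["dr_log"] = np.log1p(out["DebtRatio"])
    out["num_oc_grouped"] = pd.cut(out["NumberOfOpenCreditLinesAndLoans"], OC_BINS,
                                   right=False, labels=OC_LABELS)
    out["income_log"] = np.log1p(out["MonthlyIncome"] / 10000)
    out = out.drop(columns=[c for c in DROP if c in out.columns])
    # Categorical columns keep all their categories, so dummies are consistent across frames
    return pd.get_dummies(out, drop_first=True)


def align(features, train_columns):
    """Add any missing dummy column as 0 and use the training column order."""
    return features.reindex(columns=train_columns, fill_value=0)


# Toy rows using the dataset's column names (made-up values)
raw = pd.DataFrame({
    "SeriousDlqin2yrs": [0, 1, 0],
    "RevolvingUtilizationOfUnsecuredLines": [0.5, 0.9, 0.1],
    "age": [25, 45, 70],
    "NumberOfTime30-59DaysPastDueNotWorse": [0, 1, 0],
    "DebtRatio": [0.3, 1000.0, 0.0],
    "MonthlyIncome": [5000.0, np.nan, 3000.0],
    "NumberOfOpenCreditLinesAndLoans": [3, 15, 0],
    "NumberOfTimes90DaysLate": [0, 0, 1],
    "NumberRealEstateLoansOrLines": [1, 0, 2],
    "NumberOfTime60-89DaysPastDueNotWorse": [0, 0, 0],
    "NumberOfDependents": [2.0, np.nan, 0.0],
})
medians = raw[["MonthlyIncome", "NumberOfDependents"]].median()
train_features = preprocess(raw, medians)
new_features = align(preprocess(raw.iloc[1:2], medians), train_features.columns)
print(new_features.T)
```

Applied to the notebook, the steps would be:

1. Compute `medians = train.drop_duplicates()[["MonthlyIncome", "NumberOfDependents"]].median()`.
2. Build `X_test = align(preprocess(test, medians), X.columns)`.
3. Apply the same feature-name cleanup to `X_test` before building the `DMatrix`.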
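IV. Summary
- Borrower ages are roughly normally distributed, and the 30-40 age group borrows the most.
- Borrowers aged 20-30 are the group most prone to credit card delinquency.
- Revolving utilization (total credit card and personal credit line balance), debt ratio and monthly income are the three most important factors for predicting whether a borrower will become delinquent, according to the XGBoost feature-importance plot.
- Because the data are fairly sparse, several features were discretized (binned) before modeling, which helps build a more robust model.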