First edited: 2020-08-06
Editor: Cairne
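As the internet penetrates further into traditional finance and e-commerce, the combination of risk control and the internet places new demands and challenges on traditional risk control. Take scorecards as an example: in an internet setting, a scorecard has to cope with higher-dimensional data, more real-time data, and more anomalous data. Understanding risk-control scorecards for internet businesses has therefore become a new requirement for practitioners in internet risk control. [1]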
The following is translated from reference [2] for personal study.
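Two commonly used Python libraries for credit scorecards are "scorecardpy" and "Toad". scorecardpy was developed by Dr. Shichen Xie (谢士晨) and is the Python version of the R package scorecard. Its goal is to make developing traditional credit risk scorecard models easier and more efficient by providing functions for some common tasks. The package's features and the corresponding functions are:
· Data partition (split_df)
· Variable selection (iv, var_filter)
· Weight of evidence (WoE) binning (woebin, woebin_plot, woebin_adj, woebin_ply)
· Scorecard scaling (scorecard, scorecard_ply)
· Performance evaluation (perf_eva, perf_psi)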
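To make the WoE and IV terms in the list above concrete, here is a minimal hand-rolled sketch that computes both for a variable that has already been binned. It is not scorecardpy's implementation: the function name `woe_iv` and the toy data are hypothetical, and the sign convention ln(bad share / good share) is an assumption (some texts use the inverse ratio, which flips the sign of WoE but leaves IV unchanged).

```python
import math
from collections import defaultdict


def woe_iv(bin_labels, targets):
    """Return ({bin: woe}, iv) for a binary target where 1 = bad and 0 = good.

    Hypothetical helper for illustration only; assumes every bin contains
    at least one good and one bad record, otherwise the log is undefined.
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for label, y in zip(bin_labels, targets):
        if y == 1:
            bad[label] += 1
        else:
            good[label] += 1

    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe = {}
    iv = 0.0
    for label in sorted(set(good) | set(bad)):
        # Share of all goods / all bads that fall into this bin.
        good_share = good[label] / total_good
        bad_share = bad[label] / total_bad
        w = math.log(bad_share / good_share)
        woe[label] = w
        # Each bin contributes (bad share - good share) * WoE to the IV.
        iv += (bad_share - good_share) * w
    return woe, iv


# Toy data: 4 "young" records (2 bad) and 8 "old" records (2 bad).
labels = ["young"] * 4 + ["old"] * 8
targets = [1, 1, 0, 0] + [1, 1, 0, 0, 0, 0, 0, 0]
woe, iv = woe_iv(labels, targets)
print(woe, iv)
```

In this toy data the "young" bin holds a larger share of the bads than of the goods, so its WoE is positive under the convention assumed here, and the "old" bin gets a negative WoE.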
Run the following in a terminal to install the latest released version of scorecardpy from PyPI:
# python2
pip install scorecardpy
# python3
pip3 install scorecardpy
Alternatively, use the following commands to install the latest version of scorecardpy from GitHub:
# python2
pip install git+https://github.com/shichenxie/scorecardpy.git
# python3
pip3 install git+https://github.com/shichenxie/scorecardpy.git
Note: the installation above requires pandas 0.25.0 or later; otherwise it will fail!
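A minimal sketch of checking the installed pandas version against the 0.25.0 minimum mentioned above before installing. The helper names (`parse_version`, `pandas_is_new_enough`, `installed_pandas_version`) are illustrative and not part of scorecardpy, and the parser only handles simple dotted version strings.

```python
from importlib import metadata

MIN_PANDAS = (0, 25, 0)  # minimum pandas version stated in the note above


def parse_version(text):
    """Turn '1.5.3' or '2.1.0rc1' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pandas_is_new_enough(installed):
    """True if the given version string meets the assumed minimum."""
    return parse_version(installed) >= MIN_PANDAS


def installed_pandas_version():
    """Return the installed pandas version string, or None if it is absent."""
    try:
        return metadata.version("pandas")
    except metadata.PackageNotFoundError:
        return None


version = installed_pandas_version()
if version is None:
    print("pandas is not installed")
else:
    print(version, "OK" if pandas_is_new_enough(version) else "too old")
```

If the check reports a version that is too old, upgrading pandas first (for example with `pip install --upgrade pandas`) should avoid the failure described above.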
The following example shows how to develop a typical credit risk scorecard project:
# Traditional Credit Scoring Using Logistic Regression
import scorecardpy as sc
# data preparation ------
# load the germancredit (German credit) data
dat = sc.germancredit()
# filter variables by missing rate, IV, and identical value rate
dt_s = sc.var_filter(dat, y="creditability")
# split dt_s into train and test sets
train, test = sc.split_df(dt_s, 'creditability').values()
# woe binning ------
bins = sc.woebin(dt_s, y="creditability")
# sc.woebin_plot(bins)
# binning adjustment
# # adjust breaks interactively
# breaks_adj = sc.woebin_adj(dt_s, "creditability", bins)
# # or specify breaks manually
breaks_adj = {
    'age.in.years': [26, 35, 40],
    'other.debtors.or.guarantors': ["none", "co-applicant%,%guarantor"]
}
bins_adj = sc.woebin(dt_s, y="creditability", breaks_list=breaks_adj)
# convert train and test into woe values
train_woe = sc.woebin_ply(train, bins_adj)
test_woe = sc.woebin_ply(test, bins_adj)
y_train = train_woe.loc[:,'creditability']
X_train = train_woe.loc[:,train_woe.columns != 'creditability']
y_test = test_woe.loc[:,'creditability']
X_test = test_woe.loc[:,test_woe.columns != 'creditability']
# logistic regression ------
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(penalty='l1', C=0.9, solver='saga', n_jobs=-1)
lr.fit(X_train, y_train)
# lr.coef_
# lr.intercept_
# predicted probability
train_pred = lr.predict_proba(X_train)[:,1]
test_pred = lr.predict_proba(X_test)[:,1]
# performance ks & roc ------
train_perf = sc.perf_eva(y_train, train_pred, title = "train")
test_perf = sc.perf_eva(y_test, test_pred, title =