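Python check-in day 31 @浙大疏锦行

DAY 31: Standardized file splitting and coding conventions

Key points review

  1. Standardized file naming
  2. Standardized folder management
  3. Splitting a machine learning project
  4. Encoding format and type annotations

Homework: for the earlier heart-disease project, try preparing the files the project would be split into, and think about which parts could be reused in the future.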
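As a rough sketch of what one reusable piece of the heart-disease project could look like after splitting, the snippet below puts data loading and feature/label separation into type-annotated functions. The module name `data_loading.py`, the function names, and the default `target` column are illustrative assumptions, not part of the original project.

```python
# data_loading.py (hypothetical module name)
from pathlib import Path
from typing import Tuple, Union

import pandas as pd


def load_data(csv_path: Union[str, Path]) -> pd.DataFrame:
    """Read the raw CSV file into a DataFrame."""
    return pd.read_csv(csv_path)


def split_features_target(
    df: pd.DataFrame, target: str = "target"
) -> Tuple[pd.DataFrame, pd.Series]:
    """Separate the feature columns from the label column."""
    features = df.drop(columns=[target])
    labels = df[target]
    return features, labels


if __name__ == "__main__":
    # Tiny made-up frame, only to show the call; it is not real patient data.
    demo = pd.DataFrame({"age": [63, 41], "chol": [233, 204], "target": [1, 0]})
    X, y = split_features_target(demo)
    print(X.columns.tolist(), y.tolist())
```

Keeping such functions in their own file, guarded by `if __name__ == "__main__":`, lets other scripts import them without running the demo code.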

I. Importing libraries

import numpy as np
import pandas as pd

II. Imports for visualization, modeling, and evaluation

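# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Models, tree export, metrics, and train/test splitting from scikit-learn
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn.metrics import roc_curve, auc
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Fix the random seed for reproducibility and silence the chained-assignment warning
np.random.seed(123)
pd.options.mode.chained_assignment = None

# Jupyter magic: render plots inline (notebook only)
%matplotlib inline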

III. Setting the visualization style

# Global plot style: pastel palette, figure size, and font sizes
sns.set(palette='pastel', rc={"figure.figsize": (10, 5),
                              "axes.titlesize": 14,
                              "axes.labelsize": 12,
                              "xtick.labelsize": 10,
                              "ytick.labelsize": 10})

# dt is the heart-disease DataFrame (its loading step is not shown here)
# Count of patients with and without heart disease
a = sns.countplot(x='target', data=dt)
a.set_title('Distribution of Presence of Heart Disease')
a.set_xticklabels(['Absent', 'Present'])
plt.xlabel("Presence of Heart Disease")
plt.show()

# Age distribution
g = sns.countplot(x='age', data=dt)
g.set_title('Distribution of Age')
plt.xlabel('Age')
plt.show()

# Heart disease counts split by sex
b = sns.countplot(x='target', data=dt, hue='sex')
plt.legend(['Female', 'Male'])
b.set_title('Distribution of Presence of Heart Disease by Sex')
b.set_xticklabels(['Absent', 'Present'])
plt.show()

# Cholesterol distribution; histplot replaces the deprecated distplot
sns.histplot(dt['chol'].dropna(), kde=True, color='darkblue', bins=40)
plt.show()
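The count plots above repeat the same few calls, so they are a natural candidate for a reusable helper when the project is split into files. The sketch below is one possible shape for it; the function name `plot_count` and its parameters are assumptions made for illustration, and the demo frame is made up.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes


def plot_count(df: pd.DataFrame, column: str, title: str, xlabel: str) -> Axes:
    """Draw a count plot for one column and return the Axes for further tweaks."""
    fig, ax = plt.subplots(figsize=(10, 5))
    sns.countplot(x=column, data=df, ax=ax)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    return ax


# Made-up labels, only to demonstrate the call.
demo = pd.DataFrame({"target": [0, 1, 1, 0, 1]})
ax = plot_count(
    demo,
    "target",
    "Distribution of Presence of Heart Disease",
    "Presence of Heart Disease",
)
```

Returning the Axes instead of calling `plt.show()` inside the helper leaves it to the caller to relabel ticks, save the figure, or display it.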

IV. Data preprocessing

# Replace integer codes with readable labels. Series.replace avoids the
# chained assignment dt[col][mask] = value, which may silently fail to update dt.
# The descriptive column names assume a renaming step that is not shown here.
dt['sex'] = dt['sex'].replace({0: 'female', 1: 'male'})

dt['chest_pain_type'] = dt['chest_pain_type'].replace({
    1: 'typical angina', 2: 'atypical angina',
    3: 'non-anginal pain', 4: 'asymptomatic'})

# The fasting blood sugar threshold is 120 mg/dl
dt['fasting_blood_sugar'] = dt['fasting_blood_sugar'].replace({
    0: 'lower than 120mg/dl', 1: 'greater than 120mg/dl'})

dt['rest_ecg'] = dt['rest_ecg'].replace({
    0: 'normal', 1: 'ST-T wave abnormality',
    2: 'left ventricular hypertrophy'})

dt['exercise_induced_angina'] = dt['exercise_induced_angina'].replace({
    0: 'no', 1: 'yes'})

dt['st_slope'] = dt['st_slope'].replace({
    1: 'upsloping', 2: 'flat', 3: 'downsloping'})

dt['thalassemia'] = dt['thalassemia'].replace({
    1: 'normal', 2: 'fixed defect', 3: 'reversible defect'})
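Since every column in this step follows the same "code to label" pattern, the step could be driven by a single lookup table and moved into its own function. The following is a minimal sketch under that assumption; `CATEGORY_LABELS` and `decode_categories` are hypothetical names, only two columns are listed, and the demo rows are made up.

```python
from typing import Dict

import pandas as pd

# Code-to-label tables; only two columns are listed to keep the sketch short.
CATEGORY_LABELS: Dict[str, Dict[int, str]] = {
    "sex": {0: "female", 1: "male"},
    "st_slope": {1: "upsloping", 2: "flat", 3: "downsloping"},
}


def decode_categories(
    df: pd.DataFrame, labels: Dict[str, Dict[int, str]]
) -> pd.DataFrame:
    """Return a copy of df with integer codes replaced by readable labels."""
    decoded = df.copy()
    for column, mapping in labels.items():
        # Series.map turns codes that are missing from the mapping into NaN.
        decoded[column] = decoded[column].map(mapping)
    return decoded


demo = pd.DataFrame({"sex": [0, 1, 1], "st_slope": [1, 3, 2]})  # made-up rows
decoded = decode_categories(demo, CATEGORY_LABELS)
print(decoded)
```

Working on a copy keeps the raw frame unchanged, which makes the function easier to reuse and to test in isolation.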

V. Creating the model

# X_train and y_train come from a train/test split of dt that is not shown here
model = RandomForestClassifier(max_depth=5, n_estimators=10)
model.fit(X_train, y_train)
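The post does not show how `X_train`, `X_test`, `y_train`, and `y_test` are produced. A plausible sketch, using the `train_test_split` that was imported earlier, is given below. The synthetic stand-in data, the one-hot encoding via `pd.get_dummies`, and the values of `test_size` and `random_state` are all assumptions chosen for illustration, not the original settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data so the sketch runs on its own; not the heart-disease table.
rng = np.random.default_rng(123)
dt = pd.DataFrame({
    "age": rng.integers(30, 80, size=100),
    "sex": rng.choice(["female", "male"], size=100),
    "target": rng.integers(0, 2, size=100),
})

# One-hot encode the string columns, then separate features from the label.
X = pd.get_dummies(dt.drop(columns="target"), drop_first=True)
y = dt["target"]

# test_size and random_state are assumed values, chosen only for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

model = RandomForestClassifier(max_depth=5, n_estimators=10, random_state=123)
model.fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

Passing `random_state` to both the split and the classifier makes the run repeatable without relying on the global NumPy seed.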

VI. Model prediction

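# Predicted class labels for the test set
y_predict = model.predict(X_test)
# Predicted probability of the positive class (label 1)
y_pred_quant = model.predict_proba(X_test)[:, 1]
# Same labels under a second name; reuse the result instead of predicting twice
y_pred_bin = y_predict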

VII. Model evaluation

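# Compute the matrix under a new name so the imported function is not shadowed
cm = confusion_matrix(y_test, y_pred_bin)
total = cm.sum()

# scikit-learn layout: rows are true classes, columns are predicted classes
tn, fp, fn, tp = cm.ravel()

# Sensitivity (recall of the positive class) = TP / (TP + FN)
sensitivity = tp / (tp + fn)
print('Sensitivity : ', sensitivity)

# Specificity (recall of the negative class) = TN / (TN + FP)
specificity = tn / (tn + fp)
print('Specificity : ', specificity)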
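The evaluation step is another part that could be reused across projects. The sketch below wraps sensitivity, specificity, and the ROC AUC (using the `roc_curve` and `auc` functions imported earlier but not used in this post) into one function. The name `evaluate_binary` is hypothetical, label 1 is assumed to be the positive class, and the labels and scores in the demo are made up.

```python
from typing import Dict, Sequence

from sklearn.metrics import auc, confusion_matrix, roc_curve


def evaluate_binary(
    y_true: Sequence[int], y_pred: Sequence[int], y_score: Sequence[float]
) -> Dict[str, float]:
    """Compute sensitivity, specificity and ROC AUC, treating label 1 as positive."""
    # With labels=[0, 1], rows are true classes and columns are predicted classes.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "auc": auc(fpr, tpr),
    }


# Made-up labels and scores, only to demonstrate the call.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in y_score]
metrics = evaluate_binary(y_true, y_pred, y_score)
print(metrics)
```

In the project itself the three arguments would correspond to `y_test`, `y_pred_bin`, and `y_pred_quant`.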
