scikit-learn Homework Exercise

Assignment:

[Figure 1: the assignment prompt]

Approach:

1. Create a dataset with sklearn's datasets.make_classification function; following the hints given, set n_samples=1000 and n_features=10;

2. Split the dataset into 10 folds for cross-validation with KFold (found in sklearn.model_selection in current releases; older versions exposed it through the cross_validation module);

3. Train three algorithms: naive Bayes, a support vector machine, and a random forest;

4. Evaluate the cross-validated performance with three metrics: accuracy, F1-score, and ROC AUC (a compact way to score all folds at once is sketched right after this list).
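
As a side note, recent scikit-learn versions can combine steps 2 and 4 in a single call. A minimal sketch, assuming a version where model_selection.cross_validate accepts a list of built-in scorer names ('accuracy', 'f1', 'roc_auc'):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10)
# Fits and scores the estimator on each of the 10 folds with all three metrics
scores = cross_validate(GaussianNB(), X, y, cv=10,
                        scoring=['accuracy', 'f1', 'roc_auc'])
print(scores['test_accuracy'].mean())
print(scores['test_f1'].mean())
print(scores['test_roc_auc'].mean())

The same call works for SVC and RandomForestClassifier by swapping in the estimator.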

Code:

from sklearn import datasets
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Create the dataset
dataset = datasets.make_classification(n_samples=1000, n_features=10)

# Split the data with 10-fold cross-validation
# (KFold lives in sklearn.model_selection now; the old cross_validation.KFold
# took the number of samples as its first argument)
cv = KFold(n_splits=10, shuffle=True)
for train_index, test_index in cv.split(dataset[0]):
    X_train, y_train = dataset[0][train_index], dataset[1][train_index]
    X_test, y_test = dataset[0][test_index], dataset[1][test_index]
# Each iteration overwrites the split, so the classifiers below are
# trained and scored on the last fold only.

# Naive Bayes
print('Gaussian NB') 
alg = GaussianNB()  # train the model (same pattern below)
alg.fit(X_train, y_train) 
pred = alg.predict(X_test)
print(metrics.accuracy_score(y_test, pred))  # evaluation metrics
print(metrics.f1_score(y_test, pred))
print(metrics.roc_auc_score(y_test, pred))
print('\n')

# Support vector machine
print('SVC')
alg = SVC(C=0.1, kernel='rbf', gamma=0.1)
alg.fit(X_train, y_train)
pred = alg.predict(X_test)
print(metrics.accuracy_score(y_test, pred))
print(metrics.f1_score(y_test, pred))
print(metrics.roc_auc_score(y_test, pred))
print('\n')

# Random forest
print('RandomForestClassifier')
alg = RandomForestClassifier(n_estimators=10)
alg.fit(X_train, y_train)
pred = alg.predict(X_test)
print(metrics.accuracy_score(y_test, pred))
print(metrics.f1_score(y_test, pred))
print(metrics.roc_auc_score(y_test, pred))

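One caveat on the metrics above: metrics.roc_auc_score is given hard 0/1 predictions, which yields a coarser AUC than ranking by a continuous score. A hedged alternative, continuing from the script above (X_train, X_test, y_test and the fitted random forest alg are reused; SVC is refit here purely for illustration):

# Probability-based AUC for estimators with predict_proba
# (GaussianNB and RandomForestClassifier both provide it)
proba = alg.predict_proba(X_test)[:, 1]
print(metrics.roc_auc_score(y_test, proba))

# SVC has no probability estimates by default; use its decision_function
# output as the ranking score instead
svc = SVC(C=0.1, kernel='rbf', gamma=0.1).fit(X_train, y_train)
print(metrics.roc_auc_score(y_test, svc.decision_function(X_test)))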
Results:

[Figure 2: console output of the script]
