Machine Learning Algorithms: SVM

SVM Code Example

This post explains the principles behind SVM and walks through a concrete implementation.

  • What is SVM?
    Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for both classification and regression, although it is mostly used for classification. In this algorithm, each data item is treated as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of the corresponding coordinate.
  • How is the data classified?
    We perform classification by finding the hyperplane that best separates the two classes. In other words, the algorithm outputs an optimal hyperplane, which is then used to categorize new examples.
  • What is an optimal Hyper-Plane?
    For SVM, it is the one that maximizes the margin to both classes. In other words, it is the hyperplane whose distance to the nearest point of each class is the largest.
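  • KERNEL
    In a linear SVM, the hyperplane is learned by reformulating the problem in terms of inner products between data points. This is where the kernel plays a role: a kernel function replaces those inner products.
    Polynomial and exponential (RBF) kernels compute the separating boundary as if the data lived in a higher-dimensional space, without building that space explicitly. This is called the kernel trick.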
  • GAMMA
    The gamma parameter defines how far the influence of a single training example reaches. With a low gamma, points far away from the possible separation line are also considered when calculating it, whereas with a high gamma only the points close to the possible line are considered.
  • REGULARIZATION
    The regularization parameter (called C in scikit-learn's SVC) controls how strongly the optimizer avoids misclassifying training points. For large values of this parameter, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of classifying all the training points correctly. Conversely, a very small value will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points.
  • MARGIN
    A margin is the distance between the separating line and the closest points of each class. A good margin is one where this distance is large for both classes, so that points stay in their respective classes without crossing over to the other class.
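
To make the margin and regularization ideas above concrete, here is a minimal sketch. It assumes synthetic data from scikit-learn's make_blobs rather than the dataset used later in this post, and the helper name margin_width is hypothetical. For a linear kernel, the margin width can be read off the fitted weight vector w as 2 / ||w||.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic 2-D data with two clusters (an assumption, for illustration only).
X_demo, y_demo = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=0)

def margin_width(C):
    """Fit a linear SVC and return (margin width, number of support vectors)."""
    model = SVC(kernel="linear", C=C).fit(X_demo, y_demo)
    w = model.coef_[0]  # normal vector of the separating hyperplane
    return 2.0 / np.linalg.norm(w), len(model.support_vectors_)

for C in (0.01, 1.0, 100.0):
    width, n_sv = margin_width(C)
    print(f"C={C:<6} margin width={width:.3f} support vectors={n_sv}")
```

On data like this, a smaller C should give a wider margin and typically more support vectors, since more points are allowed to sit inside the margin.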

Support Vector Machine (SVM)

Code Walkthrough

Import the libraries

# Author: @Adam_Louis (adam坤)
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Load the dataset

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, 4].values

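Split the dataset into a training set and a test set

# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import train_test_split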
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

Feature scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
# Reuse the mean and standard deviation learned from the training set
X_test = sc.transform(X_test)
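
A common convention is to learn the scaling statistics from the training set only and then apply the same statistics to the test set, so that both sets live in the same coordinate system. The sketch below illustrates this with made-up data: the arrays X_train_demo and X_test_demo are hypothetical stand-ins with two features on very different scales, not the real dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the training and test features.
X_train_demo = rng.normal(loc=[40.0, 70000.0], scale=[10.0, 30000.0], size=(300, 2))
X_test_demo = rng.normal(loc=[40.0, 70000.0], scale=[10.0, 30000.0], size=(100, 2))

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test_demo)        # apply the same mean/std

print("train mean:", X_train_scaled.mean(axis=0).round(3))
print("train std: ", X_train_scaled.std(axis=0).round(3))
print("test mean: ", X_test_scaled.mean(axis=0).round(3))
```

The scaled training set has zero mean and unit standard deviation per feature by construction. The test set is only approximately centered, which is expected because it was scaled with the training statistics.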

Fit the SVM to the training set

from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)
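
The classifier above uses a linear kernel. As a rough illustration of what the kernel and gamma settings change, the sketch below compares a linear kernel with RBF kernels at several gamma values. It assumes synthetic concentric-circle data from make_circles, which is not linearly separable; the variable names and gamma values are chosen for illustration and are not tuned for the dataset in this post.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic concentric circles: no straight line separates the classes.
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_c, y_c, test_size=0.25, random_state=0)

configs = {
    "linear": dict(kernel="linear"),
    "rbf gamma=0.1": dict(kernel="rbf", gamma=0.1),
    "rbf gamma=1": dict(kernel="rbf", gamma=1.0),
    "rbf gamma=100": dict(kernel="rbf", gamma=100.0),
}

scores = {}
for name, params in configs.items():
    model = SVC(**params).fit(X_tr, y_tr)
    # Store (training accuracy, test accuracy) for each configuration.
    scores[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(f"{name:<14} train={scores[name][0]:.2f} test={scores[name][1]:.2f}")
```

A large gap between training and test accuracy at a very high gamma would suggest overfitting, because each training point then influences only its immediate neighborhood.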

Predict the test set results

y_pred = classifier.predict(X_test)

Build the confusion matrix

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
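
For a binary problem, scikit-learn lays the confusion matrix out as [[TN, FP], [FN, TP]], with rows for the true labels and columns for the predicted labels. The sketch below shows how to read it and derive accuracy from it. The label arrays are hand-written for illustration and are not results from the dataset above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels, written by hand.
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 0, 1, 0, 1, 0, 1, 0, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()  # row-major order: TN, FP, FN, TP

accuracy = (tp + tn) / cm_demo.sum()
print(cm_demo)
print("accuracy:", accuracy)
```

Here one negative example is predicted positive and one positive example is predicted negative, so 8 of the 10 predictions are correct.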

Visualize the training set results

from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

