Consider the training set shown in the figure. The positive examples are $x_1 = (3,3)^\top$ and $x_2 = (4,3)^\top$, and the negative example is $x_3 = (1,1)^\top$, so $y_1 = y_2 = +1$ and $y_3 = -1$. Use the stochastic gradient method of the perceptron learning algorithm to find the perceptron model $f(x) = \text{sign}(w \cdot x + b)$. Here $w = (w^{(1)}, w^{(2)})^\top$ and $x = (x^{(1)}, x^{(2)})^\top$.
Solution

Set up the optimization problem

$$\min_{w,b} L(w,b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

where $M$ is the set of misclassified points, and solve for $w$ and $b$ with learning rate $\eta = 1$.
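The loss above can be evaluated directly for any candidate $(w, b)$. The following is a minimal sketch, assuming a point counts as misclassified when $y_i(w \cdot x_i + b) \le 0$; the names `perceptron_loss` and `DATA` are illustrative and not part of the original example.

```python
def perceptron_loss(w, b, data):
    """Sum of -y * (w . x + b) over the points with y * (w . x + b) <= 0."""
    loss = 0
    for x, y in data:
        margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
        if margin <= 0:  # misclassified, or lying exactly on the hyperplane
            loss -= margin
    return loss


# Training set from the example: x_1 and x_2 are positive, x_3 is negative.
DATA = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]

print(perceptron_loss((3, 3), 1, DATA))   # 7: only x_3 is misclassified
print(perceptron_loss((1, 1), -3, DATA))  # 0: no misclassified points
```

A loss of zero means that $M$ is empty, which is the stopping condition of the iteration.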
(1) Take the initial values $w_0 = 0$, $b_0 = 0$.

(2) For $x_1 = (3,3)^\top$, $y_1 (w_0 \cdot x_1 + b_0) = 0$, so $x_1$ is not correctly classified (a point counts as misclassified when $y_i (w \cdot x_i + b) \le 0$). Update $w$ and $b$:

$$w_1 = w_0 + y_1 x_1 = (3,3)^\top, \quad b_1 = b_0 + y_1 = 1$$

This gives the linear model

$$w_1 \cdot x + b_1 = 3x^{(1)} + 3x^{(2)} + 1$$

(3) For $x_1$ and $x_2$, clearly $y_i (w_1 \cdot x_i + b_1) > 0$, so they are correctly classified and $w$, $b$ are left unchanged. For $x_3 = (1,1)^\top$, $y_3 (w_1 \cdot x_3 + b_1) < 0$, so it is misclassified. Update $w$ and $b$:

$$w_2 = w_1 + y_3 x_3 = (2,2)^\top, \quad b_2 = b_1 + y_3 = 0$$

This gives the linear model

$$w_2 \cdot x + b_2 = 2x^{(1)} + 2x^{(2)}$$
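The two updates carried out so far can be reproduced with a small helper. This is a sketch of the single update step $w \leftarrow w + \eta y_i x_i$, $b \leftarrow b + \eta y_i$ with $\eta = 1$; the function name `sgd_update` is hypothetical.

```python
def sgd_update(w, b, x, y, eta=1):
    """One perceptron update for a misclassified point (x, y)."""
    new_w = tuple(wj + eta * y * xj for wj, xj in zip(w, x))
    new_b = b + eta * y
    return new_w, new_b


# Step (2): update with the positive point x_1 = (3, 3).
w1, b1 = sgd_update((0, 0), 0, (3, 3), 1)
# Step (3): update with the negative point x_3 = (1, 1).
w2, b2 = sgd_update(w1, b1, (1, 1), -1)

print(w1, b1)  # (3, 3) 1
print(w2, b2)  # (2, 2) 0
```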
Continuing in this way, we reach

$$w_7 = (1, 1)^\top, \quad b_7 = -3$$

$$w_7 \cdot x + b_7 = x^{(1)} + x^{(2)} - 3$$

For every data point $y_i (w_7 \cdot x_i + b_7) > 0$, so there are no misclassified points and the loss function attains its minimum.

The separating hyperplane is $x^{(1)} + x^{(2)} - 3 = 0$, and the perceptron model is $f(x) = \text{sign}(x^{(1)} + x^{(2)} - 3)$.
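The whole iteration can be sketched as a short loop. The selection rule used here (scan the points in the order $x_1, x_2, x_3$ and update on the first misclassified one) is an assumption chosen for illustration; `train_perceptron` and `max_updates` are hypothetical names.

```python
def train_perceptron(data, eta=1, max_updates=1000):
    """Perceptron training from w = 0, b = 0.

    Returns (w, b, picked), where picked lists the 1-based indices of the
    misclassified points used for each update.
    """
    w = [0] * len(data[0][0])
    b = 0
    picked = []
    for _ in range(max_updates):
        for i, (x, y) in enumerate(data):
            if y * (sum(wj * xj for wj, xj in zip(w, x)) + b) <= 0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
                picked.append(i + 1)
                break  # restart the scan from the first point
        else:
            return w, b, picked  # a full scan found no misclassified point
    raise RuntimeError("no separating hyperplane found within max_updates")


DATA = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]
w, b, picked = train_perceptron(DATA)
print(w, b)    # [1, 1] -3
print(picked)  # [1, 3, 3, 3, 1, 3, 3]
```

With this rule the loop stops after seven updates at $w = (1,1)^\top$, $b = -3$.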
The iterations are listed in the table.

Table. Iterations of the solution process

| Iteration | Misclassified point | $w$ | $b$ | $w \cdot x + b$ |
|---|---|---|---|---|
| 0 | - | $0$ | $0$ | $0$ |
| 1 | $x_1$ | $(3, 3)^\top$ | $1$ | $3x^{(1)} + 3x^{(2)} + 1$ |
| 2 | $x_3$ | $(2, 2)^\top$ | $0$ | $2x^{(1)} + 2x^{(2)}$ |
| 3 | $x_3$ | $(1, 1)^\top$ | $-1$ | $x^{(1)} + x^{(2)} - 1$ |
| 4 | $x_3$ | $(0, 0)^\top$ | $-2$ | $-2$ |
| 5 | $x_1$ | $(3, 3)^\top$ | $-1$ | $3x^{(1)} + 3x^{(2)} - 1$ |
| 6 | $x_3$ | $(2, 2)^\top$ | $-2$ | $2x^{(1)} + 2x^{(2)} - 2$ |
| 7 | $x_3$ | $(1, 1)^\top$ | $-3$ | $x^{(1)} + x^{(2)} - 3$ |
| 8 | none | $(1, 1)^\top$ | $-3$ | $x^{(1)} + x^{(2)} - 3$ |
This is the separating hyperplane and perceptron model obtained when the misclassified points are chosen in the order $x_1, x_3, x_3, x_3, x_1, x_3, x_3$. If instead the misclassified points are chosen in the order $x_1, x_3, x_3, x_3, x_2, x_3, x_3, x_3, x_1, x_3, x_3$, the resulting separating hyperplane is $2x^{(1)} + x^{(2)} - 5 = 0$.

Thus the perceptron learning algorithm can produce different solutions depending on the initial values and on which misclassified points are selected.
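The dependence on the selection order can be checked numerically. The sketch below assumes a simple rule that is not taken from the original example: scan the points in a fixed order given by `order` and update on the first misclassified one. The function name `train_with_order` and the second scan order are illustrative only.

```python
def train_with_order(data, order, eta=1, max_updates=1000):
    """Perceptron training from w = 0, b = 0, scanning points in `order`."""
    w = [0] * len(data[0][0])
    b = 0
    for _ in range(max_updates):
        for i in order:
            x, y = data[i]
            if y * (sum(wj * xj for wj, xj in zip(w, x)) + b) <= 0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
                break  # restart the scan after every update
        else:
            return tuple(w), b  # no misclassified point is left
    raise RuntimeError("no separating hyperplane found within max_updates")


DATA = [((3, 3), 1), ((4, 3), 1), ((1, 1), -1)]

print(train_with_order(DATA, [0, 1, 2]))  # ((1, 1), -3)
print(train_with_order(DATA, [1, 0, 2]))  # ((1, 0), -2)
```

Scanning $x_2$ before $x_1$ ends at the hyperplane $x^{(1)} - 2 = 0$, which also separates the three points but differs from $x^{(1)} + x^{(2)} - 3 = 0$.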