Three Optimization Formulations of Robust Least Squares (CVX)

Table of Contents

    • Data initialization
    • (a) robust least-squares problem
    • (b) least-squares problem with variable weights
    • (c) quadratic program
    • References

The core idea of robust least squares is to suppress samples with large errors, reducing their influence on the fit. This post walks through the CVX code from the reference below, which gives three equivalent optimization formulations of the problem.

Data initialization

The test data are randomly generated:

randn('state',0);   % seed the generator for reproducibility
m = 16; n = 8;      % 16 samples, 8 unknowns
A = randn(m,n);     % design matrix
b = randn(m,1);     % observations
M = 2;              % Huber threshold
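(Aside: randn('state',0) is the legacy way of seeding MATLAB's random number generator, kept here as in the referenced example; on recent releases rng(0) is the recommended equivalent.)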

(a) robust least-squares problem

$$\underset{\beta}{\text{minimize}} \; \sum_{i=1}^{m} \text{huber}(\beta^T x_i - y_i), \qquad \text{huber}(u) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
The Huber penalty caps how fast large residuals are charged, so that the regression coefficients are not skewed toward the samples with large errors (which would overfit them); M is a constant threshold.
When $|\beta^T x_i - y_i| > M$, write $|u| = |\beta^T x_i - y_i| = M + v$ with $v > 0$. Then
$$M(2|u| - M) = M(2(M+v) - M) = M^2 + 2Mv = (M+v)^2 - v^2.$$
Here $v$ is the amount by which the residual exceeds $M$: to slow the growth of the penalty, the Huber function effectively drops the quadratic term $v^2$.
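A minimal numeric sanity check of that identity (my own sketch, not part of the referenced example):

M = 2;  v = 0.7;  u = M + v;      % any v > 0 works
lhs = M*(2*abs(u) - M);           % the Huber branch for |u| > M
rhs = (M + v)^2 - v^2;            % the quadratic with the v^2 term dropped
disp(lhs - rhs);                  % prints 0 (up to roundoff)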

disp('Computing the solution of the robust least-squares problem...');
cvx_begin
    variable x1(n)
    minimize( sum(huber(A*x1-b,M)) )
cvx_end
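Note that CVX's built-in huber(x,M) applies the penalty elementwise (with huber(x) defaulting to M = 1), so sum(huber(A*x1-b,M)) is exactly the objective above evaluated at the residuals A*x1-b.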

(b) least-squares problem with variable weights

Here the per-sample weights are optimization variables as well.

disp('Computing the solution of the least-squares problem with variable weights...');
cvx_begin
    variable x2(n)
    variable w(m)
    minimize( sum(quad_over_lin(diag(A*x2-b),w'+1)) + M^2*ones(1,m)*w)
    w >= 0;
cvx_end

This formulation feels a bit abrupt and is less intuitive than the first. Start with the per-sample cost
$$f(w) = \frac{u^2}{w+1} + M^2 w, \qquad u = \beta^T x_i - y_i, \quad w \ge 0.$$

$$f'(w) = M^2 - \frac{u^2}{(w+1)^2}$$

When $|u| \le M$, $f'(w) \ge M^2 - u^2 \ge 0$ for all $w \ge 0$, so $f$ is non-decreasing and the minimum is attained at $w = 0$, where $f(0) = u^2$. Otherwise, solving $f'(w) = 0$ gives $w^* = \frac{|u|}{M} - 1 > 0$, and substituting back, $f(w^*) = \frac{u^2 M}{|u|} + M^2\left(\frac{|u|}{M} - 1\right) = 2M|u| - M^2$. Minimizing over $w$ therefore gives

$$\min_{w \ge 0} f(w) = \begin{cases} u^2, & |u| \le M \\ M(2|u| - M), & |u| > M \end{cases}$$
which is exactly the Huber function.
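As a quick numeric check (my own sketch, assuming x2 from the block above is still in the workspace): plugging the closed-form weights w* = max(|u|/M - 1, 0) into the weighted objective should reproduce the Huber cost.

u = A*x2 - b;                               % residuals at the solution
w_star = max(abs(u)/M - 1, 0);              % closed-form optimal weights
cost_w = sum(u.^2 ./ (w_star + 1)) + M^2*sum(w_star);
huber_fn = @(r) (abs(r) <= M).*r.^2 + (abs(r) > M).*(M*(2*abs(r) - M));
disp(cost_w - sum(huber_fn(u)));            % ~0 up to roundoff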
One thing worth spelling out is the meaning of quad_over_lin(diag(A*x2-b), w'+1).

$Ax_2 - b \in \mathbb{R}^{m \times 1}$ is a vector, and diag turns it into a diagonal matrix whose diagonal entries are the residuals. My understanding is that this is done purely so quad_over_lin(x,y) can be evaluated column by column:

$$f_{\text{quad\_over\_lin}}(x,y) = \begin{cases} x^T x / y, & y > 0 \\ +\infty, & y \le 0 \end{cases}$$
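As I understand it, when the first argument is a matrix and the second a row vector, CVX applies quad_over_lin column by column; each column of diag(r) has a single nonzero entry r(i), so the i-th value reduces to r(i)^2/y(i). A small illustration with arbitrary stand-in values (no CVX needed):

r = [1; -2; 3];                    % stand-in for the residual vector A*x2-b
y = [2 4 1];                       % stand-in for the row vector w'+1 (positive)
manual = sum(diag(r).^2, 1) ./ y   % column-wise x'*x/y  ->  [0.5  1  9]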

(c) quadratic program

disp('Computing the solution of the quadratic program...');
cvx_begin
    variable x3(n)
    variable u(m)
    variable v(m)
    minimize( sum(square(u) +  2*M*v) )
    A*x3 - b <= u + v;
    A*x3 - b >= -u - v;
    u >= 0;
    u <= M;
    v >= 0;
cvx_end

The objective is monotonically non-decreasing in u and v (both are constrained non-negative), so smaller u and v are always preferred.

Let $t_i = |A_i^T x_3 - b_i|$; the constraints then give $0 \le t_i \le u_i + v_i$.
At the optimum we must have $u_i + v_i = t_i$, since any slack could be removed to lower the objective.
From $v_i = t_i - u_i$ and $v_i \ge 0$ it follows that $u_i \le t_i$.
The per-sample objective becomes $f(u_i) = u_i^2 + 2Mv_i = u_i^2 - 2Mu_i + 2Mt_i$.

Its derivative is $f'(u_i) = 2u_i - 2M$, and since $0 \le u_i \le M$ we have $f'(u_i) \le 0$: the objective is non-increasing in $u_i$, so $u_i$ should be taken as large as its constraints allow, and the minimizer depends on $t_i$:

$$u_i^* = \begin{cases} t_i, & t_i \le M \\ M, & t_i > M \end{cases}$$

Correspondingly,
$$v_i^* = \begin{cases} 0, & t_i \le M \\ t_i - M, & t_i > M \end{cases}$$

Substituting the optimal solution into the objective gives
$$f(u^*, v^*) = \begin{cases} t_i^2, & t_i \le M \\ M(2t_i - M), & t_i > M \end{cases}$$
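Since all three programs minimize the same Huber cost, a quick cross-check (my own sketch, assuming x1, x2, x3 from the three blocks above are in the workspace) is to compare the solutions and objective values; they should agree up to solver tolerance.

huber_fn = @(r) (abs(r) <= M).*r.^2 + (abs(r) > M).*(M*(2*abs(r) - M));
fprintf('||x1-x2|| = %g, ||x1-x3|| = %g\n', norm(x1-x2), norm(x1-x3));
fprintf('Huber costs: %g %g %g\n', sum(huber_fn(A*x1-b)), ...
        sum(huber_fn(A*x2-b)), sum(huber_fn(A*x3-b)));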

References

http://web.cvxr.com/cvx/examples/cvxbook/Ch04_cvx_opt_probs/html/ex_4_5.html#source
