The Levenberg-Marquardt (LM) algorithm is a standard optimization method for nonlinear least-squares problems. It combines the strengths of the Gauss-Newton method and gradient descent, and is widely used in numerical computing, SLAM, image registration, and machine learning.
We want to minimize a nonlinear sum-of-squared-residuals objective:
$$
\min_{\mathbf{x}} \, f(\mathbf{x}) = \frac{1}{2} \sum_{i=1}^m r_i(\mathbf{x})^2 = \frac{1}{2} \| \mathbf{r}(\mathbf{x}) \|^2
$$
where $\mathbf{r}(\mathbf{x}) = (r_1(\mathbf{x}), \dots, r_m(\mathbf{x}))^T$ is the residual vector; the quantity being minimized is the sum of squared residuals. In curve fitting, for example, $r_i$ is the difference between the $i$-th observation and the model prediction.
At the current point $\mathbf{x}_k$, take a first-order Taylor expansion of the residual function:
$$
\mathbf{r}(\mathbf{x}_k + \Delta \mathbf{x}) \approx \mathbf{r}(\mathbf{x}_k) + J(\mathbf{x}_k)\, \Delta \mathbf{x}
$$
where $J \in \mathbb{R}^{m \times n}$ is the Jacobian:
$$
J_{ij} = \frac{\partial r_i}{\partial x_j}
$$
Substituting into the objective function gives the linearized subproblem:
$$
\min_{\Delta \mathbf{x}} \frac{1}{2} \| \mathbf{r} + J \Delta \mathbf{x} \|^2
$$
This leads to the normal equation:
$$
J^T J \, \Delta \mathbf{x} = -J^T \mathbf{r}
$$
This is the Gauss-Newton method.
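To see where the normal equation comes from, expand the linearized objective and set its gradient with respect to $\Delta \mathbf{x}$ to zero:

$$
\frac{1}{2} \| \mathbf{r} + J \Delta \mathbf{x} \|^2
= \frac{1}{2} \mathbf{r}^T \mathbf{r} + \mathbf{r}^T J \Delta \mathbf{x} + \frac{1}{2} \Delta \mathbf{x}^T J^T J \Delta \mathbf{x},
\qquad
\nabla_{\Delta \mathbf{x}} = J^T \mathbf{r} + J^T J \Delta \mathbf{x} = 0
$$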
The LM method introduces a damping factor $\lambda$ to balance Gauss-Newton and gradient descent:
$$
(J^T J + \lambda I)\, \Delta \mathbf{x} = -J^T \mathbf{r}
$$
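The damping factor interpolates between the two methods: as $\lambda \to 0$ the step tends to the Gauss-Newton step, while for large $\lambda$ it tends to a short gradient-descent step:

$$
\lambda \to 0: \ \Delta \mathbf{x} \to -(J^T J)^{-1} J^T \mathbf{r},
\qquad
\lambda \gg 1: \ \Delta \mathbf{x} \approx -\frac{1}{\lambda} J^T \mathbf{r}
$$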
For more stable adjustment of $\lambda$, the identity matrix can be replaced with the diagonal of $J^T J$ (Marquardt's scaling):
$$
(J^T J + \lambda \cdot \operatorname{diag}(J^T J))\, \Delta \mathbf{x} = -J^T \mathbf{r}
$$
Because each diagonal entry of $J^T J$ is scaled by $(1 + \lambda)$, the damping adapts to the scale of each parameter, which makes LM numerically more robust. The basic iteration can be sketched in pseudocode:
```
x = x0
lambda = lambda_init
while not converged:
    r = residual(x)
    J = jacobian(x)
    H = J^T * J
    g = J^T * r
    solve (H + lambda * diag(H)) * dx = -g
    if cost(x + dx) < cost(x):
        x = x + dx                # accept the step
        lambda = lambda / factor  # trust the model more
    else:
        lambda = lambda * factor  # damp more, shrink the step
```
Written out step by step, each LM iteration proceeds as follows:

1. Initialize the parameters $\mathbf{x}$ and the damping factor $\lambda$.
2. Compute the residual $\mathbf{r}(\mathbf{x})$ and the Jacobian $J(\mathbf{x})$.
3. Build the damped linear system:
$$
(J^T J + \lambda \cdot \operatorname{diag}(J^T J))\, \Delta \mathbf{x} = -J^T \mathbf{r}
$$
4. Solve for $\Delta \mathbf{x}$ and form the candidate update: $\mathbf{x}_{\text{new}} = \mathbf{x} + \Delta \mathbf{x}$.
5. Evaluate the new cost $\| \mathbf{r}(\mathbf{x}_{\text{new}}) \|^2$.
6. If the cost decreased, accept the update and reduce $\lambda$: $\lambda \leftarrow \lambda / \text{factor}$.
7. Otherwise, reject the update and increase the damping to shrink the step: $\lambda \leftarrow \lambda \times \text{factor}$.
8. Terminate if any of the following holds: the step norm $\|\Delta \mathbf{x}\|$ falls below a tolerance, the gradient norm $\|J^T \mathbf{r}\|$ is sufficiently small, or the maximum number of iterations is reached; otherwise return to step 2 and continue iterating.
```
Initialize x, lambda
        ↓
Compute residual r(x) and Jacobian J
        ↓
Build the system (JᵀJ + λI)Δx = -Jᵀr
        ↓
Solve for Δx
        ↓
Evaluate new cost(x + Δx)
        ↓
   Cost decreased?
   ┌──────┴──────┐
  Yes            No
   ↓              ↓
x ← x + Δx    λ ← λ × factor
λ ← λ / factor    │
   └──────┬──────┘
          ↓
Termination condition met?
  Yes → exit
  No  → back to the next iteration
```
As a worked example, consider fitting a quadratic function: given data points $(x_i, y_i)$, we minimize the residuals

$$
r_i(a, b, c) = y_i - (a x_i^2 + b x_i + c)
$$

The Jacobian row for each data point is:

$$
J_i = \begin{bmatrix} -x_i^2 & -x_i & -1 \end{bmatrix}
$$
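These entries come from differentiating the residual with respect to each parameter:

$$
\frac{\partial r_i}{\partial a} = -x_i^2, \qquad
\frac{\partial r_i}{\partial b} = -x_i, \qquad
\frac{\partial r_i}{\partial c} = -1
$$

The full C++ implementation with Eigen follows.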
```cpp
#include <iostream>
#include <vector>
#include <cstdlib>
#include <Eigen/Dense>

using namespace Eigen;
using namespace std;

struct DataPoint {
    double x, y;
};

struct LMResult {
    Vector3d params;
    double final_cost;
    int iterations;
};

LMResult LevenbergMarquardt(const vector<DataPoint>& data, Vector3d init, int max_iter = 100) {
    Vector3d x = init;
    double lambda = 1e-3;
    int n = static_cast<int>(data.size());
    double last_cost = 0.0;
    int used_iters = max_iter;  // actual number of iterations performed

    for (int iter = 0; iter < max_iter; ++iter) {
        // Build the residual vector and Jacobian at the current estimate
        MatrixXd J(n, 3);
        VectorXd r(n);
        for (int i = 0; i < n; ++i) {
            double xi = data[i].x;
            double yi = data[i].y;
            double yi_est = x(0) * xi * xi + x(1) * xi + x(2);
            r(i) = yi - yi_est;
            J(i, 0) = -xi * xi;  // dr_i/da
            J(i, 1) = -xi;       // dr_i/db
            J(i, 2) = -1.0;      // dr_i/dc
        }
        double current_cost = r.squaredNorm();  // sum of squared residuals at x
        last_cost = current_cost;

        Matrix3d H = J.transpose() * J;  // Gauss-Newton approximation of the Hessian
        Vector3d g = J.transpose() * r;  // gradient of the cost

        // Damped system: (H + lambda * diag(H)) * dx = -g
        Matrix3d H_lm = H;
        H_lm.diagonal() += lambda * H.diagonal();
        Vector3d dx = H_lm.ldlt().solve(-g);

        Vector3d x_new = x + dx;

        // Evaluate the cost at the candidate point
        double new_cost = 0.0;
        for (int i = 0; i < n; ++i) {
            double xi = data[i].x;
            double yi = data[i].y;
            double yi_est = x_new(0) * xi * xi + x_new(1) * xi + x_new(2);
            double ri = yi - yi_est;
            new_cost += ri * ri;
        }

        if (new_cost < current_cost) {
            // Cost decreased: accept the step and reduce the damping
            x = x_new;
            lambda *= 0.8;
            last_cost = new_cost;
        } else {
            // Cost increased: reject the step and damp more strongly
            lambda *= 2.0;
        }

        if (dx.norm() < 1e-6) {  // converged: step is negligibly small
            used_iters = iter + 1;
            break;
        }
    }
    return {x, last_cost, used_iters};
}

int main() {
    // Generate noisy samples of y = 2x^2 + 3x + 1
    vector<DataPoint> data;
    for (int i = 0; i <= 10; ++i) {
        double x = i;
        double y = 2.0 * x * x + 3.0 * x + 1.0 + ((rand() % 100) / 50.0 - 1.0);  // add noise
        data.push_back({x, y});
    }

    Vector3d init(0.0, 0.0, 0.0);
    auto result = LevenbergMarquardt(data, init);

    cout << "Estimated parameters: " << result.params.transpose() << endl;
    cout << "Final cost: " << result.final_cost << endl;
    return 0;
}
```
| Method | Characteristics |
|---|---|
| Gradient descent | Stable convergence, but slow |
| Gauss-Newton | Fast, but prone to divergence |
| Levenberg-Marquardt | Combines the two with automatic damping adjustment; stable convergence |
- Damping initialization $\lambda$: set it to a fraction of the largest diagonal entry of the initial Hessian approximation, e.g. $\lambda_0 = \tau \cdot \max_i \big( (J^T J)_{ii} \big)$; the example code above simply uses $10^{-3}$ (see the sketch after this list).
- Gradient and step-size stopping criteria: stop when the gradient norm $\|J^T \mathbf{r}\|$ or the update norm $\|\Delta \mathbf{x}\|$ falls below a preset tolerance (the example code uses $\|\Delta \mathbf{x}\| < 10^{-6}$).
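As a minimal sketch of the damping-initialization tip above (the helper name `initialLambda` and the default $\tau = 10^{-3}$ are illustrative choices, not part of the example code):

```cpp
#include <Eigen/Dense>

// Hypothetical helper: initialize the damping factor as
// lambda_0 = tau * max(diag(J^T J)).
inline double initialLambda(const Eigen::MatrixXd& JTJ, double tau = 1e-3) {
    return tau * JTJ.diagonal().maxCoeff();
}
```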