Definition 3.11.1 Let $\boldsymbol{A}^{\mathrm{H}}=\boldsymbol{A}$. The real number
$$R(\boldsymbol{X})=\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}} \quad\left(\boldsymbol{X} \in C^n, \boldsymbol{X} \neq 0\right)$$
is called the Rayleigh quotient of the Hermitian matrix $\boldsymbol{A}$.
Since the eigenvalues of a Hermitian matrix $\boldsymbol{A}$ are all real, we may assume that the $n$ eigenvalues of $\boldsymbol{A}$ are arranged in ascending order:
$$\lambda_1 \leqslant \lambda_2 \leqslant \cdots \leqslant \lambda_n$$
Theorem 3.11.1 The Rayleigh quotient of a Hermitian matrix $\boldsymbol{A}$ has the following properties:
(1) $R(k \boldsymbol{X})=R(\boldsymbol{X})$ for every scalar $k \neq 0$ (real or complex);
(2) $\lambda_1 \leqslant R(\boldsymbol{X}) \leqslant \lambda_n$;
(3) $\min\limits_{\boldsymbol{X} \neq 0} R(\boldsymbol{X})=\lambda_1, \quad \max\limits_{\boldsymbol{X} \neq 0} R(\boldsymbol{X})=\lambda_n$.
Proof (1) This follows directly from Definition 3.11.1: replacing $\boldsymbol{X}$ by $k\boldsymbol{X}$ multiplies both the numerator and the denominator by $\bar{k} k=|k|^2 \neq 0$.
(2) The matrix $\boldsymbol{A}$ is unitarily diagonalizable, that is, there is a unitary matrix $\boldsymbol{U}$ such that
$$\boldsymbol{U}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{U}=\operatorname{diag}\left(\lambda_1, \lambda_2, \cdots, \lambda_n\right)=\boldsymbol{\Lambda}$$
Let $\boldsymbol{X}=\boldsymbol{U} \boldsymbol{Y}$ with $\boldsymbol{Y}=\left(y_1, y_2, \cdots, y_n\right)^{\mathrm{T}}$. Since $\boldsymbol{U}$ is unitary, $\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}=\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y}$, and therefore
$$\begin{aligned} R(\boldsymbol{X}) & =\frac{\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{U}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{U} \boldsymbol{Y}}{\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y}}=\frac{\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{\Lambda} \boldsymbol{Y}}{\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y}} \\ & =\frac{\lambda_1 y_1 \bar{y}_1+\lambda_2 y_2 \bar{y}_2+\cdots+\lambda_n y_n \bar{y}_n}{\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y}} \end{aligned}$$
Because each $y_i \bar{y}_i=\left|y_i\right|^2 \geqslant 0$, we have
$$\begin{aligned} \lambda_1\left(y_1 \bar{y}_1+\cdots+y_n \bar{y}_n\right) & \leqslant \lambda_1 y_1 \bar{y}_1+\cdots+\lambda_n y_n \bar{y}_n \\ & \leqslant \lambda_n\left(y_1 \bar{y}_1+\cdots+y_n \bar{y}_n\right) \end{aligned}$$
that is,
$$\lambda_1 \boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y} \leqslant \boldsymbol{Y}^{\mathrm{H}} \boldsymbol{\Lambda} \boldsymbol{Y} \leqslant \lambda_n \boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y}$$
Dividing by $\boldsymbol{Y}^{\mathrm{H}} \boldsymbol{Y}>0$ gives
$$\lambda_1 \leqslant R(\boldsymbol{X}) \leqslant \lambda_n$$
(3) With the unitary matrix $\boldsymbol{U}$ from (2), choose $\boldsymbol{X}=\boldsymbol{U} \boldsymbol{Y}$ such that $y_1 \neq 0$ and $y_2=y_3=\cdots=y_n=0$. Then
$$R(\boldsymbol{X})=\lambda_1$$
Similarly, choosing $\boldsymbol{X}$ such that $y_1=y_2=\cdots=y_{n-1}=0$ and $y_n \neq 0$ gives
$$R(\boldsymbol{X})=\lambda_n$$
Combining these with (2), we obtain
$$\min _{\boldsymbol{X} \neq 0} R(\boldsymbol{X})=\lambda_1, \quad \max _{\boldsymbol{X} \neq 0} R(\boldsymbol{X})=\lambda_n$$
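The three properties of the Rayleigh quotient can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `rayleigh_quotient` and `random_hermitian`, the matrix size, and the random seed are illustrative choices and not part of the text.

```python
import numpy as np

def rayleigh_quotient(A, x):
    """Return R(x) = x^H A x / x^H x for a Hermitian A and a nonzero x."""
    x = np.asarray(x, dtype=complex)
    # np.vdot conjugates its first argument, so vdot(x, A @ x) equals x^H A x.
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

def random_hermitian(n, rng):
    """Build a random n x n Hermitian matrix as (M + M^H) / 2."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

rng = np.random.default_rng(0)
n = 5
A = random_hermitian(n, rng)
# eigh returns the eigenvalues in ascending order with orthonormal eigenvectors.
eigvals, U = np.linalg.eigh(A)

# Property (2): every sampled quotient lies in [lambda_1, lambda_n].
samples = [
    rayleigh_quotient(A, rng.standard_normal(n) + 1j * rng.standard_normal(n))
    for _ in range(1000)
]
print(eigvals[0] <= min(samples), max(samples) <= eigvals[-1])

# Property (3): the bounds are attained at the extreme eigenvectors.
print(np.isclose(rayleigh_quotient(A, U[:, 0]), eigvals[0]))
print(np.isclose(rayleigh_quotient(A, U[:, -1]), eigvals[-1]))

# Property (1): scaling by a nonzero (here complex) scalar changes nothing.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.isclose(rayleigh_quotient(A, (2 - 3j) * x), rayleigh_quotient(A, x)))
```

Random sampling only illustrates property (2); it is the eigenvector evaluations that show the bounds are actually attained.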
Theorem 3.11.2 Let $\boldsymbol{X}_1, \boldsymbol{X}_2, \cdots, \boldsymbol{X}_{k-1}$ be mutually orthogonal eigenvectors of the Hermitian matrix $\boldsymbol{A}$ belonging to the eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_{k-1}$ respectively, and let $R_k$ be the orthogonal complement of the subspace $\operatorname{span}\left(\boldsymbol{X}_1, \boldsymbol{X}_2, \cdots, \boldsymbol{X}_{k-1}\right)$. Then
$$\lambda_k=\min _{\boldsymbol{X} \in R_k,\, \boldsymbol{X} \neq 0} R(\boldsymbol{X})$$
Proof Without loss of generality, let $\boldsymbol{X}_1, \boldsymbol{X}_2, \cdots, \boldsymbol{X}_{k-1}, \boldsymbol{X}_k, \cdots, \boldsymbol{X}_n$ be an orthonormal system of $n$ eigenvectors of $\boldsymbol{A}$. Clearly
$$R_k=\operatorname{span}\left(\boldsymbol{X}_k, \boldsymbol{X}_{k+1}, \cdots, \boldsymbol{X}_n\right)$$
Every nonzero $n$-dimensional vector $\boldsymbol{X}$ can be written as
$$\boldsymbol{X}=C_1 \boldsymbol{X}_1+C_2 \boldsymbol{X}_2+\cdots+C_n \boldsymbol{X}_n$$
Hence, using $\boldsymbol{A} \boldsymbol{X}_i=\lambda_i \boldsymbol{X}_i$ and the orthonormality of the $\boldsymbol{X}_i$,
$$\begin{aligned} R(\boldsymbol{X}) & =\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}} \\ & =\frac{\left(C_1 \boldsymbol{X}_1+C_2 \boldsymbol{X}_2+\cdots+C_n \boldsymbol{X}_n\right)^{\mathrm{H}} \boldsymbol{A}\left(C_1 \boldsymbol{X}_1+C_2 \boldsymbol{X}_2+\cdots+C_n \boldsymbol{X}_n\right)}{\left(C_1 \boldsymbol{X}_1+C_2 \boldsymbol{X}_2+\cdots+C_n \boldsymbol{X}_n\right)^{\mathrm{H}}\left(C_1 \boldsymbol{X}_1+C_2 \boldsymbol{X}_2+\cdots+C_n \boldsymbol{X}_n\right)} \\ & =\frac{\lambda_1 \bar{C}_1 C_1+\lambda_2 \bar{C}_2 C_2+\cdots+\lambda_n \bar{C}_n C_n}{\bar{C}_1 C_1+\bar{C}_2 C_2+\cdots+\bar{C}_n C_n} \\ & =\lambda_1 a_1+\lambda_2 a_2+\cdots+\lambda_n a_n \end{aligned}$$
where
$$a_i=\frac{\bar{C}_i C_i}{\bar{C}_1 C_1+\bar{C}_2 C_2+\cdots+\bar{C}_n C_n} \geqslant 0, \quad \text{and} \quad \sum_{i=1}^n a_i=1$$
When $k=1$, $R_1=C^n$, and the statement is Theorem 3.11.1.
When $k=2$ and $\boldsymbol{X} \in R_2$, we have $C_1=0$, so
$$\begin{gathered} \boldsymbol{X}=C_2 \boldsymbol{X}_2+C_3 \boldsymbol{X}_3+\cdots+C_n \boldsymbol{X}_n \\ R(\boldsymbol{X})=\lambda_2 a_2+\lambda_3 a_3+\cdots+\lambda_n a_n \end{gathered}$$
Since $a_i \geqslant 0$, $a_2+\cdots+a_n=1$ and $\lambda_2 \leqslant \lambda_i$ for $i \geqslant 2$, it follows that $R(\boldsymbol{X}) \geqslant \lambda_2$, with equality for $\boldsymbol{X}=\boldsymbol{X}_2$. Therefore
$$\lambda_2=\min _{\boldsymbol{X} \in R_2,\, \boldsymbol{X} \neq 0} R(\boldsymbol{X})$$
The remaining cases follow in the same way.
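The restricted minimum in Theorem 3.11.2 can also be computed numerically. The sketch below is one possible way to do it, assuming NumPy: the function name `min_rayleigh_on_complement` is hypothetical, and the orthonormal basis of the complement $R_k$ is obtained from a complete QR factorization, which is an implementation choice rather than the method of the proof. Writing $\boldsymbol{X}=\boldsymbol{C}\boldsymbol{z}$ with the columns of $\boldsymbol{C}$ an orthonormal basis of $R_k$ reduces the problem to Theorem 3.11.1 for the compressed matrix $\boldsymbol{C}^{\mathrm{H}}\boldsymbol{A}\boldsymbol{C}$.

```python
import numpy as np

def min_rayleigh_on_complement(A, k):
    """Minimum of R(X) over the orthogonal complement of span(X_1, ..., X_{k-1}).

    k is 1-based, as in the theorem. Returns (restricted minimum, lambda_k).
    """
    n = A.shape[0]
    eigvals, U = np.linalg.eigh(A)        # ascending eigenvalues
    if k == 1:
        C = np.eye(n, dtype=complex)      # R_1 is the whole space
    else:
        E = U[:, : k - 1]                 # X_1, ..., X_{k-1}
        # Complete QR: the last n-k+1 columns of Q span the complement of range(E).
        Q, _ = np.linalg.qr(E, mode="complete")
        C = Q[:, k - 1 :]
    # With X = C z, R(X) = z^H (C^H A C) z / z^H z, so the minimum is the
    # smallest eigenvalue of the compressed matrix C^H A C.
    compressed = C.conj().T @ A @ C
    return np.linalg.eigvalsh(compressed)[0], eigvals[k - 1]

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
A = (M + M.conj().T) / 2
for k in range(1, 7):
    restricted_min, lam_k = min_rayleigh_on_complement(A, k)
    print(k, np.isclose(restricted_min, lam_k))
```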
In a similar way one can also prove:
Theorem 3.11.3 Let $\boldsymbol{X} \in \operatorname{span}\left(\boldsymbol{X}_r, \boldsymbol{X}_{r+1}, \cdots, \boldsymbol{X}_s\right)$, $1 \leqslant r<s \leqslant n$, where $\boldsymbol{X}_1, \cdots, \boldsymbol{X}_n$ are orthonormal eigenvectors of $\boldsymbol{A}$ as above. Then
$$\min _{\boldsymbol{X} \neq 0} R(\boldsymbol{X})=\lambda_r, \quad \max _{\boldsymbol{X} \neq 0} R(\boldsymbol{X})=\lambda_s$$
Theorem 3.11.4 Let $V_k$ denote an arbitrary $k$-dimensional subspace of the $n$-dimensional complex vector space. Then the min-max principle
$$\lambda_k=\min _{V_k} \max _{\boldsymbol{X} \in V_k} R(\boldsymbol{X})$$
holds, as does the max-min principle
$$\lambda_k=\max _{V_{n-k+1}} \min _{\boldsymbol{X} \in V_{n-k+1}} R(\boldsymbol{X})$$
where the inner extremum is taken over nonzero vectors $\boldsymbol{X}$.
Proof The orthogonal complement $R_k$ of the $(k-1)$-dimensional subspace $\operatorname{span}\left(\boldsymbol{X}_1, \boldsymbol{X}_2, \cdots, \boldsymbol{X}_{k-1}\right)$ has dimension $n-k+1$. Since $\dim V_k+\dim R_k=n+1>n$, the subspaces $V_k$ and $R_k$ must have a common nonzero vector $\boldsymbol{Y}_k$, and so by Theorem 3.11.2
$$\min _{\boldsymbol{X} \in R_k} R(\boldsymbol{X})=\lambda_k \leqslant R\left(\boldsymbol{Y}_k\right)$$
Since also $\boldsymbol{Y}_k \in V_k$,
$$R\left(\boldsymbol{Y}_k\right) \leqslant \max _{\boldsymbol{X} \in V_k} R(\boldsymbol{X})$$
Because $V_k$ is an arbitrary $k$-dimensional subspace, it follows that
$$\lambda_k \leqslant \min _{V_k} \max _{\boldsymbol{X} \in V_k} R(\boldsymbol{X})$$
On the other hand, $\operatorname{span}\left(\boldsymbol{X}_1, \boldsymbol{X}_2, \cdots, \boldsymbol{X}_k\right)$ is one particular $k$-dimensional subspace, so by the preceding theorems
$$\min _{V_k} \max _{\boldsymbol{X} \in V_k} R(\boldsymbol{X}) \leqslant \max _{\boldsymbol{X} \in \operatorname{span}\left(\boldsymbol{X}_1, \boldsymbol{X}_2, \cdots, \boldsymbol{X}_k\right)} R(\boldsymbol{X})=\lambda_k$$
Combining the two inequalities gives
$$\lambda_k=\min _{V_k} \max _{\boldsymbol{X} \in V_k} R(\boldsymbol{X})$$
Now let $\boldsymbol{B}=-\boldsymbol{A}$. The eigenvalues of $\boldsymbol{B}$, arranged in ascending order, are
$$\mu_1 \leqslant \mu_2 \leqslant \cdots \leqslant \mu_n$$
where $\mu_k=-\lambda_{n-k+1}$. Applying the min-max principle just proved to $\boldsymbol{B}$, we get
$$\begin{aligned} \lambda_{n-k+1} & =-\mu_k=-\min _{V_k} \max _{\boldsymbol{X} \in V_k} \frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{B} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}} \\ & =-\min _{V_k}\left\{\max _{\boldsymbol{X} \in V_k} \frac{-\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}\right\} \\ & =-\min _{V_k}\left\{-\min _{\boldsymbol{X} \in V_k} \frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}\right\} \\ & =\max _{V_k} \min _{\boldsymbol{X} \in V_k} \frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}=\max _{V_k} \min _{\boldsymbol{X} \in V_k} R(\boldsymbol{X}) \end{aligned}$$
Replacing $n-k+1$ by $i$ (so that $k=n-i+1$) in this identity gives
$$\lambda_i=\max _{V_{n-i+1}} \min _{\boldsymbol{X} \in V_{n-i+1}} R(\boldsymbol{X})$$
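The min-max principle can be illustrated numerically: the maximum of the Rayleigh quotient over any $k$-dimensional subspace is at least $\lambda_k$, and the span of the first $k$ eigenvectors attains it. The sketch below assumes NumPy; `max_rayleigh_on_subspace` is a hypothetical helper name, and the subspaces are sampled as column spans of random complex matrices, which is only one convenient way of generating them.

```python
import numpy as np

def max_rayleigh_on_subspace(A, basis):
    """Maximum of R(X) over the column span of `basis` (assumed full column rank)."""
    Q, _ = np.linalg.qr(basis)            # orthonormal basis of the subspace
    # The maximum is the largest eigenvalue of the compressed matrix Q^H A Q.
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q)[-1]

rng = np.random.default_rng(2)
n, k = 6, 3
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
eigvals, U = np.linalg.eigh(A)            # ascending eigenvalues
lam_k = eigvals[k - 1]

# Every sampled k-dimensional subspace gives a maximum of at least lambda_k ...
random_maxima = [
    max_rayleigh_on_subspace(
        A, rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
    )
    for _ in range(500)
]
print(min(random_maxima) >= lam_k - 1e-12)

# ... and span(X_1, ..., X_k) attains it.
print(np.isclose(max_rayleigh_on_subspace(A, U[:, :k]), lam_k))
```

Random subspaces rarely come close to the minimizing subspace, so the sampled maxima typically sit well above $\lambda_k$; the equality case is shown only by the eigenvector span.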
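Finally, we use the Rayleigh quotient to study a perturbation theorem for the eigenvalues of Hermitian matrices, that is, to bound how far the eigenvalues can move when the entries of the matrix change slightly.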
Theorem 3.11.5 Let $\boldsymbol{A}, \boldsymbol{B}$ be $n \times n$ Hermitian matrices, and let $\lambda_i(\boldsymbol{A}), \lambda_i(\boldsymbol{B})$ and $\lambda_i(\boldsymbol{A}+\boldsymbol{B})$ denote the eigenvalues of $\boldsymbol{A}, \boldsymbol{B}$ and $\boldsymbol{A}+\boldsymbol{B}$ respectively, each arranged in ascending order. Then for every $k$,
$$\lambda_k(\boldsymbol{A})+\lambda_1(\boldsymbol{B}) \leqslant \lambda_k(\boldsymbol{A}+\boldsymbol{B}) \leqslant \lambda_k(\boldsymbol{A})+\lambda_n(\boldsymbol{B})$$
Proof By the max-min principle of Theorem 3.11.4, together with the bounds $\lambda_1(\boldsymbol{B}) \leqslant \boldsymbol{X}^{\mathrm{H}} \boldsymbol{B} \boldsymbol{X} / \boldsymbol{X}^{\mathrm{H}} \boldsymbol{X} \leqslant \lambda_n(\boldsymbol{B})$ from Theorem 3.11.1, we have
$$\begin{aligned} \lambda_k(\boldsymbol{A}+\boldsymbol{B}) & =\max _{V_{n-k+1}} \min _{\boldsymbol{X} \in V_{n-k+1}} \frac{\boldsymbol{X}^{\mathrm{H}}(\boldsymbol{A}+\boldsymbol{B}) \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}} \\ & =\max _{V_{n-k+1}} \min _{\boldsymbol{X} \in V_{n-k+1}}\left[\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}+\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{B} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}\right] \\ & \leqslant \max _{V_{n-k+1}} \min _{\boldsymbol{X} \in V_{n-k+1}}\left[\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}+\lambda_n(\boldsymbol{B})\right] \\ & =\lambda_k(\boldsymbol{A})+\lambda_n(\boldsymbol{B}) \end{aligned}$$
and likewise
$$\begin{aligned} \lambda_k(\boldsymbol{A}+\boldsymbol{B}) & =\max _{V_{n-k+1}} \min _{\boldsymbol{X} \in V_{n-k+1}}\left[\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}+\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{B} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}\right] \\ & \geqslant \max _{V_{n-k+1}} \min _{\boldsymbol{X} \in V_{n-k+1}}\left[\frac{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{A} \boldsymbol{X}}{\boldsymbol{X}^{\mathrm{H}} \boldsymbol{X}}+\lambda_1(\boldsymbol{B})\right] \\ & =\lambda_k(\boldsymbol{A})+\lambda_1(\boldsymbol{B}) \end{aligned}$$
which are the two inequalities claimed.
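The perturbation bounds of Theorem 3.11.5 are easy to check numerically. The following sketch assumes NumPy; the names `weyl_bounds_hold` and `random_hermitian`, the tolerance, and the perturbation scale `1e-3` are illustrative assumptions. The last check uses a direct consequence of the theorem: each eigenvalue moves by at most $\max\left(\left|\lambda_1(\boldsymbol{B})\right|,\left|\lambda_n(\boldsymbol{B})\right|\right)$.

```python
import numpy as np

def random_hermitian(n, rng):
    """Build a random n x n Hermitian matrix as (M + M^H) / 2."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (M + M.conj().T) / 2

def weyl_bounds_hold(A, B, tol=1e-10):
    """Check lambda_k(A) + lambda_1(B) <= lambda_k(A+B) <= lambda_k(A) + lambda_n(B)."""
    a = np.linalg.eigvalsh(A)             # ascending eigenvalues of A
    b = np.linalg.eigvalsh(B)             # ascending eigenvalues of B
    s = np.linalg.eigvalsh(A + B)         # ascending eigenvalues of A + B
    lower_ok = np.all(a + b[0] <= s + tol)
    upper_ok = np.all(s <= a + b[-1] + tol)
    return bool(lower_ok and upper_ok)

rng = np.random.default_rng(3)
n = 6
A = random_hermitian(n, rng)
B = 1e-3 * random_hermitian(n, rng)       # a small Hermitian perturbation
print(weyl_bounds_hold(A, B))

# Consequence: no eigenvalue moves by more than the largest |eigenvalue| of B.
shift = np.abs(np.linalg.eigvalsh(A + B) - np.linalg.eigvalsh(A)).max()
bound = np.abs(np.linalg.eigvalsh(B)).max()
print(shift <= bound + 1e-12)
```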
Example 3.11.1 Let $\boldsymbol{A}, \boldsymbol{B}$ be Hermitian matrices with $\boldsymbol{B}$ positive semidefinite. Then
$$\lambda_k(\boldsymbol{A}) \leqslant \lambda_k(\boldsymbol{A}+\boldsymbol{B})$$
Solution By Theorem 3.11.5,
$$\lambda_k(\boldsymbol{A}+\boldsymbol{B}) \geqslant \lambda_k(\boldsymbol{A})+\lambda_1(\boldsymbol{B})$$
Since $\boldsymbol{B}$ is positive semidefinite, $\lambda_1(\boldsymbol{B}) \geqslant 0$, and the required conclusion follows.
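The monotonicity in Example 3.11.1 can be observed on a concrete matrix. In the sketch below (assuming NumPy) the positive semidefinite matrix is built as $\boldsymbol{B}=\boldsymbol{C}^{\mathrm{H}}\boldsymbol{C}$, which is one standard way to obtain such a matrix; the function name `eigenvalue_shift_under_psd` and the sizes are hypothetical choices for illustration.

```python
import numpy as np

def eigenvalue_shift_under_psd(A, C):
    """Return lambda_k(A + C^H C) - lambda_k(A) for every k (ascending order)."""
    B = C.conj().T @ C                    # positive semidefinite by construction
    return np.linalg.eigvalsh(A + B) - np.linalg.eigvalsh(A)

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
# A 2 x n matrix C gives a rank-deficient, hence only semidefinite, B.
C = rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))
shifts = eigenvalue_shift_under_psd(A, C)
print(np.all(shifts >= -1e-12))
```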