背景与目标
Gotoh 算法由 O. Gotoh 于1982年提出,是 Needleman–Wunsch 全局比对算法的一个改进,支持 仿射缺口惩罚(affine gap penalty):
gap_score ( k ) = − α − ( k − 1 ) β , \text{gap\_score}(k) = -\alpha - (k-1)\,\beta, gap_score(k)=−α−(k−1)β,
其中 α > 0 \alpha>0 α>0 是缺口打开惩罚(gap opening), β > 0 \beta>0 β>0 是缺口延伸惩罚(gap extension)。
核心目标:在允许对开新缺口和延续已有缺口的不同惩罚权重下,找到两条序列的最优全局比对。
应用场景
核心思路
令序列
A = a 1 ⋯ a m , B = b 1 ⋯ b n \mathbf{A}=a_1\cdots a_m,\quad \mathbf{B}=b_1\cdots b_n A=a1⋯am,B=b1⋯bn,匹配得分函数 s ( a i , b j ) s(a_i,b_j) s(ai,bj),仿射缺口参数 α \alpha α(打开)和 β \beta β(延伸)。
2.1 定义三个矩阵
2.2 递推公式
E [ i , j ] = max { E [ i , j − 1 ] − β , H [ i , j − 1 ] − ( α + β ) } , F [ i , j ] = max { F [ i − 1 , j ] − β , H [ i − 1 , j ] − ( α + β ) } , H [ i , j ] = max { H [ i − 1 , j − 1 ] + s ( a i , b j ) , E [ i , j ] , F [ i , j ] } . \begin{aligned} E[i,j] &= \max\bigl\{\,E[i,\,j-1]-\beta,\;H[i,\,j-1] -(\alpha+\beta)\bigr\},\\ F[i,j] &= \max\bigl\{\,F[i-1,\,j]-\beta,\;H[i-1,\,j] -(\alpha+\beta)\bigr\},\\ H[i,j] &= \max\bigl\{\,H[i-1,\,j-1] + s(a_i,b_j),\;E[i,j],\;F[i,j]\bigr\}. \end{aligned} E[i,j]F[i,j]H[i,j]=max{E[i,j−1]−β,H[i,j−1]−(α+β)},=max{F[i−1,j]−β,H[i−1,j]−(α+β)},=max{H[i−1,j−1]+s(ai,bj),E[i,j],F[i,j]}.
2.3 初始化
H [ 0 , 0 ] = 0 , E [ 0 , 0 ] = F [ 0 , 0 ] = − ∞ , ∀ j ≥ 1 : H [ 0 , j ] = − ( α + ( j − 1 ) β ) , E [ 0 , j ] = − ( α + ( j − 1 ) β ) , F [ 0 , j ] = − ∞ , ∀ i ≥ 1 : H [ i , 0 ] = − ( α + ( i − 1 ) β ) , F [ i , 0 ] = − ( α + ( i − 1 ) β ) , E [ i , 0 ] = − ∞ . \begin{aligned} H[0,0] &= 0, & E[0,0] = F[0,0] &= -\infty,\\ \forall\,j\ge1:\;& H[0,j] = -\bigl(\alpha+(j-1)\beta\bigr),\quad E[0,j] = -\bigl(\alpha+(j-1)\beta\bigr),\quad F[0,j] = -\infty,\\ \forall\,i\ge1:\;& H[i,0] = -\bigl(\alpha+(i-1)\beta\bigr),\quad F[i,0] = -\bigl(\alpha+(i-1)\beta\bigr),\quad E[i,0] = -\infty. \end{aligned} H[0,0]∀j≥1:∀i≥1:=0,H[0,j]=−(α+(j−1)β),E[0,j]=−(α+(j−1)β),F[0,j]=−∞,H[i,0]=−(α+(i−1)β),F[i,0]=−(α+(i−1)β),E[i,0]=−∞.E[0,0]=F[0,0]=−∞,
其中 “ − ∞ -\infty −∞” 表示该状态不合法。
# 输入
# A[1..m], B[1..n]: 待比对序列
# s(a,b): 匹配得分函数
# α: 缺口打开惩罚
# β: 缺口延伸惩罚
# 输出
# aligned_A, aligned_B: 全局比对结果
function Gotoh(A, B, s, α, β):
m ← length(A); n ← length(B)
# 1) 初始化矩阵 H, E, F 大小 (m+1)x(n+1)
for i in 0..m:
for j in 0..n:
H[i,j], E[i,j], F[i,j] ← -∞
H[0,0] ← 0
for j in 1..n:
H[0,j] ← - (α + (j-1)*β)
E[0,j] ← - (α + (j-1)*β)
for i in 1..m:
H[i,0] ← - (α + (i-1)*β)
F[i,0] ← - (α + (i-1)*β)
# 2) 递推填表
for i in 1..m:
for j in 1..n:
E[i,j] ← max(E[i, j-1] - β,
H[i, j-1] - (α + β))
F[i,j] ← max(F[i-1, j] - β,
H[i-1, j] - (α + β))
H[i,j] ← max(H[i-1, j-1] + s(A[i], B[j]),
E[i,j],
F[i,j])
# 3) 回溯还原对齐
aligned_A, aligned_B ← empty strings
i, j ← m, n
# 从矩阵 H、E、F 的最大值状态出发
state ← argmax_over({H[i,j], E[i,j], F[i,j]})
while i>0 or j>0:
if state == H:
if H[i,j] == H[i-1,j-1] + s(A[i],B[j]):
aligned_A.prepend(A[i]); aligned_B.prepend(B[j])
i, j ← i-1, j-1
state ← argmax_over({H[i,j], E[i,j], F[i,j]})
else if H[i,j] == E[i,j]:
state ← E
else:
state ← F
else if state == E:
aligned_A.prepend('-'); aligned_B.prepend(B[j])
j ← j-1
# E[i,j] 来源于 E (延伸) or H (新开)
state ← (E[i,j] + β == E[i, j-1]) ? E : H
else: # state == F
aligned_A.prepend(A[i]); aligned_B.prepend('-')
i ← i-1
state ← (F[i,j] + β == F[i-1, j]) ? F : H
return aligned_A, aligned_B