Derivation of DDPM (Denoising Diffusion Probabilistic Models)

Summary: DDPM trains a network by minimizing the mean-squared error of the predicted noise, so that the reverse process can denoise step by step to generate data. The core of the derivation is that variational inference reduces the KL-divergence terms to a noise-prediction problem, greatly simplifying the training objective.


1. Forward Diffusion Process


The forward process gradually adds Gaussian noise to the data \( x_0 \) over \( T \) steps, ending in (approximately) pure noise \( x_T \). Each step is defined as:
\[
q(x_t | x_{t-1}) = \mathcal{N}\left(x_t; \sqrt{1-\beta_t}x_{t-1}, \beta_t \mathbf{I}\right)
\]
where \( \beta_t \) is the noise-schedule parameter. By the reparameterization trick, a single step can be written as:
\[
x_t = \sqrt{\alpha_t} x_{t-1} + \sqrt{1-\alpha_t} \epsilon_t \quad (\epsilon_t \sim \mathcal{N}(0, \mathbf{I}))
\]
Letting \( \alpha_t = 1-\beta_t \) and defining the cumulative product \( \bar{\alpha}_t = \prod_{i=1}^t \alpha_i \), the steps compose into a closed form that gives \( x_t \) directly from \( x_0 \):
\[
q(x_t | x_0) = \mathcal{N}\left(x_t; \sqrt{\bar{\alpha}_t}x_0, (1-\bar{\alpha}_t)\mathbf{I}\right)
\]
equivalently:
\[
x_t = \sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon \quad (\epsilon \sim \mathcal{N}(0, \mathbf{I}))
\]
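As a concrete illustration, the closed-form sampling of \( x_t \) takes only a few lines of numpy. The linear schedule \( \beta_t \in [10^{-4}, 0.02] \) below is an assumption (the schedule used in the original DDPM paper), not the only choice:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule beta_t
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # \bar{alpha}_t = prod_{i<=t} alpha_i

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps
```

For this schedule \( \bar{\alpha}_T \approx 4\times 10^{-5} \), so \( x_T \) is essentially indistinguishable from pure Gaussian noise.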


2. Reverse Denoising Process


The reverse process recovers the data \( x_0 \) from the noise \( x_T \) step by step and is defined as a Markov chain:
\[
p_\theta(x_{t-1}|x_t) = \mathcal{N}\left(x_{t-1}; \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)\right)
\]
In DDPM the variance \( \Sigma_\theta \) is fixed to \( \sigma_t^2\mathbf{I} \), typically with \( \sigma_t^2 = \beta_t \) or \( \sigma_t^2 = \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t \).
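A single reverse step can then be sketched as follows (numpy; `mu_theta` is a hypothetical stand-in for the learned mean, and the fixed variance \( \sigma_t^2 = \beta_t \) is assumed). No noise is added at the final step, so \( x_0 \) is returned deterministically:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule, as before

def p_sample_step(xt, t, mu_theta, rng):
    """One reverse step: x_{t-1} ~ N(mu_theta(x_t, t), beta_t * I).

    mu_theta is a placeholder for the learned mean; at t = 0 the mean
    is returned without added noise."""
    noise = rng.standard_normal(xt.shape) if t > 0 else 0.0
    return mu_theta(xt, t) + np.sqrt(betas[t]) * noise
```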


3. Variational Lower Bound (ELBO) Derivation


The training objective is to maximize a lower bound (the ELBO) on the log-likelihood:
\[
\mathbb{E}_{q(x_0)}[\log p_\theta(x_0)] \geq \text{ELBO} = \mathbb{E}_q\left[\log p_\theta(x_{0:T}) - \log q(x_{1:T}|x_0)\right]
\]
Expanding the joint distributions, the ELBO decomposes as:
\[
\text{ELBO} = \mathbb{E}_q\left[\log p(x_T) + \sum_{t=1}^T \log \frac{p_\theta(x_{t-1}|x_t)}{q(x_t|x_{t-1})}\right]
\]
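To turn this sum into KL divergences between distributions over the same variable \( x_{t-1} \), each forward kernel with \( t \ge 2 \) is rewritten via Bayes' rule, conditioning additionally on \( x_0 \) (valid by the Markov property of the forward chain):
\[
q(x_t|x_{t-1}) = q(x_t|x_{t-1}, x_0) = \frac{q(x_{t-1}|x_t, x_0)\,q(x_t|x_0)}{q(x_{t-1}|x_0)}
\]
The ratios \( q(x_t|x_0)/q(x_{t-1}|x_0) \) then telescope across the sum.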
This splits the ELBO into per-time-step terms:
\[
\text{ELBO} = \mathbb{E}_q\left[\log p_\theta(x_0|x_1)\right] - \sum_{t=2}^T \mathbb{E}_q\left[D_\text{KL}(q(x_{t-1}|x_t, x_0) \Vert p_\theta(x_{t-1}|x_t))\right] - D_\text{KL}(q(x_T|x_0) \Vert p(x_T))
\]


4. Key Step: The Forward-Process Posterior


By Bayes' theorem, the posterior of the forward process is:
\[
q(x_{t-1}|x_t, x_0) = \mathcal{N}\left(x_{t-1}; \tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t\mathbf{I}\right)
\]
where:
\[
\tilde{\mu}_t = \frac{\sqrt{\alpha_t}(1-\bar{\alpha}_{t-1})}{1-\bar{\alpha}_t}x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\beta_t}{1-\bar{\alpha}_t}x_0, \quad \tilde{\beta}_t = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\beta_t
\]
Substituting \( x_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\epsilon}{\sqrt{\bar{\alpha}_t}} \), obtained from the closed form for \( x_t \), gives:
\[
\tilde{\mu}_t = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\epsilon\right)
\]
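The substitution can be checked numerically: the two expressions for \( \tilde{\mu}_t \) agree exactly whenever \( x_t \) is built from \( x_0 \) and \( \epsilon \) via the closed form. A numpy sketch under the same assumed linear schedule:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

rng = np.random.default_rng(0)
t = 500
x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

ab_prev = alpha_bars[t - 1]          # \bar{alpha}_{t-1}

# Posterior mean written in terms of (x_t, x_0) ...
mu_from_x0 = (np.sqrt(alphas[t]) * (1 - ab_prev) / (1 - alpha_bars[t]) * xt
              + np.sqrt(ab_prev) * betas[t] / (1 - alpha_bars[t]) * x0)

# ... and after substituting x_0 = (x_t - sqrt(1 - ab_t) * eps) / sqrt(ab_t)
mu_from_eps = (xt - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
```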


5. Parameterization and Loss Function


Let a neural network \( \epsilon_\theta \) predict the noise \( \epsilon \); the reverse-process mean is then parameterized as:
\[
\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\epsilon_\theta(x_t, t)\right)
\]
With this parameterization, the KL-divergence terms simplify to:
\[
\mathbb{E}_t\left[\frac{\beta_t^2}{2\sigma_t^2\alpha_t(1-\bar{\alpha}_t)}\|\epsilon - \epsilon_\theta(x_t, t)\|^2\right]
\]
Dropping the weighting coefficient yields the loss:
\[
\mathcal{L} = \mathbb{E}_{t,x_0,\epsilon}\left[\|\epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t}x_0 + \sqrt{1-\bar{\alpha}_t}\epsilon, t)\|^2\right]
\]
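A single Monte Carlo estimate of this loss can be sketched as follows (numpy; `eps_theta` is a placeholder for any noise-prediction model):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha_bars = np.cumprod(1.0 - betas)

def simple_loss(eps_theta, x0, rng):
    """One sample of the loss: draw t and eps, form x_t in closed form,
    and return the mean-squared error of the predicted noise."""
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - eps_theta(xt, t)) ** 2)
```

Since \( \epsilon \sim \mathcal{N}(0, \mathbf{I}) \), a model that always predicts zero incurs a loss of about 1 per dimension, which gives a useful baseline when debugging training.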


6. Final Training Objective


During training, a time step \( t \) is sampled uniformly at random, \( x_t \) is formed in closed form, and the mean-squared error of the predicted noise is minimized:
\[
\boxed{\mathcal{L}_\text{simple} = \mathbb{E}_{t,x_0,\epsilon}\left[\|\epsilon - \epsilon_\theta(x_t, t)\|^2\right]}
\]
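Once \( \epsilon_\theta \) is trained, generation runs the reverse chain of Section 2 with the mean parameterization of Section 5. A minimal ancestral-sampling sketch (numpy, assumed linear schedule, \( \sigma_t^2 = \beta_t \); `eps_theta` stands in for the trained network):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def ddpm_sample(eps_theta, shape, rng):
    """Ancestral sampling: start from x_T ~ N(0, I) and apply the learned
    reverse kernel T times; no noise is added at the final step."""
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_theta(x, t)) \
               / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x
```

As a sanity check, if the data distribution were a single point \( x_0^* \), the true noise \( \epsilon = (x_t - \sqrt{\bar\alpha_t}\,x_0^*)/\sqrt{1-\bar\alpha_t} \) is known in closed form, and sampling with that oracle predictor recovers \( x_0^* \) exactly.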
