
Generative Models | Deriving the Diffusion Model Loss Function

Continued from: Generative Models | Diffusion Model Formula Derivation. After additions, the original article grew too long and the site required splitting it.

Loss Function

Since a diffusion model generates an image through a step-by-step sequence, the probability of generating a real sample $x_0$ can be written as the integral of the joint probability over all intermediate transition sequences:

$$p_\theta(x_0) = \int p_\theta(x_0, x_1, \dots, x_T)\, dx_1 \dots dx_T$$

The training objective is to make the model's sample distribution $p_\theta(x)$ match the training-set distribution $q(x_0)$. The overall loss can therefore be written as the expected negative log-likelihood:

$$
\begin{aligned}
L & = \mathbb{E}_{q(x_0)} \left[ - \log p_\theta(x_0) \right] \\
& = - \int q(x_0) \log p_\theta(x_0)\, dx_0 \\
& = - \int q(x_0) \log \left( \int p_\theta(x_{0:T})\, dx_{1:T} \right) dx_0 \quad (\text{intractable: integrates over all noise trajectories}) \\
& = - \int q(x_0) \log \left( \int q(x_{1:T} \mid x_0) \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\, dx_{1:T} \right) dx_0 \quad (\text{introduce the easy-to-sample forward distribution}) \\
& \le - \int q(x_0) \left( \int q(x_{1:T} \mid x_0) \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\, dx_{1:T} \right) dx_0 \quad \Big( \textstyle\int q(x_{1:T} \mid x_0)\, dx_{1:T} = 1 \Big) \\
& = - \int q(x_0)\, q(x_{1:T} \mid x_0) \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\, dx_{0:T} \\
& = - \int q(x_{0:T}) \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)}\, dx_{0:T} \\
& = - \mathbb{E}_{q(x_{0:T})} \left[ \log \frac{p_\theta(x_{0:T})}{q(x_{1:T} \mid x_0)} \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \log \frac{q(x_{1:T} \mid x_0)}{p_\theta(x_{0:T})} \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \log q(x_{1:T} \mid x_0) - \log p_\theta(x_{0:T}) \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \log \prod_{t=1}^{T} q(x_t \mid x_{t-1}) - \log \Big( p_\theta(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t) \Big) \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \sum_{t=1}^{T} \log q(x_t \mid x_{t-1}) - \log p_\theta(x_T) - \sum_{t=1}^{T} \log p_\theta(x_{t-1} \mid x_t) \right] \\
&\qquad \Big( q(x_t \mid x_{t-1}) = q(x_t \mid x_{t-1}, x_0) = \frac{q(x_t, x_{t-1}, x_0)}{q(x_{t-1}, x_0)} = \frac{q(x_{t-1} \mid x_t, x_0)\, q(x_t \mid x_0)\, q(x_0)}{q(x_{t-1}, x_0)} = \frac{q(x_{t-1} \mid x_t, x_0)\, q(x_t \mid x_0)}{q(x_{t-1} \mid x_0)} \Big) \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \sum_{t=2}^{T} \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} \frac{q(x_t \mid x_0)}{q(x_{t-1} \mid x_0)} + \log \frac{q(x_1 \mid x_0)}{p_\theta(x_0 \mid x_1)} - \log p_\theta(x_T) \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \sum_{t=2}^{T} \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} + \sum_{t=2}^{T} \log \frac{q(x_t \mid x_0)}{q(x_{t-1} \mid x_0)} + \log \frac{q(x_1 \mid x_0)}{p_\theta(x_0 \mid x_1)} - \log p_\theta(x_T) \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \sum_{t=2}^{T} \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} + \log \frac{q(x_T \mid x_0)}{q(x_1 \mid x_0)} + \log \frac{q(x_1 \mid x_0)}{p_\theta(x_0 \mid x_1)} - \log p_\theta(x_T) \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \sum_{t=2}^{T} \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} + \log \frac{q(x_T \mid x_0)}{p_\theta(x_T)\, p_\theta(x_0 \mid x_1)} \right] \\
& = \mathbb{E}_{q(x_{0:T})} \left[ \log \frac{q(x_T \mid x_0)}{p_\theta(x_T)} + \sum_{t=2}^{T} \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} - \log p_\theta(x_0 \mid x_1) \right] \\
& = \mathbb{E}_{q(x_{0:T})} \Big[ \log \frac{q(x_T \mid x_0)}{p_\theta(x_T)} \Big] + \sum_{t=2}^{T} \mathbb{E}_{q(x_{0:T})} \Big[ \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} \Big] + \mathbb{E}_{q(x_{0:T})} \Big[ - \log p_\theta(x_0 \mid x_1) \Big] \\
& = \mathbb{E}_{q(x_0)} \mathbb{E}_{q(x_T \mid x_0)} \Big[ \log \frac{q(x_T \mid x_0)}{p_\theta(x_T)} \Big] + \sum_{t=2}^{T} \mathbb{E}_{q(x_t, x_0)} \mathbb{E}_{q(x_{t-1} \mid x_t, x_0)} \Big[ \log \frac{q(x_{t-1} \mid x_t, x_0)}{p_\theta(x_{t-1} \mid x_t)} \Big] + \mathbb{E}_{q(x_0, x_1)} \Big[ - \log p_\theta(x_0 \mid x_1) \Big] \\
& = \mathbb{E}_{q(x_0)} \Big[ D_\text{KL}(q(x_T \mid x_0) \parallel p_\theta(x_T)) \Big] + \sum_{t=2}^{T} \mathbb{E}_{q(x_t, x_0)} \Big[ D_\text{KL}(q(x_{t-1} \mid x_t, x_0) \parallel p_\theta(x_{t-1} \mid x_t)) \Big] + \mathbb{E}_{q(x_0, x_1)} \Big[ - \log p_\theta(x_0 \mid x_1) \Big]
\end{aligned}
$$

Note that Jensen's inequality is used here: for a concave function $f(x)$ (such as the $\log$ function above),

$$f\Big(\sum_{i=1}^n \lambda_i x_i\Big) \ge \sum_{i=1}^n \lambda_i f(x_i), \quad \sum_{i=1}^n \lambda_i = 1,\ \lambda_i \ge 0$$

That is, the function value of a convex combination of the inputs is at least the convex combination of the function values.
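A quick numeric illustration of the inequality for $f = \log$ (the specific sample values are arbitrary):

```python
import math
import random

random.seed(0)

# Concave f = log; convex weights lambda_i summing to 1.
xs = [random.uniform(0.5, 5.0) for _ in range(4)]
raw = [random.random() for _ in range(4)]
lam = [r / sum(raw) for r in raw]

lhs = math.log(sum(l * x for l, x in zip(lam, xs)))  # f(sum lambda_i x_i)
rhs = sum(l * math.log(x) for l, x in zip(lam, xs))  # sum lambda_i f(x_i)
assert lhs >= rhs  # Jensen: log of the mean >= mean of the logs
```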

Denote the first term of the loss as $L_T$: since $q$ has no learnable parameters and $x_T$ is (essentially) fixed Gaussian noise, this term is a constant and can be ignored.
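This can be confirmed numerically: under a typical linear schedule (an assumed example, not specified in this post), $q(x_T \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_T}\, x_0, (1 - \bar{\alpha}_T) I)$ is already almost exactly $\mathcal{N}(0, I)$, so the per-dimension KL is negligible:

```python
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha_bar = np.cumprod(1.0 - beta)

# q(x_T | x_0) = N(sqrt(abar_T) x0, (1 - abar_T) I) vs p(x_T) = N(0, I).
# Per-dimension closed-form Gaussian KL for a unit-scale x0 coordinate:
mu = np.sqrt(alpha_bar[-1])         # mean shrinks toward 0
var = 1.0 - alpha_bar[-1]           # variance approaches 1
kl = 0.5 * (var + mu ** 2 - 1.0 - np.log(var))
assert kl < 1e-4  # essentially zero: L_T is a negligible constant
```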

Denote the last term of the loss as $L_0$. Since $p_\theta(x_{t-1} \mid x_t) \approx q(x_{t-1} \mid x_t) = \mathcal{N}(\tilde{\mu}_\theta(x_t, t), \tilde{\beta}_t \mathbf{I})$, it can be approximated as:

$$
\begin{aligned}
p_\theta(x_0 \mid x_1) & = \mathcal{N}(\tilde{\mu}_\theta(x_1, 1), \tilde{\beta}_1 \mathbf{I}) \\
& = \frac{1}{(2 \pi \tilde{\beta}_1)^{d/2}} \exp\Big( -\frac{1}{2 \tilde{\beta}_1} \| x_0 - \tilde{\mu}_\theta(x_1, 1) \|^2 \Big) \\
-\log p_\theta(x_0 \mid x_1) & = \frac{d}{2} \log(2 \pi \tilde{\beta}_1) + \frac{1}{2 \tilde{\beta}_1} \| x_0 - \tilde{\mu}_\theta(x_1, 1) \|^2 \\
& \propto \| x_0 - \tilde{\mu}_\theta(x_1, 1) \|^2
\end{aligned}
$$
Many experiments show that even without emphasizing this term separately, sample quality remains high, especially when $T$ is large and the intermediate terms are well trained.

Denote the remaining terms as $L_{t-1}$. Each can be viewed as pulling two Gaussians together: $q(x_{t-1} \mid x_t, x_0) = \mathcal{N}(\tilde{\mu}_t(x_t, x_0), \tilde{\beta}_t \mathbf{I})$ and $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(\tilde{\mu}_\theta(x_t, t), \tilde{\beta}_t \mathbf{I})$.

For two $d$-dimensional Gaussians, both with isotropic covariance matrices, the KL divergence follows from the general formula (https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Multivariate_normal_distributions):

$$D_\text{KL}(p \Vert q) = \frac{1}{2} \left( d \frac{\sigma_p^2}{\sigma_q^2} + \frac{\| \mu_p - \mu_q \|^2}{\sigma_q^2} - d - d \log \frac{\sigma_p^2}{\sigma_q^2} \right)$$
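As a sanity check, the closed form can be compared against a direct numerical integration of $\int p \log(p/q)\, dx$ in one dimension (the particular means and variances below are arbitrary):

```python
import numpy as np

def kl_isotropic(mu_p, mu_q, var_p, var_q, d):
    # Closed form for KL(N(mu_p, var_p I) || N(mu_q, var_q I)) in d dims.
    return 0.5 * (d * var_p / var_q
                  + np.sum((mu_p - mu_q) ** 2) / var_q
                  - d - d * np.log(var_p / var_q))

# 1-D check against a fine Riemann sum of p * log(p/q).
mu_p, mu_q, var_p, var_q = 0.3, -0.5, 0.8, 1.4
x = np.linspace(-12.0, 12.0, 200001)
p = np.exp(-(x - mu_p) ** 2 / (2 * var_p)) / np.sqrt(2 * np.pi * var_p)
q = np.exp(-(x - mu_q) ** 2 / (2 * var_q)) / np.sqrt(2 * np.pi * var_q)
numeric = np.sum(p * np.log(p / q)) * (x[1] - x[0])
closed = kl_isotropic(np.array([mu_p]), np.array([mu_q]), var_p, var_q, d=1)
assert abs(numeric - closed) < 1e-5
```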

So the concrete form is (taking $\tilde{\beta}_\theta = \tilde{\beta}_t$):

$$
\begin{aligned}
D_\text{KL}(q(x_{t-1} \mid x_t, x_0) \parallel p_\theta(x_{t-1} \mid x_t)) & = \frac{1}{2} \left( d \frac{\tilde{\beta}_t}{\tilde{\beta}_\theta} + \frac{\| \tilde{\mu}_t - \tilde{\mu}_\theta(x_t, t) \|^2}{\tilde{\beta}_\theta} - d - d \log \frac{\tilde{\beta}_t}{\tilde{\beta}_\theta} \right) \\
& = \frac{\| \tilde{\mu}_t - \tilde{\mu}_\theta(x_t, t) \|^2}{2 \tilde{\beta}_t} \\
& = \frac{1}{2 \tilde{\beta}_t} \left\| \frac{1}{\sqrt{\alpha_t}} \Big( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \bar{\epsilon}_t \Big) - \frac{1}{\sqrt{\alpha_t}} \Big( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \bar{\epsilon}_\theta(x_t, t) \Big) \right\|^2 \\
& = \frac{\beta_t^2}{2 \alpha_t (1 - \bar{\alpha}_t) \tilde{\beta}_t} \| \bar{\epsilon}_t - \bar{\epsilon}_\theta(x_t, t) \|^2 \\
& = \frac{\beta_t^2}{2 \alpha_t (1 - \bar{\alpha}_t) \tilde{\beta}_t} \| \bar{\epsilon}_t - \bar{\epsilon}_\theta(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \bar{\epsilon}_t, t) \|^2 \\
& \propto \| \bar{\epsilon}_t - \bar{\epsilon}_\theta(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \bar{\epsilon}_t, t) \|^2
\end{aligned}
$$
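Dropping the leading coefficient gives exactly the simplified DDPM training objective: sample a random step $t$, noise $x_0$ in one forward step, and regress the injected noise. A minimal numpy sketch; `eps_model` is a hypothetical stand-in for the trained network $\bar{\epsilon}_\theta$, and the linear schedule is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
beta = np.linspace(1e-4, 0.02, T)   # assumed linear noise schedule
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

def eps_model(x_t, t):
    # Hypothetical placeholder for the noise-prediction network eps_theta(x_t, t).
    return np.zeros_like(x_t)

def simplified_loss(x0):
    # Sample t uniformly, noise x0 in one forward step, regress the noise.
    t = rng.integers(0, T)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

loss = simplified_loss(rng.standard_normal(64))
```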

Unlike the forward process, which reaches its target in a single step, sampling must start from random noise $x_T$ at time step $T$ and iteratively denoise, step by step, to obtain the target data $x_0$ at time step $0$.
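That iterative denoising loop can be sketched as follows; `eps_model` is again just a hypothetical placeholder for a trained noise predictor, with the reverse variance taken as $\sigma_t^2 = \beta_t$ (one of the two choices DDPM considers):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 50
beta = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

def eps_model(x_t, t):
    # Hypothetical placeholder for the trained noise predictor.
    return np.zeros_like(x_t)

# Reverse process: start from pure noise x_T and denoise step by step.
x = rng.standard_normal(16)                      # x_T ~ N(0, I)
for t in reversed(range(T)):
    z = rng.standard_normal(x.shape) if t > 0 else 0.0
    mean = (x - beta[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_model(x, t)) \
           / np.sqrt(alpha[t])
    x = mean + np.sqrt(beta[t]) * z              # sigma_t^2 = beta_t variant
x0_hat = x                                       # estimate of x_0
```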

Notably, if $x_0$ is not substituted away in the earlier derivation of the posterior mean, the loss can instead be written as a regression toward the ground-truth image (related to another route for deriving the loss: https://spaces.ac.cn/archives/9164#去噪过程):

$$
\begin{aligned}
\tilde{\mu}_t & = \frac{\sqrt{\alpha_t} (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 \\
\tilde{\mu}_\theta & = \frac{\sqrt{\alpha_t} (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} f_\theta(x_t, t) \\
D_\text{KL}(q(x_{t-1} \mid x_t, x_0) \parallel p_\theta(x_{t-1} \mid x_t)) & = \frac{\| \tilde{\mu}_t - \tilde{\mu}_\theta(x_t, t) \|^2}{2 \tilde{\beta}_t} \\
& = \frac{1}{2 \tilde{\beta}_t} \left\| \frac{\sqrt{\alpha_t} (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} x_0 - \frac{\sqrt{\alpha_t} (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} x_t - \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} f_\theta(x_t, t) \right\|^2 \\
& = \frac{\bar{\alpha}_{t-1} \beta_t^2}{2 \tilde{\beta}_t (1 - \bar{\alpha}_t)^2} \| x_0 - f_\theta(x_t, t) \|^2 \\
& \propto \| x_0 - f_\theta(x_t, t) \|^2
\end{aligned}
$$
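The two parameterizations are linked by $x_0 = (x_t - \sqrt{1 - \bar{\alpha}_t}\, \epsilon) / \sqrt{\bar{\alpha}_t}$: substituting either form into $\tilde{\mu}_t$ yields the same posterior mean. A quick numeric check of that identity (the schedule and the step $t$ are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
beta = np.linspace(1e-4, 0.02, T)   # assumed linear schedule
alpha = 1.0 - beta
alpha_bar = np.cumprod(alpha)

t = 60                              # any step with t >= 1
x0 = rng.standard_normal(8)
eps = rng.standard_normal(8)
x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

# Posterior mean written in terms of (x_t, x0) ...
mu_x0 = (np.sqrt(alpha[t]) * (1 - alpha_bar[t - 1]) * x_t
         + np.sqrt(alpha_bar[t - 1]) * beta[t] * x0) / (1 - alpha_bar[t])
# ... and rewritten in terms of (x_t, eps) after substituting x0.
mu_eps = (x_t - beta[t] / np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha[t])

assert np.allclose(mu_x0, mu_eps)   # both forms give the same mean
```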

Of the two parameterizations, one estimates the noise and the other estimates the image. The DDPM experiments found that estimating the noise works better. The original paper also notes that dropping the leading coefficient, and sampling the time step $t$ at random during training, yields better generations.

References

  • Understanding Diffusion Models, step by step – article by ewrfcas on Zhihu: https://zhuanlan.zhihu.com/p/525106459
  • A hand-holding tutorial: implementing the DDPM generative diffusion model from scratch in PyTorch: https://zhuanlan.zhihu.com/p/617895786
  • A Leisurely Talk on Generative Diffusion Models (I): DDPM = demolish + rebuild: https://spaces.ac.cn/archives/9119
