
How to Perform Representation Alignment

news 2025/7/1 14:40:21

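I. Point-to-Point Soft Alignment (Micro-level)

Core idea:

Use a similarity function (such as cosine similarity) to build "point-to-point" matches between samples from the source and target distributions, and align the features through soft matching.

Methods: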

1. Token-wise Similarity Soft Alignment (REPA)

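Given a set of source-domain features $F^{(s)} = \{f^{(s)}_1, \dots, f^{(s)}_N\}$ and a set of target-domain features $F^{(t)} = \{f^{(t)}_1, \dots, f^{(t)}_N\}$, where each token feature satisfies $f^{(s)}_n, f^{(t)}_n \in \mathbb{R}^d$.
A projection function $h_\phi(\cdot)$ maps the target features into the source feature space so that the two can be aligned.
The token-level soft alignment loss is defined as: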

$\mathcal{L}_{\text{REPA}}(\phi) = - \mathbb{E}_{x \sim \mathcal{D}} \left[ \frac{1}{N} \sum_{n=1}^{N} \text{sim} \left( f^{(s)}_n,\; h_\phi(f^{(t)}_n) \right) \right]$

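where:

$f^{(s)}_n$ is the $n$-th source-domain token feature;
$f^{(t)}_n$ is the $n$-th target-domain token feature;
$h_\phi$ is a learnable feature alignment function;
$\text{sim}(\cdot, \cdot)$ is a similarity function (such as cosine similarity);
$\mathcal{D}$ is the sampling distribution (for example over training data, time steps, and so on).

Purpose: this method aligns source and target features one-to-one at the token level. Because the $n$-th source token is paired with the $n$-th target token, it suits cases where the two feature sets have the same structure or at least the same number of tokens.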
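As a rough illustration of the token-wise loss above, the following NumPy sketch evaluates it for a batch of features. The linear form of $h_\phi$ (the parameters `W` and `b`), the function names, and the batch layout `(B, N, d)` are assumptions made for this example and do not come from the original text.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    """Cosine similarity along the last axis of two equally shaped arrays."""
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    return dot / (norms + eps)

def repa_loss(f_s, f_t, W, b):
    """Negative mean token-wise similarity between f_s and h_phi(f_t).

    f_s, f_t: (B, N, d) source / target token features for B samples.
    W, b:     weights of a hypothetical linear h_phi(f) = f @ W + b.
    """
    projected = f_t @ W + b            # h_phi applied to every target token
    sim = cosine_sim(f_s, projected)   # (B, N): one similarity per token pair
    return -sim.mean()                 # average over samples and tokens, negated

# Toy usage with random features and an identity projection.
rng = np.random.default_rng(0)
f_s = rng.normal(size=(2, 4, 8))
f_t = rng.normal(size=(2, 4, 8))
W, b = np.eye(8), np.zeros(8)
loss = repa_loss(f_s, f_t, W, b)
```

In practice $h_\phi$ would be trained by gradient descent in an autodiff framework; the sketch only shows how the loss value is formed.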

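Variant: Marginal Cosine Similarity Loss ($\mathcal{L}_{\text{mcos}}$)

For flattened features $x^s, x^t \in \mathbb{R}^{N \times d}$ ($N = h \times w$):
compute the cosine similarity at each spatial position $(i, j)$ and apply a margin:
$\mathcal{L}_{\text{mcos}} = \frac{1}{h \times w} \sum_{i=1}^h \sum_{j=1}^w \text{ReLU}\left(1 - m_1 - \frac{x^s_{ij} \cdot x^t_{ij}}{\|x^s_{ij}\| \|x^t_{ij}\|} \right)$
Purpose: only point pairs whose cosine similarity falls below $1 - m_1$ (where $m_1$ is the margin) are penalized, so the loss concentrates on pulling poorly matched pairs closer together.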
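A minimal NumPy sketch of the margin-based cosine loss, assuming the features are kept as `(h, w, d)` maps so that the indices $(i, j)$ match the formula. The function name and the default margin value are illustrative choices, not values from the original text.

```python
import numpy as np

def mcos_loss(x_s, x_t, m1=0.1, eps=1e-8):
    """Marginal cosine similarity loss for (h, w, d) feature maps."""
    dot = np.sum(x_s * x_t, axis=-1)                      # (h, w)
    norms = np.linalg.norm(x_s, axis=-1) * np.linalg.norm(x_t, axis=-1)
    cos = dot / (norms + eps)                             # cosine per position
    # ReLU(1 - m1 - cos): zero whenever cos >= 1 - m1
    return np.maximum(0.0, 1.0 - m1 - cos).mean()         # mean over h * w

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5, 16))
loss_same = mcos_loss(x, x)        # identical maps: no position is penalized
loss_opposite = mcos_loss(x, -x)   # cosine is -1 everywhere: maximal penalty
```

Positions that are already similar enough contribute nothing, so the gradient (in a trainable setting) comes only from positions below the $1 - m_1$ threshold.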

2. Soft Nearest Neighbor Matching

Build the source-to-target similarity matrix $S_{ij} = \text{sim}(x_i^s, x_j^t)$.
Normalize the similarity matrix row by row with a softmax to obtain a soft correspondence.
The corresponding loss can be written as:

$\mathcal{L}_{\text{ptp}} = \sum_{i} \text{KL}(\text{SoftSim}(x_i^s, X^t) \,\|\, \text{SoftSim}(x_i^t, X^s))$

where:

$\text{SoftSim}(x_i^s, X^t) = \text{softmax}(\text{sim}(x_i^s, X^t))$
$\text{sim}$ can be cosine similarity or dot-product similarity
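The loss above can be sketched in NumPy as follows. This assumes cosine similarity, no temperature scaling, and the same number of points $N$ on both sides, so that the two softmax distributions being compared have the same length. The helper names are made up for this example.

```python
import numpy as np

def cosine_matrix(a, b, eps=1e-8):
    """Pairwise cosine similarities between rows of a (N, d) and b (M, d)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T

def log_softmax(z, axis=1):
    """Numerically stable log-softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def ptp_loss(x_s, x_t):
    """Sum over i of KL(SoftSim(x_i^s, X^t) || SoftSim(x_i^t, X^s))."""
    S = cosine_matrix(x_s, x_t)          # S[i, j] = sim(x_i^s, x_j^t)
    log_p = log_softmax(S, axis=1)       # row i: SoftSim(x_i^s, X^t)
    log_q = log_softmax(S.T, axis=1)     # row i: SoftSim(x_i^t, X^s)
    p = np.exp(log_p)
    return np.sum(p * (log_p - log_q))   # KL per row, summed over i

rng = np.random.default_rng(0)
x_s = rng.normal(size=(5, 3))
x_t = rng.normal(size=(5, 3))
loss = ptp_loss(x_s, x_t)
```

Working in log space avoids dividing by very small softmax probabilities when some similarities are much lower than others.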

3. Contrastive / Triplet Loss

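These losses pull similar point pairs together and push dissimilar pairs apart. They apply to supervised settings, or to unsupervised settings in which pairs are constructed from pseudo-labels.
Contrastive loss:

$\mathcal{L}_{\text{contrastive}} = y \cdot D^2 + (1 - y) \cdot \max(0, m - D)^2$

Here $D$ is the distance between the two features of a pair, $y = 1$ marks a similar pair and $y = 0$ a dissimilar one, and $m$ is the margin.

Triplet loss:

$\mathcal{L}_{\text{triplet}} = \max(0, D(x^s, x^{t+}) - D(x^s, x^{t-}) + m)$

Here $x^{t+}$ and $x^{t-}$ are the positive and negative target samples for the source anchor $x^s$.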
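Both losses are short enough to write out directly. The sketch below assumes $D$ is the Euclidean distance and averages the per-pair (or per-triplet) values over a batch; neither choice is specified in the text above.

```python
import numpy as np

def contrastive_loss(x1, x2, y, m=1.0):
    """y * D^2 + (1 - y) * max(0, m - D)^2, averaged over the batch.

    x1, x2: (B, d) paired features; y: (B,) with 1 = similar, 0 = dissimilar.
    """
    D = np.linalg.norm(x1 - x2, axis=-1)
    return np.mean(y * D**2 + (1 - y) * np.maximum(0.0, m - D) ** 2)

def triplet_loss(anchor, pos, neg, m=1.0):
    """max(0, D(anchor, pos) - D(anchor, neg) + m), averaged over the batch."""
    d_pos = np.linalg.norm(anchor - pos, axis=-1)
    d_neg = np.linalg.norm(anchor - neg, axis=-1)
    return np.mean(np.maximum(0.0, d_pos - d_neg + m))

# Toy usage: one source anchor, one close and one distant target sample.
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 0.0]])
far = np.array([[3.0, 4.0]])   # Euclidean distance 5 from the anchor
easy = triplet_loss(anchor, near, far)   # negative already beyond the margin
hard = triplet_loss(anchor, far, near)   # positive and negative roles swapped
```

A dissimilar pair that is already farther apart than $m$ contributes zero, which is why the margin acts as a cutoff rather than a target distance.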

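II. Structural Consistency Alignment (Macro-level)

Core idea:

Model the internal structure of the source features and of the target features (for example a similarity graph or manifold structure) and keep the two consistent, which aligns the "structure of the distributions" rather than individual points.

Methods: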

1. Manifold Similarity Alignment

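Construct the feature similarity matrices $S^s$ and $S^t$ for the source and the target separately,
then minimize the difference between them:

$\mathcal{L}_{\text{structure}} = \| S^s - S^t \|_F^2$

where $S_{ij}^s = \text{sim}(x_i^s, x_j^s)$, $S_{ij}^t = \text{sim}(x_i^t, x_j^t)$, and $\|\cdot\|_F$ denotes the Frobenius norm.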
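A small NumPy sketch of this structural loss, assuming cosine similarity for $\text{sim}$ and function names chosen for illustration. Since $S^s$ and $S^t$ are both $N \times N$, the two feature sets only need the same number of points; their feature dimensions may differ.

```python
import numpy as np

def self_similarity(x, eps=1e-8):
    """(N, N) cosine similarity matrix between the rows of x (N, d)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    return x @ x.T

def structure_loss(x_s, x_t):
    """Squared Frobenius norm of S^s - S^t."""
    diff = self_similarity(x_s) - self_similarity(x_t)
    return np.sum(diff ** 2)

rng = np.random.default_rng(0)
x_s = rng.normal(size=(6, 4))
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # random orthogonal matrix
x_rotated = 2.0 * x_s @ Q                      # rotated and rescaled copy
loss_rotated = structure_loss(x_s, x_rotated)
loss_random = structure_loss(x_s, rng.normal(size=(6, 7)))
```

The rotated and rescaled copy has the same pairwise cosine similarities as the original, so its loss is essentially zero. This illustrates that the loss constrains relative structure only, not the absolute position of the features.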

Variant: Marginal Distance Matrix Similarity Loss ($\mathcal{L}_{\text{mdms}}$)

For flattened features $x^s, x^t \in \mathbb{R}^{N \times d}$ ($N = h \times w$):
for every pair of features $(i, j)$, compare the cosine similarity within the source against the cosine similarity within the target:
$\mathcal{L}_{\text{mdms}} = \frac{1}{N^2} \sum_{i=1}^N \sum_{j=1}^N \text{ReLU} \left( \left| \frac{x_i^s \cdot x_j^s}{\|x_i^s\| \|x_j^s\|} - \frac{x_i^t \cdot x_j^t}{\|x_i^t\| \|x_j^t\|} \right| - m_2 \right)$
Purpose: keep the internal structure (relative distribution) of the source and the target consistent, focusing on feature pairs whose structural difference exceeds the margin $m_2$.
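The margin-based structural loss can be sketched in the same way. The function names and the default margin are assumptions for illustration; the features are taken as already flattened to shape `(N, d)`.

```python
import numpy as np

def self_similarity(x, eps=1e-8):
    """(N, N) cosine similarity matrix between the rows of x (N, d)."""
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    return x @ x.T

def mdms_loss(x_s, x_t, m2=0.1):
    """Mean over all N^2 pairs of ReLU(|S^s_ij - S^t_ij| - m2)."""
    gap = np.abs(self_similarity(x_s) - self_similarity(x_t))
    return np.maximum(0.0, gap - m2).mean()

# Two source points are orthogonal; the two target points coincide.
x_s = np.array([[1.0, 0.0], [0.0, 1.0]])
x_t = np.array([[1.0, 0.0], [1.0, 0.0]])
loss = mdms_loss(x_s, x_t, m2=0.1)
```

In this toy case the diagonal entries agree and the two off-diagonal entries differ by about 1, so each contributes roughly $1 - m_2$ and the mean over the four entries is about 0.45. Pairs whose similarities differ by less than $m_2$ are ignored.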