Different Types of Semantic Similarity Loss Functions (SentenceTransformer Losses)
Table of Contents
- Losses for different input types
- Input type: [(anchor, positive/negative, label 1/0), ...] — label 1 means the pair should be close, label 0 means far apart
  - ContrastiveLoss
  - OnlineContrastiveLoss
- Input type: [(sentence1, label1), (sentence2, label2), ...] — sentences with the same label should be close
  - BatchAllTripletLoss
  - BatchHardSoftMarginTripletLoss
  - BatchHardTripletLoss
- Input type: [(sentence1, sentence2, score), ...] — regress the sentence-pair score (between 0 and 1)
  - CosineSimilarityLoss (similarity regression)
  - CoSENTLoss (similarity regression and ranking)
- Input type: [(sentence1, sentence2, label), ...] — multi-class classification of sentence pairs
  - SoftmaxLoss
- Input type: [(anchor, positive, negative), ...] — triplet input
  - TripletLoss
  - MultipleNegativesRankingLoss / InfoNCELoss
  - CachedMultipleNegativesRankingLoss
- Input type: [(anchor, positive), ...] — positive pairs only
- Input type: [sentence1, sentence2, ...] — unlabeled input
Losses for different input types
Choose the loss according to the task and the type of training data; see the loss overview in the official Sentence Transformers documentation for details.
Input type: [(anchor, positive/negative, label 1/0), ...] — label 1 means the pair should be close, label 0 means far apart
ContrastiveLoss
For a sentence pair A and B:
- positive pairs (label 1) should be as close as possible;
- negative pairs (label 0) should be as far apart as possible; only negative pairs whose distance is smaller than the margin are penalized, and once the distance exceeds this threshold they no longer contribute to the loss.
`distance_metric` defaults to cosine distance and `margin` defaults to 0.5. For a pair with label y, the loss is 0.5 * (y * d(A, B)^2 + (1 - y) * max(margin - d(A, B), 0)^2), matching the implementation below.
```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    assert len(reps) == 2
    rep_anchor, rep_other = reps
    distances = self.distance_metric(rep_anchor, rep_other)
    # positive pairs (label 1): squared distance; negative pairs (label 0): squared hinge on (margin - distance)
    losses = 0.5 * (
        labels.float() * distances.pow(2) + (1 - labels).float() * F.relu(self.margin - distances).pow(2)
    )
    return losses.mean() if self.size_average else losses.sum()
```
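As a minimal usage sketch (assuming the Sentence Transformers v3 `SentenceTransformerTrainer` API; the model name and example pairs are placeholders), the training data only needs the two sentence columns plus the 0/1 label:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses

# Hypothetical labeled pairs: label 1 = should be close, label 0 = should be far apart
train_dataset = Dataset.from_dict({
    "sentence1": ["A man is eating food.", "A man is eating food."],
    "sentence2": ["A man is eating a piece of bread.", "A plane is taking off."],
    "label": [1, 0],
})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.ContrastiveLoss(model)  # cosine distance and margin=0.5 by default

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
```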
OnlineContrastiveLoss
Essentially the same as `ContrastiveLoss`, except that the loss is computed only on the hard pairs within a batch; it usually performs better than plain contrastive loss.
Hard-pair selection: keep negative pairs whose distance is smaller than the largest positive-pair distance, and positive pairs whose distance is larger than the smallest negative-pair distance. Easy pairs — negatives already farther than the farthest positive, and positives already closer than the closest negative — are ignored.
```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor, size_average=False) -> Tensor:
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    distance_matrix = self.distance_metric(embeddings[0], embeddings[1])
    negs = distance_matrix[labels == 0]
    poss = distance_matrix[labels == 1]

    # select hard positive and hard negative pairs
    negative_pairs = negs[negs < (poss.max() if len(poss) > 1 else negs.mean())]
    positive_pairs = poss[poss > (negs.min() if len(negs) > 1 else poss.mean())]

    positive_loss = positive_pairs.pow(2).sum()
    negative_loss = F.relu(self.margin - negative_pairs).pow(2).sum()
    loss = positive_loss + negative_loss
    return loss
```
Input type: [(sentence1, label1), (sentence2, label2), ...] — sentences with the same label should be close
BatchAllTripletLoss
The loss encourages:
- sentences in the batch that share a label (same class) to be close to each other;
- sentences in the batch with different labels (different classes) to be far apart;
- for any anchor, its distance to samples with the same label to be smaller than its distance to samples with a different label.
For example, given four samples [(a, label1), (b, label1), (c, label2), (d, label2)], the pairwise distance matrix pairwise_dist is [[aa, ab, ac, ad], ..., [da, db, dc, dd]]. With a as the anchor, ab is a positive-pair distance and ac a negative-pair distance, and one term of the loss is ab - ac + margin.
The larger the positive-pair distance and the smaller the negative-pair distance, the larger the loss. Triplets whose negative distance already exceeds the positive distance by more than the margin, i.e. ab - ac + margin < 0, are ignored: such triplets are easy to separate and barely affect the loss. A usage sketch follows the implementation below.
```python
def batch_all_triplet_loss(self, labels: Tensor, embeddings: Tensor) -> Tensor:
    # Get the pairwise distance matrix
    pairwise_dist = self.distance_metric(embeddings)

    anchor_positive_dist = pairwise_dist.unsqueeze(2)
    anchor_negative_dist = pairwise_dist.unsqueeze(1)

    # Compute a 3D tensor of size (batch_size, batch_size, batch_size)
    # triplet_loss[i, j, k] will contain the triplet loss of anchor=i, positive=j, negative=k
    # Uses broadcasting where the 1st argument has shape (batch_size, batch_size, 1)
    # and the 2nd (batch_size, 1, batch_size)
    triplet_loss = anchor_positive_dist - anchor_negative_dist + self.triplet_margin

    # Put to zero the invalid triplets
    # (where label(a) != label(p) or label(n) == label(a) or a == p)
    mask = BatchHardTripletLoss.get_triplet_mask(labels)
    triplet_loss = mask.float() * triplet_loss

    # Remove negative losses (i.e. the easy triplets)
    triplet_loss[triplet_loss < 0] = 0

    # Count number of positive triplets (where triplet_loss > 0)
    valid_triplets = triplet_loss[triplet_loss > 1e-16]
    num_positive_triplets = valid_triplets.size(0)

    # num_valid_triplets = mask.sum()
    # fraction_positive_triplets = num_positive_triplets / (num_valid_triplets.float() + 1e-16)

    # Get final mean triplet loss over the positive valid triplets
    triplet_loss = triplet_loss.sum() / (num_positive_triplets + 1e-16)

    return triplet_loss
```
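A rough usage sketch (placeholder sentences and labels); note that a batch needs at least two sentences per label for valid triplets to exist:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

# Hypothetical class-labeled sentences: the integer label marks the class
train_dataset = Dataset.from_dict({
    "sentence": [
        "My package never arrived",
        "Where is my order?",
        "How do I reset my password?",
        "I can't log into my account",
    ],
    "label": [0, 0, 1, 1],
})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.BatchAllTripletLoss(model)  # forms all valid (anchor, positive, negative) triplets per batch
```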
BatchHardSoftMarginTripletLoss
For any anchor in the batch, even its largest distance to a same-label sample should be smaller than its smallest distance to a different-label sample: same-class samples, even distant ones, must stay closer than any sample from another class.
A soft margin is used: loss = log(1 + exp(d(a, p) - d(a, n))). This gives a smooth, always-differentiable penalty that grows when the positive-pair distance exceeds the negative-pair distance and tends to 0 when the positive pair is much closer than the negative pair.
```python
def batch_hard_triplet_soft_margin_loss(self, labels: Tensor, embeddings: Tensor) -> Tensor:
    # Get the pairwise distance matrix
    pairwise_dist = self.distance_metric(embeddings)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    mask_anchor_positive = BatchHardTripletLoss.get_anchor_positive_triplet_mask(labels).float()

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = mask_anchor_positive * pairwise_dist

    # shape (batch_size, 1)
    hardest_positive_dist, _ = anchor_positive_dist.max(1, keepdim=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = BatchHardTripletLoss.get_anchor_negative_triplet_mask(labels).float()

    # We add the maximum value in each row to the invalid negatives (label(a) == label(n))
    max_anchor_negative_dist, _ = pairwise_dist.max(1, keepdim=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size,)
    hardest_negative_dist, _ = anchor_negative_dist.min(1, keepdim=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss with soft margin
    # tl = hardest_positive_dist - hardest_negative_dist + margin
    # tl[tl < 0] = 0
    tl = torch.log1p(torch.exp(hardest_positive_dist - hardest_negative_dist))
    triplet_loss = tl.mean()

    return triplet_loss
```
BatchHardTripletLoss
Unlike `BatchHardSoftMarginTripletLoss`, the margin is set explicitly: loss = d(a, p) - d(a, n) + margin with loss[loss < 0] = 0, so anchors whose hardest negative is already farther than the hardest positive by more than the margin are ignored.
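As a sketch of how the hard-margin variant differs from the soft-margin code above, the snippet below re-implements only the final step on already-computed hardest distances (the `margin` argument and function name are hypothetical; this is not the library's exact code):

```python
import torch.nn.functional as F
from torch import Tensor

def hard_margin_triplet_loss(hardest_positive_dist: Tensor,
                             hardest_negative_dist: Tensor,
                             margin: float = 5.0) -> Tensor:
    # Penalize only anchors whose hardest negative is not at least `margin`
    # farther from the anchor than their hardest positive; easy anchors contribute 0.
    tl = F.relu(hardest_positive_dist - hardest_negative_dist + margin)
    return tl.mean()
```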
Input type: [(sentence1, sentence2, score), ...] — regress the sentence-pair score (between 0 and 1)
CosineSimilarityLoss (similarity regression)
Compute the cosine similarity of each sentence pair and take the MSE against the gold score. `cos_score_transformation` is the identity (does nothing) by default, and `loss_fct` defaults to MSE loss.
```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
    return self.loss_fct(output, labels.float().view(-1))
```
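A small usage sketch (placeholder model name and scores; the scores are floats in [0, 1]):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

# Hypothetical pairs with similarity scores between 0 and 1
train_dataset = Dataset.from_dict({
    "sentence1": ["A plane is taking off.", "A man is playing a flute."],
    "sentence2": ["An air plane is taking off.", "A man is playing the violin."],
    "score": [1.0, 0.3],
})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.CosineSimilarityLoss(model)  # MSE between cosine similarity and the gold score
```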
CoSENTLoss (similarity regression and ranking)
Cosine Sentence (CoSENT) loss; for the underlying idea see the Scientific Spaces (科学空间) post "CoSENT(一):比Sentence-BERT更有效的句向量方案".
Loss: for sentence pairs (i, j) and (k, l), if label[i,j] < label[k,l], the model's predicted similarity should satisfy scores[i,j] < scores[k,l]. The loss is
loss = log(1 + Σ_{label[i,j] < label[k,l]} exp(λ · (s[i,j] − s[k,l]))),
i.e. every pair (i, j) whose gold score is lower than that of (k, l) is pushed to have a lower predicted similarity (λ is the `scale` factor in the code).
Similarity measure: the cosine similarity score, where 1 means similar and 0 means dissimilar. Note this is a similarity score, not a distance. After training, the cosine similarity between embeddings reflects semantic similarity, so this loss suits sentence-similarity regression and ranking tasks.
For example, with 3 sentence pairs in a batch, numbered 1, 2 and 3, and gold labels (0.1, 0.7, 0.9), the combinations (1, 2), (1, 3) and (2, 3) enter the loss. If the predicted scores are (0.3, 0.4, 0.2), the score differences are (-0.1, 0.1, 0.2); a positive difference (a lower-labelled pair predicted as more similar) increases the loss.
```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]

    scores = self.similarity_fct(embeddings[0], embeddings[1])
    scores = scores * self.scale
    scores = scores[:, None] - scores[None, :]

    # label matrix indicating which pairs are relevant
    labels = labels[:, None] < labels[None, :]
    labels = labels.float()

    # mask out irrelevant pairs so they are negligible after exp()
    scores = scores - (1 - labels) * 1e12

    # append a zero as e^0 = 1
    scores = torch.cat((torch.zeros(1).to(scores.device), scores.view(-1)), dim=0)
    loss = torch.logsumexp(scores, dim=0)

    return loss
```
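To make the worked example above concrete, here is a standalone re-implementation of the same logsumexp computation in plain PyTorch (scale is set to 1 so the differences match the numbers in the example; the library default is 20):

```python
import torch

scores = torch.tensor([0.3, 0.4, 0.2])  # predicted similarities for pairs 1, 2, 3
labels = torch.tensor([0.1, 0.7, 0.9])  # gold scores
scale = 1.0                             # library default is 20; 1 keeps the numbers readable

diff = (scores[:, None] - scores[None, :]) * scale   # s_i - s_j for every (i, j)
keep = (labels[:, None] < labels[None, :]).float()   # only pairs with label_i < label_j matter
diff = diff - (1 - keep) * 1e12                      # irrelevant pairs vanish after exp()

loss = torch.logsumexp(torch.cat([torch.zeros(1), diff.view(-1)]), dim=0)
print(loss)  # roughly 1.44, i.e. log(1 + e^-0.1 + e^0.1 + e^0.2)
```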
Input type: [(sentence1, sentence2, label), ...] — multi-class classification of sentence pairs
SoftmaxLoss
A siamese network that performs multi-class classification on sentence pairs.
```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = Dataset.from_dict({
    "sentence1": [
        "A person on a horse jumps over a broken down airplane.",
        "A person on a horse jumps over a broken down airplane.",
        "A person on a horse jumps over a broken down airplane.",
        "Children smiling and waving at camera",
    ],
    "sentence2": [
        "A person is training his horse for a competition.",
        "A person is at a diner, ordering an omelette.",
        "A person is outdoors, on a horse.",
        "There are children present.",
    ],
    "label": [1, 2, 0, 0],
})
loss = losses.SoftmaxLoss(model, model.get_sentence_embedding_dimension(), num_labels=3)
```
```python
def forward(
    self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor
) -> Tensor | tuple[Tensor, Tensor]:
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    rep_a, rep_b = reps

    # Build the classifier input; by default (u, v, |u - v|) is concatenated
    vectors_concat = []
    if self.concatenation_sent_rep:
        vectors_concat.append(rep_a)
        vectors_concat.append(rep_b)

    if self.concatenation_sent_difference:
        vectors_concat.append(torch.abs(rep_a - rep_b))

    if self.concatenation_sent_multiplication:
        vectors_concat.append(rep_a * rep_b)

    features = torch.cat(vectors_concat, 1)

    output = self.classifier(features)

    if labels is not None:
        loss = self.loss_fct(output, labels.view(-1))
        return loss
    else:
        return reps, output
```
Input type: [(anchor, positive, negative), ...] — triplet input
TripletLoss
The negative should be farther from the anchor than the positive by at least the margin; in other words, triplets with d(anchor, negative) - d(anchor, positive) < margin are penalized. The default `distance_metric` is Euclidean distance and the default `margin` is 5.
```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]

    rep_anchor, rep_pos, rep_neg = reps
    distance_pos = self.distance_metric(rep_anchor, rep_pos)
    distance_neg = self.distance_metric(rep_anchor, rep_neg)

    losses = F.relu(distance_pos - distance_neg + self.triplet_margin)
    return losses.mean()
```
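A usage sketch with a placeholder triplet dataset:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

# Hypothetical (anchor, positive, negative) triplets
train_dataset = Dataset.from_dict({
    "anchor":   ["How do I bake bread?"],
    "positive": ["What is a simple bread recipe?"],
    "negative": ["How do I change a car tire?"],
})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.TripletLoss(model)  # Euclidean distance and triplet_margin=5 by default
```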
MultipleNegativesRankingLoss / InfoNCELoss
Each anchor comes with one positive and several negatives. The similarity between the anchor and all candidates (its positive plus the negatives) is computed and treated as a softmax classification problem: the similarity to the positive is pushed up and the similarity to the negatives is pushed down.
This is equivalent to the InfoNCE loss, which applies temperature scaling to the scores before the softmax; in `MultipleNegativesRankingLoss` the temperature corresponds to the `scale` parameter, and `scale=1` reduces it to a plain cross-entropy loss.
```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    # Compute the embeddings and distribute them to anchor and candidates (positive and optionally negatives)
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    anchors = embeddings[0]  # (batch_size, embedding_dim)
    candidates = torch.cat(embeddings[1:])  # (batch_size * (1 + num_negatives), embedding_dim)

    # For every anchor, we compute the similarity to all other candidates (positives and negatives),
    # also from other anchors. This gives us a lot of in-batch negatives.
    scores = self.similarity_fct(anchors, candidates) * self.scale
    # (batch_size, batch_size * (1 + num_negatives))

    # anchor[i] should be most similar to candidates[i], as that is the paired positive,
    # so the label for anchor[i] is i
    range_labels = torch.arange(0, scores.size(0), device=scores.device)

    return self.cross_entropy_loss(scores, range_labels)
```
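A usage sketch with explicit hard negatives (placeholder data); without the `negative` column the same loss also works on plain (anchor, positive) pairs, as described in the section further below:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

# Hypothetical (anchor, positive, hard-negative) triplets; the positives of the
# other anchors in the batch additionally act as in-batch negatives
train_dataset = Dataset.from_dict({
    "anchor":   ["How do I bake bread?", "What is the capital of France?"],
    "positive": ["What is a simple bread recipe?", "Paris is the capital of France."],
    "negative": ["How do I change a car tire?", "Berlin is the capital of Germany."],
})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.MultipleNegativesRankingLoss(model)  # scale=20 and cosine similarity by default
```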
CachedMultipleNegativesRankingLoss
An optimized version of `MultipleNegativesRankingLoss`: the batch is split into several mini-batches and gradients are cached, so large effective batch sizes can be used without running out of GPU memory.
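A minimal sketch (the mini-batch size below is just an example value):

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/mpnet-base")
# The large logical batch is processed in mini-batches of 32 with cached gradients,
# so the number of in-batch negatives can grow without exhausting GPU memory.
loss = losses.CachedMultipleNegativesRankingLoss(model, mini_batch_size=32)
```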
Input type: [(anchor, positive), ...] — positive pairs only
`MultipleNegativesRankingLoss` can be used here as well: the positives of the other pairs in the batch serve as negatives for each anchor, and a softmax classification is performed. The more samples in the batch, the harder the classification and the better the expected result.
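A sketch with positive pairs only (placeholder data):

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

# Hypothetical (anchor, positive) pairs; for each anchor, every other positive
# in the batch is used as an in-batch negative
train_dataset = Dataset.from_dict({
    "anchor":   ["How do I bake bread?", "What is the capital of France?"],
    "positive": ["What is a simple bread recipe?", "Paris is the capital of France."],
})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.MultipleNegativesRankingLoss(model)
```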
Input type: [sentence1, sentence2, ...] — unlabeled input
For unlabeled input, `ContrastiveTensionLossInBatchNegatives` can be used: the same sentence goes through two forward passes (the network contains stochastic operations such as dropout), and the objective pulls the two encodings of the same sentence together while pushing encodings of different sentences apart.
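A rough sketch, assuming the data is prepared as in the library's unsupervised contrastive-tension example, i.e. each sentence paired with itself; the corpus and column names below are placeholders:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

# Hypothetical unlabeled corpus; each sentence is paired with itself, and the
# other sentences in the batch act as negatives
sentences = ["The weather is lovely today.", "He drove to the stadium.", "Cats sleep a lot."]
train_dataset = Dataset.from_dict({"sentence1": sentences, "sentence2": sentences})

model = SentenceTransformer("microsoft/mpnet-base")
loss = losses.ContrastiveTensionLossInBatchNegatives(model)
```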