当前位置: 首页 > backend >正文

即插即用模块(1) -MAFM特征融合

(即插即用模块-特征处理部分) 一、(2024) MAFM&MCM 特征融合+特征解码

在这里插入图片描述

paper:MAGNet: Multi-scale Awareness and Global fusion Network for RGB-D salient object detection

1. 多尺度感知融合模块 (MAFM)

多尺度感知融合模块 (MAFM) 旨在高效融合 RGB 和深度模态的互补信息,充分利用 RGB 图像的丰富纹理细节和深度图像的空间结构特性,同时克服 RGB 对光照变化的敏感性以及深度图像细节不足的局限性。通过多尺度特征整合和非线性变换,MAFM 实现高效的特征融合,同时降低计算复杂度。

实现流程:
  1. 特征拼接:将 RGB 和深度特征图沿通道维度拼接,形成统一的多模态特征表示,保留各模态的独特信息。
  2. 深度可分离卷积 (DW 层):应用深度可分离卷积高效提取空间局部特征,随后进行批归一化 (BN) 以稳定训练,并通过 GELU 激活函数引入非线性,提升特征表达能力。
  3. 点卷积 (PW 层):通过点卷积优化通道间交互,再次应用 BN 和 GELU 激活,确保特征的有效 recalibration。
  4. 多头多尺度卷积 (MHMC):将融合特征输入 MHMC 模块,通过多尺度卷积捕捉不同尺度的上下文信息,进一步增强特征融合效果。
  5. 残差融合:通过残差结构和元素级求和,整合不同分支的特征图,保留全局和局部信息。
  6. 非线性变换:最终通过 GELU 激活函数进行非线性变换,生成融合特征图。

Multi-scale Awareness Fusion Module 结构图:
在这里插入图片描述

2. 多级卷积模块 (MCM)

多级卷积模块 (MCM) 旨在通过多尺度特征融合,逐步生成包含丰富细节的噪声目标预测图。MCM 采用残差结构,包含多个卷积块,通过整合不同尺度的特征图显著提升解码器的学习能力和泛化性能。

实现流程:
  1. 特征上采样与拼接:对高级特征图进行上采样,并与下一级特征图沿通道维度拼接,构建多尺度特征表示。
  2. 深度可分离卷积 (DW 层):使用深度可分离卷积提取空间特征,随后进行 BN 和 GELU 激活,以高效处理多尺度信息。
  3. 点卷积 (PW 层):通过点卷积优化通道间特征交互,再次应用 BN 和 GELU 激活,确保特征鲁棒性。
  4. 残差连接:将融合特征图与残差连接的结果进行元素级求和,生成最终输出,保留细节并增强稳定性。

Multi-level Convolution Module 结构图:

在这里插入图片描述

3、代码实现

import torch
import torch.nn as nn
import math
import torch.nn.functional as F
from timm.models.layers import trunc_normal_# Conv_One_Identity
class COI(nn.Module):def __init__(self, inc, k=3, p=1):super().__init__()self.outc = incself.dw = nn.Conv2d(inc, self.outc, kernel_size=k, padding=p, groups=inc)self.conv1_1 = nn.Conv2d(inc, self.outc, kernel_size=1, stride=1)self.bn1 = nn.BatchNorm2d(self.outc)self.bn2 = nn.BatchNorm2d(self.outc)self.bn3 = nn.BatchNorm2d(self.outc)self.act = nn.GELU()self.apply(self._init_weights)def forward(self, x):shortcut = self.bn1(x)x_dw = self.bn2(self.dw(x))x_conv1_1 = self.bn3(self.conv1_1(x))return self.act(shortcut + x_dw + x_conv1_1)def _init_weights(self, m):if isinstance(m, nn.Linear):trunc_normal_(m.weight, std=.02)if isinstance(m, nn.Linear) and m.bias is not None:nn.init.constant_(m.bias, 0)elif isinstance(m, nn.LayerNorm):nn.init.constant_(m.bias, 0)nn.init.constant_(m.weight, 1.0)elif isinstance(m, nn.Conv2d):fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channelsfan_out //= m.groupsm.weight.data.normal_(0, math.sqrt(2.0 / fan_out))if m.bias is not None:m.bias.data.zero_()class MHMC(nn.Module):def __init__(self, dim, ca_num_heads=4, qkv_bias=True, proj_drop=0., ca_attention=1, expand_ratio=2):super().__init__()self.ca_attention = ca_attentionself.dim = dimself.ca_num_heads = ca_num_headsassert dim % ca_num_heads == 0, f"dim {dim} should be divided by num_heads {ca_num_heads}."self.act = nn.GELU()self.proj = nn.Linear(dim, dim)self.proj_drop = nn.Dropout(proj_drop)self.split_groups = self.dim // ca_num_headsself.v = nn.Linear(dim, dim, bias=qkv_bias)self.s = nn.Linear(dim, dim, bias=qkv_bias)for i in range(self.ca_num_heads):local_conv = nn.Conv2d(dim // self.ca_num_heads, dim // self.ca_num_heads, kernel_size=(3 + i * 2),padding=(1 + i), stride=1,groups=dim // self.ca_num_heads)  # kernel_size 3,5,7,9 大核dw卷积,padding 1,2,3,4setattr(self, f"local_conv_{i + 1}", local_conv)self.proj0 = nn.Conv2d(dim, dim * expand_ratio, kernel_size=1, padding=0, stride=1,groups=self.split_groups)self.bn = nn.BatchNorm2d(dim * expand_ratio)self.proj1 = nn.Conv2d(dim * expand_ratio, dim, kernel_size=1, padding=0, stride=1)self.apply(self._init_weights)def _init_weights(self, m):if isinstance(m, nn.Linear):trunc_normal_(m.weight, std=.02)if isinstance(m, nn.Linear) and m.bias is not None:nn.init.constant_(m.bias, 0)elif isinstance(m, nn.LayerNorm):nn.init.constant_(m.bias, 0)nn.init.constant_(m.weight, 1.0)elif isinstance(m, nn.Conv2d):fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channelsfan_out //= m.groupsm.weight.data.normal_(0, math.sqrt(2.0 / fan_out))if m.bias is not None:m.bias.data.zero_()def forward(self, x, H, W):B, N, C = x.shapev = self.v(x)s = self.s(x).reshape(B, H, W, self.ca_num_heads, C // self.ca_num_heads).permute(3, 0, 4, 1,2)  # num_heads,B,C,H,Wfor i in range(self.ca_num_heads):local_conv = getattr(self, f"local_conv_{i + 1}")s_i = s[i]  # B,C,H,Ws_i = local_conv(s_i).reshape(B, self.split_groups, -1, H, W)if i == 0:s_out = s_ielse:s_out = torch.cat([s_out, s_i], 2)s_out = s_out.reshape(B, C, H, W)s_out = self.proj1(self.act(self.bn(self.proj0(s_out))))self.modulator = s_outs_out = s_out.reshape(B, C, N).permute(0, 2, 1)x = s_out * vx = self.proj(x)x = self.proj_drop(x)return x# Multi-scale Awareness Fusion Module
class MAFM(nn.Module):def __init__(self, inc):super().__init__()self.outc = incself.attention = MHMC(dim=inc)self.coi = COI(inc)self.pw = nn.Sequential(nn.Conv2d(in_channels=inc, out_channels=inc, kernel_size=1, stride=1),nn.BatchNorm2d(inc),nn.GELU())self.pre_att = nn.Sequential(nn.Conv2d(inc * 2, inc * 2, kernel_size=3, padding=1, groups=inc * 2),nn.BatchNorm2d(inc * 2),nn.GELU(),nn.Conv2d(inc * 2, inc, kernel_size=1),nn.BatchNorm2d(inc),nn.GELU())self.apply(self._init_weights)def forward(self, x, d):B, C, H, W = x.shapex_cat = torch.cat((x, d), dim=1)x_pre = self.pre_att(x_cat)# Attentionx_reshape = x_pre.flatten(2).permute(0, 2, 1)  # B,C,H,W to B,N,Cattention = self.attention(x_reshape, H, W)  # attentionattention = attention.permute(0, 2, 1).reshape(B, C, H, W)  # B,N,C to B,C,H,W# COIx_conv = self.coi(attention)  # dw3*3,1*1,identityx_conv = self.pw(x_conv)  # pwreturn x_convdef _init_weights(self, m):if isinstance(m, nn.Linear):trunc_normal_(m.weight, std=.02)if isinstance(m, nn.Linear) and m.bias is not None:nn.init.constant_(m.bias, 0)elif isinstance(m, nn.LayerNorm):nn.init.constant_(m.bias, 0)nn.init.constant_(m.weight, 1.0)elif isinstance(m, nn.Conv2d):fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channelsfan_out //= m.groupsm.weight.data.normal_(0, math.sqrt(2.0 / fan_out))if m.bias is not None:m.bias.data.zero_()# Decoder
class MCM(nn.Module):def __init__(self, inc, outc):super().__init__()self.upsample2 = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)self.rc = nn.Sequential(nn.Conv2d(in_channels=inc, out_channels=inc, kernel_size=3, padding=1, stride=1, groups=inc),nn.BatchNorm2d(inc),nn.GELU(),nn.Conv2d(in_channels=inc, out_channels=outc, kernel_size=1, stride=1),nn.BatchNorm2d(outc),nn.GELU())self.predtrans = nn.Sequential(nn.Conv2d(in_channels=outc, out_channels=outc, kernel_size=3, padding=1, groups=outc),nn.BatchNorm2d(outc),nn.GELU(),nn.Conv2d(in_channels=outc, out_channels=1, kernel_size=1))self.rc2 = nn.Sequential(nn.Conv2d(in_channels=outc * 2, out_channels=outc * 2, kernel_size=3, padding=1, groups=outc * 2),nn.BatchNorm2d(outc * 2),nn.GELU(),nn.Conv2d(in_channels=outc * 2, out_channels=outc, kernel_size=1, stride=1),nn.BatchNorm2d(outc),nn.GELU())self.apply(self._init_weights)def forward(self, x1, x2):x2_upsample = self.upsample2(x2)  # 上采样x2_rc = self.rc(x2_upsample)  # 减少通道数shortcut = x2_rcx_cat = torch.cat((x1, x2_rc), dim=1)  # 拼接x_forward = self.rc2(x_cat)  # 减少通道数2x_forward = x_forward + shortcutpred = F.interpolate(self.predtrans(x_forward), 384, mode="bilinear", align_corners=True)  # 预测图return pred, x_forwarddef _init_weights(self, m):if isinstance(m, nn.Linear):trunc_normal_(m.weight, std=.02)if isinstance(m, nn.Linear) and m.bias is not None:nn.init.constant_(m.bias, 0)elif isinstance(m, nn.LayerNorm):nn.init.constant_(m.bias, 0)nn.init.constant_(m.weight, 1.0)elif isinstance(m, nn.Conv2d):fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channelsfan_out //= m.groupsm.weight.data.normal_(0, math.sqrt(2.0 / fan_out))if m.bias is not None:m.bias.data.zero_()if __name__ == '__main__':x = torch.randn(4, 16, 128, 128).cuda()y = torch.randn(4, 16, 128, 128).cuda()z = torch.randn(4, 32, 64, 64).cuda()model = MAFM(16).cuda()out = model(x, y)# model = MCM(32, 16).cuda()# _, out = model(x, z)# print(out.shape)
':x = torch.randn(4, 16, 128, 128).cuda()y = torch.randn(4, 16, 128, 128).cuda()z = torch.randn(4, 32, 64, 64).cuda()model = MAFM(16).cuda()out = model(x, y)# model = MCM(32, 16).cuda()# _, out = model(x, z)# print(out.shape)
http://www.xdnf.cn/news/700.html

相关文章:

  • javascript day4
  • LicheeRV Nano 与Ubuntu官方risc-v 镜像混合
  • 12【生命周期·入门】为何需要与显式标注 (`‘a`):让编译器读懂引用的“有效期”
  • Oracle--SQL基本语法
  • lmm-r1开源程序是扩展 OpenRLHF 以支持 LMM RL 训练,用于在多模态任务上重现 DeepSeek-R1
  • Eureka搭建
  • BeautifulSoup 库的使用——python爬虫
  • 算法—合并排序—js(场景:大数据且需稳定性)
  • 23种设计模式-结构型模式之装饰器模式(Java版本)
  • C#进阶学习(八)常见的泛型数据结构类(3)SortedDictionary<TKey, TValue>与SortedList<TKey, TValue>
  • 大语言模型推理能力的强化学习现状理解GRPO与近期推理模型研究的新见解
  • PG CTE 递归 SQL 翻译为 达梦版本
  • 将 JSON 字符串转化为对象的详细笔记 (Java示例)
  • 【AI量化第26篇】以配置为核心的工程化研究管理——基于miniQMT的量化交易回测系统开发实记
  • 15. 三数之和
  • 计算机网络中的网络层:架构、功能与重要性
  • llama factory
  • springboot+vue3+mysql+websocket实现的即时通讯软件
  • C++数组栈与链表栈
  • 软考高级系统架构设计师-第16章 数学与经济管理
  • 切换 Python 版本(配置path方式,含trae)
  • 一个最简单的 Model Context Protocol 的例子
  • Halcon应用:相机标定
  • C++入门篇(下)
  • 线性DP:最长上升子序列(可不连续,数组必须连续)
  • Matlab 复合模糊PID
  • NumPy:数值计算基础与高性能数组操作
  • 如何使用人工智能大模型,免费快速写工作总结?
  • Linux基础指令 补充(自用)
  • 【微知】服务器如何获取服务器的SN序列号信息?(dmidecode -t 1)