Organ Segmentation and Phenotypic Information Extraction of Cotton Point Clouds Based on the CotSegNet Network and Machine Learning
1. Introduction
PointNet++ is a pioneering, landmark deep learning model for point cloud processing, known for its strong performance and its ability to operate directly on unordered point cloud data. In this post I share a paper published in Computers and Electronics in Agriculture (a CAS Q1 TOP journal), "Organ segmentation and phenotypic information extraction of cotton point clouds based on the CotSegNet network and machine learning", which illustrates how a PointNet++-based model can be applied to plant organ segmentation and phenotypic information extraction. The main improvements are as follows:
(1) CotSegNet network: a new point cloud semantic segmentation network, CotSegNet, is proposed. Specifically, an improved attention mechanism, CGLUConvFormer, is designed to strengthen the model's ability to capture key information; in addition, the SegNext attention mechanism is incorporated into CotSegNet to improve feature representation and suppress irrelevant information.
(2) Improved region growing algorithm: an improved region growing algorithm is proposed to resolve the over-segmentation caused by adhesion and coplanarity between adjacent leaves. This is achieved by introducing a distance constraint.
Original paper: https://doi.org/10.1016/j.compag.2025.110466
Feedback, citation, and sharing are welcome; if you repost this article, please credit the source.
2. Research Background
Cotton is one of the world's most important cash crops and occupies a central position in the agricultural economy and in industrial production. Its fiber is the main raw material of the textile industry and is widely used in clothing, household goods, and industrial applications; cottonseed oil and cottonseed meal are also important sources of food and feed. During crop growth, plant phenotypes play a key role. A plant phenotype is the set of physical, physiological, and biochemical characteristics a plant exhibits during growth and development; these characteristics are shaped by the dynamic interaction between genotype and environment and directly reflect the plant's growth status.
Traditionally, plant phenotypic parameters have been obtained by manual measurement. Although fairly accurate, these methods have obvious drawbacks: they are inefficient, labor-intensive, and poorly scalable. More importantly, they can cause irreversible damage to the measured plants. For example, when stem height is measured with a tape measure or ruler, improper handling may bend or break the stem; leaf-area meters usually require detaching the leaf, which not only reduces the plant's photosynthetic capacity but can also significantly affect seedling growth; and measuring leaf morphological parameters (such as leaf length and width) with vernier calipers or image-analysis tools may damage the leaf margin or veins.
It is therefore imperative to develop automated, non-destructive techniques for measuring plant phenotypic parameters. Advances in such techniques will improve the efficiency of plant phenotyping, enable effective monitoring of plant growth and health, and ultimately advance plant breeding practice.
3. Dataset
In this study, high-resolution point cloud data were acquired with a Revopoint MINI precision blue-light 3D scanner produced by Revopoint (知象光电).
Partial dataset: https://pan.baidu.com/s/16pOAGeUHDH1ICuDVqrggYw
Figure 1. Raw cotton point cloud data
Figure 2. Annotation of the cotton point cloud data
4. Model Improvements
The proposed CotSegNet adopts an encoder-decoder architecture for point cloud semantic segmentation; its structure is shown in Figure 3. The model is built on the classical PointNet++ architecture, which serves as the baseline for semantic segmentation of cotton seedling organs. However, PointNet++ does not exploit local neighborhood information efficiently on highly curved surfaces, and given the complex geometry of cotton plants, local information is often the more critical cue. To address this, CotSegNet integrates the CGLUConvFormer and SegNext attention mechanisms into the feature extraction stage, strengthening neighborhood information interaction and improving on the original PointNet++ structure.
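The paper does not spell out this wiring in code. Purely as an illustration of the idea, the sketch below assumes that the grouped neighborhood features produced by a PointNet++ set abstraction stage, of shape (B, C, npoint, nsample), are treated as a 2D feature map, refined by an attention module (for example the SegNext_Attention module listed later in this post), and then max-pooled into per-point descriptors. The class and argument names here are hypothetical, not the authors' code.

import torch
import torch.nn as nn


class AttnSetAbstraction(nn.Module):
    """Hypothetical sketch: a PointNet++-style feature stage with an attention
    module inserted before neighborhood aggregation (illustration only)."""

    def __init__(self, in_channels, out_channels, attn_module):
        super().__init__()
        # shared MLP applied to every (center, neighbor) feature vector
        self.shared_mlp = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.attn = attn_module(out_channels)  # e.g. SegNext_Attention(dim=out_channels)

    def forward(self, grouped_feats):
        # grouped_feats: (B, C_in, npoint, nsample) neighborhood features from grouping
        x = self.shared_mlp(grouped_feats)      # (B, C_out, npoint, nsample)
        x = self.attn(x)                        # re-weight features to emphasize key neighborhood information
        return x.max(dim=-1).values             # (B, C_out, npoint) per-point descriptors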
This paper designs a new attention mechanism, CGLUConvFormer. The module combines a Convolutional Gated Linear Unit, a normalization layer, and inverted separable convolution. Its core component, the Convolutional Gated Linear Unit, dynamically adjusts the feature maps through a gating mechanism, which not only improves the model's ability to capture key information but also suppresses noise and noticeably strengthens robustness. In semantic segmentation, the Convolutional Gated Linear Unit captures local and global features simultaneously and transmits them selectively through the gate, allowing semantic information to be identified more precisely. By combining depthwise convolution with pointwise convolution, the module extracts and recombines features effectively while reducing the number of parameters and improving efficiency: the depthwise convolution extracts spatial features, and the pointwise convolution fuses them into a more expressive feature map. In addition, the inverted separable convolution reduces computational complexity and makes feature extraction more efficient.
The SegNext attention mechanism dynamically adjusts which regions of the input the model attends to, significantly strengthening the representation of key features while suppressing irrelevant information. It computes an importance weight for each feature position and takes a weighted sum of the features to produce a more representative feature description. SegNext further combines global and local attention to capture contextual information: global attention searches for relevant information across the whole image, while local attention focuses on a specific region around the current position. This combination allows the model to better understand complex scenes and the relationships between objects. In semantic segmentation, handling objects at different scales is crucial for performance; by dynamically adjusting the weights of features at different scales, the SegNext attention mechanism fuses multi-scale features effectively, enabling the model to capture both fine detail and overall structure and thereby improving segmentation accuracy and robustness.
Figure 3. CotSegNet network architecture
(1) The CGLUConvFormer attention mechanism
CGLUConvFormer builds on MetaFormer, a more general Transformer abstraction consisting of two core modules: a Token Mixer, which integrates spatial (token) information, and a Channel MLP, which fuses channel information. Depending on how the Token Mixer is instantiated, MetaFormer specializes into many different model types.
To strengthen point cloud feature extraction, this paper designs CGLUConvFormer, an attention mechanism tailored to point cloud processing. It uses inverted separable convolution as the Token Mixer, extracting spatial features of the point cloud through a sequence of convolution operations, and uses a Convolutional Gated Linear Unit as the Channel MLP, fusing channel information by dynamically adjusting the importance of the feature maps.
Figure 4. Structure of the CGLUConvFormer attention mechanism
CGLUConvFormer.py
from functools import partial
import torch
import torch.nn as nn
import torch.nn.functional as F
from timm.layers import DropPath, to_2tuple

__all__ = ['MF_Attention', 'RandomMixing', 'SepConv', 'Pooling',
           'MetaFormerBlock', 'MetaFormerCGLUBlock', 'LayerNormGeneral']


class Scale(nn.Module):
    """Scale vector by element multiplications."""
    def __init__(self, dim, init_value=1.0, trainable=True):
        super().__init__()
        self.scale = nn.Parameter(init_value * torch.ones(dim), requires_grad=trainable)

    def forward(self, x):
        return x * self.scale


class SquaredReLU(nn.Module):
    """Squared ReLU: https://arxiv.org/abs/2109.08668"""
    def __init__(self, inplace=False):
        super().__init__()
        self.relu = nn.ReLU(inplace=inplace)

    def forward(self, x):
        return torch.square(self.relu(x))


class StarReLU(nn.Module):
    """StarReLU: s * relu(x) ** 2 + b"""
    def __init__(self, scale_value=1.0, bias_value=0.0,
                 scale_learnable=True, bias_learnable=True,
                 mode=None, inplace=False):
        super().__init__()
        self.inplace = inplace
        self.relu = nn.ReLU(inplace=inplace)
        self.scale = nn.Parameter(scale_value * torch.ones(1), requires_grad=scale_learnable)
        self.bias = nn.Parameter(bias_value * torch.ones(1), requires_grad=bias_learnable)

    def forward(self, x):
        return self.scale * self.relu(x) ** 2 + self.bias


class MF_Attention(nn.Module):
    """Vanilla self-attention from Transformer: https://arxiv.org/abs/1706.03762.
    Modified from timm.
    """
    def __init__(self, dim, head_dim=32, num_heads=None, qkv_bias=False,
                 attn_drop=0., proj_drop=0., proj_bias=False, **kwargs):
        super().__init__()
        self.head_dim = head_dim
        self.scale = head_dim ** -0.5
        self.num_heads = num_heads if num_heads else dim // head_dim
        if self.num_heads == 0:
            self.num_heads = 1
        self.attention_dim = self.num_heads * self.head_dim

        self.qkv = nn.Linear(dim, self.attention_dim * 3, bias=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(self.attention_dim, dim, bias=proj_bias)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, H, W, C = x.shape
        N = H * W
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
        q, k, v = qkv.unbind(0)  # make torchscript happy (cannot use tensor as tuple)

        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        attn = self.attn_drop(attn)

        x = (attn @ v).transpose(1, 2).reshape(B, H, W, self.attention_dim)
        x = self.proj(x)
        x = self.proj_drop(x)
        return x


class RandomMixing(nn.Module):
    def __init__(self, num_tokens=196, **kwargs):
        super().__init__()
        self.random_matrix = nn.parameter.Parameter(
            data=torch.softmax(torch.rand(num_tokens, num_tokens), dim=-1),
            requires_grad=False)

    def forward(self, x):
        B, H, W, C = x.shape
        x = x.reshape(B, H * W, C)
        x = torch.einsum('mn, bnc -> bmc', self.random_matrix, x)
        x = x.reshape(B, H, W, C)
        return x


class LayerNormGeneral(nn.Module):
    r"""General LayerNorm for different situations.

    Args:
        affine_shape (int, list or tuple): The shape of affine weight and bias.
            Usually affine_shape=C, but in some implementations, like torch.nn.LayerNorm,
            the affine_shape is the same as normalized_dim by default.
            To adapt to different situations, we offer this argument here.
        normalized_dim (tuple or list): Which dims to compute mean and variance.
        scale (bool): Flag indicating whether to use scale or not.
        bias (bool): Flag indicating whether to use bias or not.

    Examples of how to specify the arguments:

    LayerNorm (https://arxiv.org/abs/1607.06450):
        For input shape of (B, *, C) like (B, N, C) or (B, H, W, C),
            affine_shape=C, normalized_dim=(-1, ), scale=True, bias=True;
        For input shape of (B, C, H, W),
            affine_shape=(C, 1, 1), normalized_dim=(1, ), scale=True, bias=True.

    Modified LayerNorm (https://arxiv.org/abs/2111.11418),
        identical to partial(torch.nn.GroupNorm, num_groups=1):
        For input shape of (B, N, C),
            affine_shape=C, normalized_dim=(1, 2), scale=True, bias=True;
        For input shape of (B, H, W, C),
            affine_shape=C, normalized_dim=(1, 2, 3), scale=True, bias=True;
        For input shape of (B, C, H, W),
            affine_shape=(C, 1, 1), normalized_dim=(1, 2, 3), scale=True, bias=True.

    For the several MetaFormer baselines,
        IdentityFormer, RandFormer and PoolFormerV2 utilize Modified LayerNorm without bias (bias=False);
        ConvFormer and CAFormer utilize LayerNorm without bias (bias=False).
    """
    def __init__(self, affine_shape=None, normalized_dim=(-1, ), scale=True,
                 bias=True, eps=1e-5):
        super().__init__()
        self.normalized_dim = normalized_dim
        self.use_scale = scale
        self.use_bias = bias
        self.weight = nn.Parameter(torch.ones(affine_shape)) if scale else None
        self.bias = nn.Parameter(torch.zeros(affine_shape)) if bias else None
        self.eps = eps

    def forward(self, x):
        c = x - x.mean(self.normalized_dim, keepdim=True)
        s = c.pow(2).mean(self.normalized_dim, keepdim=True)
        x = c / torch.sqrt(s + self.eps)
        if self.use_scale:
            x = x * self.weight
        if self.use_bias:
            x = x + self.bias
        return x


class LayerNormWithoutBias(nn.Module):
    """Equal to partial(LayerNormGeneral, bias=False) but faster,
    because it directly utilizes the optimized F.layer_norm."""
    def __init__(self, normalized_shape, eps=1e-5, **kwargs):
        super().__init__()
        self.eps = eps
        self.bias = None
        if isinstance(normalized_shape, int):
            normalized_shape = (normalized_shape,)
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.normalized_shape = normalized_shape

    def forward(self, x):
        return F.layer_norm(x, self.normalized_shape, weight=self.weight, bias=self.bias, eps=self.eps)


class SepConv(nn.Module):
    r"""Inverted separable convolution from MobileNetV2: https://arxiv.org/abs/1801.04381.
    Used as the Token Mixer of CGLUConvFormer (see Figure 4)."""
    def __init__(self, dim, expansion_ratio=2,
                 act1_layer=StarReLU, act2_layer=nn.Identity,
                 bias=False, kernel_size=7, padding=3, **kwargs):
        super().__init__()
        med_channels = int(expansion_ratio * dim)
        self.pwconv1 = nn.Linear(dim, med_channels, bias=bias)
        self.act1 = act1_layer()
        self.dwconv = nn.Conv2d(
            med_channels, med_channels, kernel_size=kernel_size,
            padding=padding, groups=med_channels, bias=bias)  # depthwise conv
        self.act2 = act2_layer()
        self.pwconv2 = nn.Linear(med_channels, dim, bias=bias)

    def forward(self, x):
        x = self.pwconv1(x)
        x = self.act1(x)
        x = x.permute(0, 3, 1, 2)
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)
        x = self.act2(x)
        x = self.pwconv2(x)
        return x


class Pooling(nn.Module):
    """Implementation of pooling for PoolFormer: https://arxiv.org/abs/2111.11418.
    Modified for [B, H, W, C] input."""
    def __init__(self, pool_size=3, **kwargs):
        super().__init__()
        self.pool = nn.AvgPool2d(
            pool_size, stride=1, padding=pool_size // 2, count_include_pad=False)

    def forward(self, x):
        y = x.permute(0, 3, 1, 2)
        y = self.pool(y)
        y = y.permute(0, 2, 3, 1)
        return y - x


class Mlp(nn.Module):
    """MLP as used in MetaFormer models, e.g. Transformer, MLP-Mixer, PoolFormer,
    MetaFormer baselines and related networks. Mostly copied from timm."""
    def __init__(self, dim, mlp_ratio=4, out_features=None,
                 act_layer=StarReLU, drop=0., bias=False, **kwargs):
        super().__init__()
        in_features = dim
        out_features = out_features or in_features
        hidden_features = int(mlp_ratio * in_features)
        drop_probs = to_2tuple(drop)

        self.fc1 = nn.Linear(in_features, hidden_features, bias=bias)
        self.act = act_layer()
        self.drop1 = nn.Dropout(drop_probs[0])
        self.fc2 = nn.Linear(hidden_features, out_features, bias=bias)
        self.drop2 = nn.Dropout(drop_probs[1])

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop1(x)
        x = self.fc2(x)
        x = self.drop2(x)
        return x


class ConvolutionalGLU(nn.Module):
    """Convolutional Gated Linear Unit: the Channel MLP of CGLUConvFormer.
    A gating branch (v) modulates the depthwise-convolved branch."""
    def __init__(self, in_features, hidden_features=None, out_features=None,
                 act_layer=nn.GELU, drop=0.) -> None:
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        hidden_features = int(2 * hidden_features / 3)
        self.fc1 = nn.Conv2d(in_features, hidden_features * 2, 1)
        self.dwconv = nn.Sequential(
            nn.Conv2d(hidden_features, hidden_features, kernel_size=3, stride=1,
                      padding=1, bias=True, groups=hidden_features),
            act_layer())
        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)
        self.drop = nn.Dropout(drop)

    # def forward(self, x):
    #     x, v = self.fc1(x).chunk(2, dim=1)
    #     x = self.dwconv(x) * v
    #     x = self.drop(x)
    #     x = self.fc2(x)
    #     x = self.drop(x)
    #     return x

    def forward(self, x):
        x_shortcut = x
        x, v = self.fc1(x).chunk(2, dim=1)
        x = self.dwconv(x) * v
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x_shortcut + x


class MetaFormerBlock(nn.Module):
    """Implementation of one MetaFormer block."""
    def __init__(self, dim,
                 token_mixer=nn.Identity, mlp=Mlp,
                 norm_layer=partial(LayerNormWithoutBias, eps=1e-6),
                 drop=0., drop_path=0.,
                 layer_scale_init_value=None, res_scale_init_value=None):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.token_mixer = token_mixer(dim=dim, drop=drop)
        self.drop_path1 = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.layer_scale1 = Scale(dim=dim, init_value=layer_scale_init_value) \
            if layer_scale_init_value else nn.Identity()
        self.res_scale1 = Scale(dim=dim, init_value=res_scale_init_value) \
            if res_scale_init_value else nn.Identity()

        self.norm2 = norm_layer(dim)
        self.mlp = mlp(dim=dim, drop=drop)
        self.drop_path2 = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.layer_scale2 = Scale(dim=dim, init_value=layer_scale_init_value) \
            if layer_scale_init_value else nn.Identity()
        self.res_scale2 = Scale(dim=dim, init_value=res_scale_init_value) \
            if res_scale_init_value else nn.Identity()

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)
        x = self.res_scale1(x) + \
            self.layer_scale1(self.drop_path1(self.token_mixer(self.norm1(x))))
        x = self.res_scale2(x) + \
            self.layer_scale2(self.drop_path2(self.mlp(self.norm2(x))))
        return x.permute(0, 3, 1, 2)


class MetaFormerCGLUBlock(nn.Module):
    """MetaFormer block with ConvolutionalGLU as the Channel MLP; combined with
    SepConv as the Token Mixer, this forms the CGLUConvFormer block."""
    def __init__(self, dim,
                 token_mixer=nn.Identity, mlp=ConvolutionalGLU,
                 norm_layer=partial(LayerNormWithoutBias, eps=1e-6),
                 drop=0., drop_path=0.,
                 layer_scale_init_value=None, res_scale_init_value=None):
        super().__init__()
        self.norm1 = norm_layer(dim)
        self.token_mixer = token_mixer(dim=dim, drop=drop)
        self.drop_path1 = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.layer_scale1 = Scale(dim=dim, init_value=layer_scale_init_value) \
            if layer_scale_init_value else nn.Identity()
        self.res_scale1 = Scale(dim=dim, init_value=res_scale_init_value) \
            if res_scale_init_value else nn.Identity()

        self.norm2 = norm_layer(dim)
        self.mlp = mlp(dim, drop=drop)
        self.drop_path2 = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.layer_scale2 = Scale(dim=dim, init_value=layer_scale_init_value) \
            if layer_scale_init_value else nn.Identity()
        self.res_scale2 = Scale(dim=dim, init_value=res_scale_init_value) \
            if res_scale_init_value else nn.Identity()

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)
        x = self.res_scale1(x) + \
            self.layer_scale1(self.drop_path1(self.token_mixer(self.norm1(x))))
        x = self.res_scale2(x.permute(0, 3, 1, 2)) + \
            self.layer_scale2(self.drop_path2(self.mlp(self.norm2(x).permute(0, 3, 1, 2))))
        return x
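The block above takes a (B, C, H, W) tensor. As a quick sanity check (the channel width and spatial size below are arbitrary illustration values, not settings from the paper), the CGLUConvFormer combination described in the text corresponds to a MetaFormerCGLUBlock with SepConv as the Token Mixer and ConvolutionalGLU as the Channel MLP:

# Minimal smoke test (illustrative values only).
block = MetaFormerCGLUBlock(dim=64, token_mixer=SepConv)  # SepConv = Token Mixer, ConvolutionalGLU = Channel MLP
x = torch.randn(2, 64, 16, 16)                            # (B, C, H, W) feature map
out = block(x)
print(out.shape)                                          # torch.Size([2, 64, 16, 16])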
(2) The SegNext_Attention attention mechanism
The SegNext attention mechanism adopted in this paper identifies the boundaries and fine details of cotton point clouds more accurately. By strengthening the model's representation of the internal features of the point cloud data, it effectively improves processing accuracy. The mechanism is built on an encoder-decoder framework: the encoder introduces a multi-scale convolutional attention (MSCA) module that mines deep features in the point cloud data to improve feature extraction efficiency, while the decoder uses a lightweight Hamburger module to further integrate global context and turn the features into precise segmentation results.
The core of the SegNext attention mechanism is its ability to substantially strengthen spatial attention. The MSCA module consists of three key components: a depthwise convolution that aggregates local feature information, multi-branch depthwise strip convolutions that capture context at different scales, and a 1×1 convolution that models inter-channel correlations. With this design, MSCA markedly improves the model's ability to capture and exploit multi-scale features while adding almost no parameters.
Figure 5. Structure of the SegNext_Attention attention mechanism
SegNext_Attention.py
import torch
from torch import nn, Tensor, LongTensor
from torch.nn import init
import torch.nn.functional as F
import torchvision


class SegNext_Attention(nn.Module):
    # SegNext, NeurIPS 2022
    # https://github.com/Visual-Attention-Network/SegNeXt/tree/main
    def __init__(self, dim):
        super().__init__()
        # 5x5 depthwise convolution aggregates local information
        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # multi-branch depthwise strip convolutions capture context at scales 7, 11 and 21
        self.conv0_1 = nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim)
        self.conv0_2 = nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim)
        self.conv1_1 = nn.Conv2d(dim, dim, (1, 11), padding=(0, 5), groups=dim)
        self.conv1_2 = nn.Conv2d(dim, dim, (11, 1), padding=(5, 0), groups=dim)
        self.conv2_1 = nn.Conv2d(dim, dim, (1, 21), padding=(0, 10), groups=dim)
        self.conv2_2 = nn.Conv2d(dim, dim, (21, 1), padding=(10, 0), groups=dim)
        # 1x1 convolution models inter-channel correlations
        self.conv3 = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        u = x.clone()
        attn = self.conv0(x)

        attn_0 = self.conv0_1(attn)
        attn_0 = self.conv0_2(attn_0)

        attn_1 = self.conv1_1(attn)
        attn_1 = self.conv1_2(attn_1)

        attn_2 = self.conv2_1(attn)
        attn_2 = self.conv2_2(attn_2)

        attn = attn + attn_0 + attn_1 + attn_2
        attn = self.conv3(attn)

        # the fused multi-scale map acts as an attention weight on the input
        return attn * u
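For reference, a minimal usage sketch (the dimensions below are arbitrary illustration values, not taken from the paper): the module accepts any (B, C, H, W) feature map and returns a re-weighted map of the same shape.

attn = SegNext_Attention(dim=64)
feat = torch.randn(2, 64, 32, 32)   # (B, C, H, W) feature map
out = attn(feat)                    # multi-scale convolutional attention re-weights the input
print(out.shape)                    # torch.Size([2, 64, 32, 32])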
(3) The improved region growing algorithm
This paper proposes an improved region growing algorithm designed to segment the point clouds of different leaves into separate instances. By introducing a distance constraint, it effectively resolves the over-segmentation caused by adhesion and coplanarity between adjacent leaves.
The core idea is to start from a seed point and gradually grow a region by adding neighboring points that satisfy geometric constraints (normal-vector angle, curvature, and a distance threshold), so that adhering leaf structures are segmented accurately.
The procedure is as follows. The input point cloud is first preprocessed: normal vectors are computed and a KD-tree is built to accelerate neighborhood search. The local neighborhood covariance matrix of each point is then computed, and its curvature is obtained by singular value decomposition (SVD). The point with the lowest curvature is chosen as the initial seed, because such points usually lie in the flat interior of a leaf, which reduces the number of segments produced. A normal-vector angle threshold makes the algorithm better adapted to the geometric and topological characteristics of leaves, ensuring that leaves are identified and separated correctly. For each neighbor of the current seed, the algorithm checks whether the angle between normals and the curvature satisfy the preset thresholds to decide whether the neighbor joins the current region. If both the angle and curvature conditions are met, the nearest neighbor also becomes a new seed point and the iteration continues; if only the normal-angle threshold is met, the point is added to the region but is not used as a new seed.
However, when leaves on either side of the leaf axis adhere to each other, conventional region growing struggles to separate them. Increasing the normal-angle threshold partially alleviates this, but it can also cause segmentation errors, with point clusters at different positions being grouped together simply because their normal angles are similar. The algorithm therefore adds a distance constraint to conventional region growing: for the adhering, coplanar parts of the leaf point clouds, it dynamically adjusts the allowed in-plane distance so that coplanar but spatially separate leaves are split apart. This significantly improves segmentation accuracy and yields more precise extraction of boundary points, as illustrated by the sketch below.
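The paper's post-processing code is not included in this post. The sketch below is only a compact illustration of the idea using numpy and scipy: normals and curvature come from the eigendecomposition of each point's local covariance matrix (equivalent to the SVD step described above), and growth is gated by normal-angle, curvature, and distance thresholds. The function names and threshold values are assumptions for illustration, not the authors' settings.

import numpy as np
from scipy.spatial import cKDTree


def estimate_normals_curvature(points, k=20):
    """Per-point normal and curvature from the local covariance matrix."""
    tree = cKDTree(points)
    _, idx = tree.query(points, k=k)
    normals = np.zeros_like(points)
    curvature = np.zeros(len(points))
    for i, nbrs in enumerate(idx):
        cov = np.cov(points[nbrs].T)
        eigval, eigvec = np.linalg.eigh(cov)      # ascending eigenvalues
        normals[i] = eigvec[:, 0]                 # smallest eigenvector ~ surface normal
        curvature[i] = eigval[0] / eigval.sum()   # surface variation as curvature proxy
    return normals, curvature


def region_growing(points, angle_thr_deg=10.0, curv_thr=0.03, dist_thr=0.02, k=20):
    """Distance-constrained region growing: a neighbor joins a region only if the
    normal-angle and distance thresholds are met; low curvature makes it a new seed."""
    normals, curvature = estimate_normals_curvature(points, k)
    tree = cKDTree(points)
    cos_thr = np.cos(np.deg2rad(angle_thr_deg))
    labels = -np.ones(len(points), dtype=int)
    order = np.argsort(curvature)                 # lowest-curvature points first (flat leaf interior)
    region_id = 0
    for start in order:
        if labels[start] != -1:
            continue
        labels[start] = region_id
        seeds = [start]
        while seeds:
            s = seeds.pop()
            _, nbrs = tree.query(points[s], k=k)
            for n in nbrs:
                if labels[n] != -1:
                    continue
                # distance constraint: reject coplanar but spatially separated points
                if np.linalg.norm(points[n] - points[s]) > dist_thr:
                    continue
                # normal-angle constraint
                if abs(np.dot(normals[s], normals[n])) < cos_thr:
                    continue
                labels[n] = region_id             # point joins the current region
                if curvature[n] < curv_thr:
                    seeds.append(n)               # low curvature: also becomes a new seed
        region_id += 1
    return labels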
5. Conclusion
This paper proposes CotSegNet, a point cloud semantic segmentation network for accurately extracting key phenotypic features such as cotton stems and leaves. Through systematic module design and ablation experiments, CotSegNet was applied to the cotton point cloud dataset built for this study and achieved excellent performance and fast inference in the organ segmentation task. The core of CotSegNet is the proposed CGLUConvFormer attention mechanism together with the adopted SegNext attention mechanism. CGLUConvFormer uses inverted depthwise separable convolution to efficiently capture the local geometric features of cotton organs (such as leaf edges and stem orientation) and combines it with a Convolutional Gated Linear Unit that dynamically filters channel features, which markedly improves sensitivity to small organs such as the young top leaves of seedlings and enables accurate segmentation of stems, leaves, and petioles. The SegNext attention mechanism further strengthens the model's global context awareness through a dual spatial-channel attention path: in the spatial path, a pyramid of dilated convolutions captures multi-scale context so that the model understands the spatial relationships between organs; in the channel path, learnable channel weights dynamically adjust the feature maps, suppressing background noise while strengthening the feature response of key regions (such as stem-leaf junctions).