YOLO11 Improvement - Attention - Introducing the Self-Ensembling Attention Mechanism SEAM to Address Occlusion ...

郭卫东 · 2024-10-26 00:27:18


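This article introduces an improvement mechanism, SEAM, and explains how to apply it to YOLOv11 with the aim of improving model performance. First, we look at what SEAM does. SEAM (Self-Ensembling Attention Mechanism) is a self-ensembling attention mechanism that strengthens a model's robustness and generalization through multi-view feature fusion and consistency regularization. It is particularly suited to handling occlusion and multi-scale feature fusion. We then analyze in detail how to combine the module with YOLOv11, and show the implementation code and how to use it, to illustrate the effect this change is expected to have on object detection results.

1. SEAM Structure Overview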

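In the figure, the left side shows the SEAM architecture and the right side shows the structure of the Channel and Spatial Mixing Module (CSMM). CSMM uses depthwise separable convolutions to learn the correlation between spatial dimensions and channels for features at different scales.
SEAM is a self-ensembling attention mechanism intended to strengthen model robustness and generalization through multi-view feature fusion and consistency regularization.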

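First, Patch Embedding is applied to the input features:
- The input image is split into patches of different sizes (6, 7, 8). These patches pass through a Patch Embedding layer for initial processing, which produces feature representations.
CSMM module:
- Depthwise separable convolution: learns the correlation between the spatial dimensions and the channels. The operation has two steps. Depthwise convolution: each channel is convolved independently. Pointwise convolution: a 1x1 convolution across all channels integrates the information.
Activation and normalization:
- GELU activation: applied after the convolution to introduce non-linearity.
- Batch normalization: applied to the feature maps to stabilize training.
Output feature representation:
Finally, the outputs of the CSMM modules are average-pooled. This fuses the multi-scale features into one representation and strengthens the model's ability to capture information at different scales.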
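The depthwise separable step described above can be sketched in PyTorch as follows. This is a minimal illustration, not the article's module: the channel count (64), the input size and the helper `count_params` are assumptions chosen for the example. It also compares the number of convolution parameters against a standard 3x3 convolution, which is the usual motivation for the split.

```python
import torch
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Hypothetical helper: total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


channels = 64  # assumed channel count, for illustration only

# Depthwise: groups=channels, so each channel is convolved independently.
depthwise = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
# Pointwise: a 1x1 convolution that mixes information across channels.
pointwise = nn.Conv2d(channels, channels, kernel_size=1)
# Standard 3x3 convolution, for comparison only.
standard = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

# Each convolution is followed by GELU and batch normalization, as in the text.
block = nn.Sequential(
    depthwise, nn.GELU(), nn.BatchNorm2d(channels),
    pointwise, nn.GELU(), nn.BatchNorm2d(channels),
)

x = torch.randn(2, channels, 32, 32)
y = block(x)

separable_params = count_params(depthwise) + count_params(pointwise)
standard_params = count_params(standard)
print(tuple(y.shape), separable_params, standard_params)
```

With these assumed sizes the two separable convolutions hold 4800 parameters (biases included) against 36928 for the standard convolution, while the spatial size and channel count of the feature map are unchanged.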

Through these steps, SEAM and CSMM integrate multi-scale features, which can improve model performance on tasks such as image recognition.
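As a rough sketch of the multi-scale idea, the snippet below embeds the same feature map with patch sizes 6, 7 and 8, average-pools each branch, and averages the resulting descriptors. It rests on several assumptions that do not come from the article: `PatchBranch` is a hypothetical class, patch embedding is modeled as a convolution whose kernel size and stride both equal the patch size, and the CSMM stage inside each branch is omitted for brevity.

```python
import torch
import torch.nn as nn


class PatchBranch(nn.Module):
    """Hypothetical branch: patch embedding followed by global average pooling."""

    def __init__(self, channels: int, patch_size: int):
        super().__init__()
        # Assumption: patch embedding as a conv with kernel_size == stride == patch_size.
        self.embed = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=patch_size, stride=patch_size),
            nn.GELU(),
            nn.BatchNorm2d(channels),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, C): one descriptor per channel
        return self.pool(self.embed(x)).flatten(1)


channels = 32  # assumed channel count
branches = nn.ModuleList(PatchBranch(channels, p) for p in (6, 7, 8))

x = torch.randn(2, channels, 48, 48)
descriptors = [branch(x) for branch in branches]

# Fuse the scales by averaging the per-branch channel descriptors.
fused = torch.stack(descriptors, dim=0).mean(dim=0)
print(tuple(fused.shape))
```

Global average pooling makes every branch produce a descriptor of the same shape regardless of patch size, which is what allows a simple average to fuse them.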

2. Combining YOLOv11 with SEAM

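This article replaces the feed-forward layer (ffn) of the PSABlock inside YOLOv11's C2PSA module with SEAM, yielding the C2PSA_SEAM module. The depthwise separable convolutions in SEAM are intended to strengthen the feature extraction of C2PSA while keeping computational cost low.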
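The replacement follows a subclass-and-override pattern, sketched below in isolation. `ToyPSABlock` and `ToyPSABlockSwapped` are hypothetical stand-ins, not the Ultralytics classes, and the replacement layer is a plain depthwise convolution rather than SEAM. The point is only that the swapped-in layer must preserve the channel count and spatial size, because its output is added back to its input.

```python
import torch
import torch.nn as nn


class ToyPSABlock(nn.Module):
    """Hypothetical stand-in for a PSA-style block: two residual sub-layers."""

    def __init__(self, c: int):
        super().__init__()
        self.attn = nn.Conv2d(c, c, kernel_size=1)  # placeholder for the attention layer
        self.ffn = nn.Sequential(
            nn.Conv2d(c, c * 2, kernel_size=1), nn.SiLU(), nn.Conv2d(c * 2, c, kernel_size=1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(x)
        return x + self.ffn(x)


class ToyPSABlockSwapped(ToyPSABlock):
    """Same block, with the feed-forward sub-layer replaced after construction."""

    def __init__(self, c: int, replacement: nn.Module):
        super().__init__(c)
        self.ffn = replacement  # overrides the sub-layer registered by the parent


c = 16  # assumed channel count
replacement = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, groups=c), nn.GELU(), nn.BatchNorm2d(c)
)
block = ToyPSABlockSwapped(c, replacement)

x = torch.randn(2, c, 20, 20)
out = block(x)
print(tuple(out.shape))
```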
3. SEAM Code

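import torch
import torch.nn as nn

# Relative import: assumes this file sits in the same package as block.py.
from .block import PSABlock, C2PSA


class Residual(nn.Module):
    """Wraps a module and adds its input to its output (skip connection)."""

    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x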


class SEAM(nn.Module):
    """Channel attention built from n depthwise separable residual stages."""

    def __init__(self, c1, c2, n=1, reduction=16):
        super().__init__()
        if c1 != c2:
            c2 = c1  # the module always preserves the input channel count
        self.DCovN = nn.Sequential(
            *[
                nn.Sequential(
                    # Depthwise 3x3 convolution with a residual connection
                    Residual(nn.Sequential(
                        nn.Conv2d(in_channels=c2, out_channels=c2, kernel_size=3, stride=1, padding=1, groups=c2),
                        nn.GELU(),
                        nn.BatchNorm2d(c2),
                    )),
                    # Pointwise 1x1 convolution mixes information across channels
                    nn.Conv2d(in_channels=c2, out_channels=c2, kernel_size=1, stride=1, padding=0, groups=1),
                    nn.GELU(),
                    nn.BatchNorm2d(c2),
                )
                for _ in range(n)
            ]
        )
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Bottleneck MLP that turns the pooled descriptor into per-channel weights
        self.fc = nn.Sequential(
            nn.Linear(c2, c2 // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(c2 // reduction, c2, bias=False),
            nn.Sigmoid(),
        )
        self._initialize_weights()
        # Note: self.fc is an nn.Sequential, so initialize_layer is a no-op here as written.
        self.initialize_layer(self.fc)

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.DCovN(x)
        y = self.avg_pool(y).view(b, c)   # (B, C) channel descriptor
        y = self.fc(y).view(b, c, 1, 1)   # sigmoid output in (0, 1)
        y = torch.exp(y)                  # channel weights in (1, e)
        return x * y.expand_as(x)

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.xavier_uniform_(m.weight, gain=1)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def initialize_layer(self, layer):
        if isinstance(layer, (nn.Conv2d, nn.Linear)):
            torch.nn.init.normal_(layer.weight, mean=0., std=0.001)
            if layer.bias is not None:
                torch.nn.init.constant_(layer.bias, 0)


class PSABlock_SEAM(PSABlock):
    """PSABlock whose feed-forward sub-layer (ffn) is replaced by SEAM."""

    def __init__(self, c, qk_dim=16, pdim=32, shortcut=True) -> None:
        # qk_dim, pdim and shortcut are accepted but not forwarded to PSABlock.
        super().__init__(c)
        self.ffn = SEAM(c, c)


class C2PSA_SEAM(C2PSA):
    """C2PSA variant whose inner blocks are PSABlock_SEAM."""

    def __init__(self, c1, c2, n=1, e=0.5):
        # Forward n and e (assumes C2PSA accepts c1, c2, n, e) so that the
        # parent's cv1/cv2 widths stay consistent with self.c set below.
        super().__init__(c1, c2, n, e)
        assert c1 == c2
        self.c = int(c1 * e)
        self.m = nn.Sequential(*(PSABlock_SEAM(self.c, qk_dim=16, pdim=32) for _ in range(n)))


if __name__ == '__main__':
    seam = SEAM(256, 256)
    # Create an input tensor
    batch_size = 1
    input_tensor = torch.randn(batch_size, 256, 64, 64)
    # Run the module and print the input and output shapes
    output_tensor = seam(input_tensor)
    print("Input shape:", input_tensor.shape)
    print("Output shape:", output_tensor.shape)
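One detail of the forward pass is worth checking numerically: the channel weights are computed as the exponential of a sigmoid output. Since the sigmoid lies in (0, 1), the weight lies in (1, e), so the module as written scales each channel up by a factor between 1 and about 2.718 and never suppresses one below its input magnitude. The snippet below is a small standalone check with arbitrarily chosen logits; it does not use the SEAM class itself.

```python
import math

import torch

# Arbitrary logits standing in for the pre-sigmoid output of the fc layers.
logits = torch.linspace(-6.0, 6.0, steps=13)

# Same two operations that SEAM.forward applies to obtain the channel weights.
weights = torch.exp(torch.sigmoid(logits))

w_min = weights.min().item()
w_max = weights.max().item()
print(w_min, w_max)
```

Both printed values fall between 1 and e, and the weights increase monotonically with the logits.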