
YOLO11 Improvement - Attention - Introducing the Multi-Scale Attention Aggregation (MSAA) Module

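          The YOLOv11 network is built from stacked convolutional and pooling layers that progressively extract image features and perform object detection at several scales. It also adopts techniques such as an anchor-free detection head and multi-scale training to handle targets of different sizes and shapes.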

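1. Introduction to the Multi-Scale Attention Aggregation (MSAA) module

        MSAA refines the features coming from the backbone. It works along two paths, a spatial path and a channel path, so that the output feature map carries stronger information in both the spatial and the channel dimension.

        1. In the spatial refinement path, the outputs of convolutions with different kernel sizes are summed and then passed through a series of spatial feature aggregation operations, which fuses spatial information from multiple scales.

        2. In the channel aggregation path, global average pooling, convolution and activation produce a channel attention map, which is combined with the spatially refined map to fuse multi-scale information along the channel dimension.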
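The branches with different kernel sizes can only be summed if they all produce feature maps of the same size. The following is a minimal sketch of that arithmetic, assuming stride 1, no dilation, and a padding of `k // 2` (the values used in the code of section 3). The helper name `conv_out_size` is hypothetical and only serves the illustration.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# A 64-pixel axis passed through 3x3, 5x5 and 7x7 branches with padding k // 2
sizes = {k: conv_out_size(64, k, stride=1, padding=k // 2) for k in (3, 5, 7)}
print(sizes)

# Without padding the 7x7 branch would shrink the map and the sum would fail
print(conv_out_size(64, 7))
```

With "same"-style padding every branch keeps the 64-pixel axis, so the element-wise sum is well defined.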

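2. Combining YOLOv11 with MSAA

        The original paper places the MSAA module between the encoder and the decoder, where it connects the two and strengthens feature transfer. Following that idea, this article places the module after each Concat layer in the Neck, to compensate for insufficient feature extraction or insufficient multi-scale fusion that can occur when features from different layers are concatenated.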

3. MSAA code

import torch
import torch.nn as nn

class ChannelAttentionModule(nn.Module):
    """Channel attention: avg/max pooled descriptors -> shared 1x1-conv MLP -> sigmoid gate."""
    def __init__(self, in_channels, reduction=4):
        super(ChannelAttentionModule, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = self.fc(self.avg_pool(x))
        max_out = self.fc(self.max_pool(x))
        out = avg_out + max_out
        return self.sigmoid(out)

class SpatialAttentionModule(nn.Module):
    """Spatial attention: channel-wise mean and max maps -> kxk conv -> sigmoid gate."""
    def __init__(self, kernel_size=7):
        super(SpatialAttentionModule, self).__init__()
        self.conv1 = nn.Conv2d(2, 1, kernel_size, padding=kernel_size//2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv1(x)
        return self.sigmoid(x)

class MSAA(nn.Module):
    """Multi-scale attention aggregation: channel gate, summed 3x3/5x5/7x7 branches, spatial gate."""
    def __init__(self, in_channels, out_channels, factor=4.0):
        super(MSAA, self).__init__()
        dim = int(out_channels // factor)
        self.down = nn.Conv2d(in_channels, dim, kernel_size=1, stride=1)
        self.conv_3x3 = nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1)
        self.conv_5x5 = nn.Conv2d(dim, dim, kernel_size=5, stride=1, padding=2)
        self.conv_7x7 = nn.Conv2d(dim, dim, kernel_size=7, stride=1, padding=3)
        self.spatial_attention = SpatialAttentionModule()
        self.channel_attention = ChannelAttentionModule(dim)
        self.up = nn.Conv2d(dim, out_channels, kernel_size=1, stride=1)
        # Note: down_2 is defined but never used in forward()
        self.down_2 = nn.Conv2d(in_channels, dim, kernel_size=1, stride=1)

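    def forward(self, x):
        # Reduce the channel count with a 1x1 convolution
        x = self.down(x)
        # Re-weight channels with the channel attention map
        x = x * self.channel_attention(x)
        # Multi-scale spatial features: sum of the 3x3, 5x5 and 7x7 branches
        x_3x3 = self.conv_3x3(x)
        x_5x5 = self.conv_5x5(x)
        x_7x7 = self.conv_7x7(x)
        x_s = x_3x3 + x_5x5 + x_7x7
        # Re-weight spatial positions with the spatial attention map
        x_s = x_s * self.spatial_attention(x_s)

        # Residual connection, then project to the output channel count
        x_out = self.up(x_s + x)

        return x_out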



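if __name__ == '__main__':
    # Use a separate name so the instance does not shadow the MSAA class
    model = MSAA(256, 256)
    # Create an input tensor
    batch_size = 8
    input_tensor = torch.randn(batch_size, 256, 64, 64)
    # Run the model and print the input and output shapes
    output_tensor = model(input_tensor)
    print("Input shape:", input_tensor.shape)
    print("Output shape:", output_tensor.shape)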
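To get a feel for what the block costs, its parameters can be counted by hand. The sketch below assumes the layer definitions exactly as in the listing above (including the `down_2` convolution that `forward` never uses). The helper names `conv_params` and `msaa_params` are hypothetical and not part of any library.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard (groups=1) k x k Conv2d."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def msaa_params(in_channels, out_channels, factor=4.0, reduction=4):
    """Per-layer parameter counts for the MSAA block as defined above."""
    dim = int(out_channels // factor)
    return {
        "down": conv_params(in_channels, dim, 1),
        "conv_3x3": conv_params(dim, dim, 3),
        "conv_5x5": conv_params(dim, dim, 5),
        "conv_7x7": conv_params(dim, dim, 7),
        # SpatialAttentionModule: one 7x7 conv from 2 maps to 1, no bias
        "spatial_attention": conv_params(2, 1, 7, bias=False),
        # ChannelAttentionModule: two bias-free 1x1 convs (squeeze, expand)
        "channel_attention": conv_params(dim, dim // reduction, 1, bias=False)
        + conv_params(dim // reduction, dim, 1, bias=False),
        "up": conv_params(dim, out_channels, 1),
        # Defined in __init__ but unused in forward()
        "down_2": conv_params(in_channels, dim, 1),
    }


counts = msaa_params(256, 256)
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:>18}: {n}")
print(f"{'total':>18}: {total}")
```

For `MSAA(256, 256)` the 7x7 branch alone accounts for roughly half of the parameters, which is worth keeping in mind when the block is inserted several times in the Neck.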

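4. Adding MSAA to YOLOv11

First: copy the MSAA code from section 3 into the directory D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\nn.

Second: import MSAA in tasks.py.

Third: register MSAA in the model-parsing part of tasks.py so that the module can be built from the YAML configuration.

Fourth: copy the following model configuration into the YOLOv11 YAML file.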

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 1, MSAA, []] # 13: refine concatenated P4 features
  - [-1, 2, C3k2, [512, False]] # 14

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 1, MSAA, []] # 17: refine concatenated P3 features
  - [-1, 2, C3k2, [256, False]] # 18 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # cat head P4
  - [-1, 1, MSAA, []] # 21: refine concatenated P4 features
  - [-1, 2, C3k2, [512, False]] # 22 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 1, MSAA, []] # 25: refine concatenated P5 features
  - [-1, 2, C3k2, [1024, True]] # 26 (P5/32-large)

  - [[18, 22, 26], 1, Detect, [nc]] # Detect(P3, P4, P5)
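Inserting a layer after every Concat shifts all later layer indices, so it is worth checking that the `from` references and the Detect inputs still point at the intended layers. The sketch below replays the configuration above for scale `n` in plain Python. It rests on two assumptions that are not taken from the article: that MSAA is registered so that its output channel count equals its input channel count, and that channel widths are scaled and rounded up to a multiple of 8 in the way Ultralytics' parser does. The helper `scaled` and the `layers` table are hypothetical names for this illustration.

```python
import math

WIDTH, MAX_CH = 0.25, 1024  # scale 'n': [depth, width, max_channels]


def scaled(c):
    """Width-scaled channel count, rounded up to a multiple of 8 (assumed rule)."""
    return math.ceil(min(c, MAX_CH) * WIDTH / 8) * 8


# (module, from, nominal output channels) in the order of the YAML above
layers = [
    ("Conv", -1, 64), ("Conv", -1, 128), ("C3k2", -1, 256), ("Conv", -1, 256),
    ("C3k2", -1, 512), ("Conv", -1, 512), ("C3k2", -1, 512), ("Conv", -1, 1024),
    ("C3k2", -1, 1024), ("SPPF", -1, 1024), ("C2PSA", -1, 1024),
    ("nn.Upsample", -1, None), ("Concat", [-1, 6], None), ("MSAA", -1, None),
    ("C3k2", -1, 512),
    ("nn.Upsample", -1, None), ("Concat", [-1, 4], None), ("MSAA", -1, None),
    ("C3k2", -1, 256),
    ("Conv", -1, 256), ("Concat", [-1, 14], None), ("MSAA", -1, None),
    ("C3k2", -1, 512),
    ("Conv", -1, 512), ("Concat", [-1, 10], None), ("MSAA", -1, None),
    ("C3k2", -1, 1024),
]

ch = []  # output channels of every layer, indexed like the YAML
for name, frm, c2 in layers:
    if name == "Concat":
        out = sum(ch[j] for j in frm)   # concatenation adds channel counts
    elif name in ("nn.Upsample", "MSAA"):
        out = ch[frm]                   # assumed: channels pass through unchanged
    else:
        out = scaled(c2)
    ch.append(out)

msaa_inputs = {i: ch[i] for i, (name, _, _) in enumerate(layers) if name == "MSAA"}
detect_sources = [(i, layers[i][0], ch[i]) for i in (18, 22, 26)]
print("MSAA channels:", msaa_inputs)
print("Detect inputs:", detect_sources)
```

Under these assumptions the Detect layer still reads from the three C3k2 outputs, and each MSAA block sees the summed channel count of the two concatenated inputs, which is the value its registration in tasks.py has to pass as `in_channels`.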

Fifth: run the following training script to confirm that the model builds and trains successfully.


from ultralytics import YOLO

if __name__=="__main__":


    # Build the model from the custom YOLOv11 YAML file, load the pretrained weights, and train
    model = YOLO(r"D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\cfg\models\11\yolo11_MSAA.yaml")\
        .load(r'D:\bilibili\model\YOLO11\ultralytics-main\yolo11n.pt')  # build from YAML and transfer weights

    results = model.train(data=r'D:\bilibili\model\ultralytics-main\ultralytics\cfg\datasets\VOC_my.yaml',
                          epochs=100, imgsz=640, batch=8,amp=False)



