YOLOv11 Model Improvement - Module: Introducing the Multi-Scale Feed-Forward Network (MSFN) for Handling Noise
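This article introduces a new improvement, the Multi-scale Feed-forward Network (MSFN), and explains how to apply it to YOLOv11 with the aim of improving model performance. MSFN is a network structure designed to strengthen feature processing; in the original research it was used for hyperspectral image denoising. We first describe its structure in detail and then show how to combine the MSFN module with YOLOv11 to improve object-detection performance.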
1. Structure of the Multi-Scale Feed-Forward Network (MSFN)
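MSFN was designed mainly to overcome a limitation of the original feed-forward network (FFN) in Transformers: it aggregates features at a single scale. Because the original FFN gathers information from only one scale, the information it captures is limited and it cannot fully exploit the multi-scale features in an image, which hurts performance on tasks such as denoising. MSFN addresses this in three steps: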
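To make the single-scale limitation concrete in the YOLOv11 setting, the sketch below rebuilds the stock `ffn` of ultralytics' `PSABlock` (`Conv(c, 2c, 1)` followed by `Conv(2c, c, 1, act=False)`) in plain PyTorch. It assumes ultralytics' `Conv` can be approximated as a bias-free `Conv2d` + `BatchNorm2d` + `SiLU`; the helper `conv_bn_act` is hypothetical. Perturbing a single pixel changes only that same pixel of the output: the layer mixes channels but aggregates no spatial context at all.

```python
import torch
import torch.nn as nn


def conv_bn_act(c_in, c_out, act=True):
    # Approximation (assumption) of ultralytics' Conv(c_in, c_out, 1): bias-free 1x1 conv + BatchNorm2d (+ SiLU)
    layers = [nn.Conv2d(c_in, c_out, kernel_size=1, bias=False), nn.BatchNorm2d(c_out)]
    if act:
        layers.append(nn.SiLU())
    return nn.Sequential(*layers)


torch.manual_seed(0)
c = 16
# Same layout as the stock PSABlock ffn: expand to 2c, then project back to c without activation.
# eval() makes BatchNorm use running statistics; in train mode batch statistics would couple pixels.
ffn = nn.Sequential(conv_bn_act(c, 2 * c), conv_bn_act(2 * c, c, act=False)).eval()

x = torch.randn(1, c, 8, 8)
x_perturbed = x.clone()
x_perturbed[0, :, 4, 4] += 1.0  # change one spatial position in every channel

with torch.no_grad():
    diff = (ffn(x_perturbed) - ffn(x)).abs().sum(dim=1)[0]  # per-pixel change, shape (8, 8)

changed = torch.nonzero(diff > 1e-6).tolist()
print(changed)  # [[4, 4]]
```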
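1. Channel expansion: two convolutions expand the feature channels. Raising the feature dimension gives the subsequent multi-scale processing a richer base of information. The expansion ratio (ffn_expansion_factor, 2.66 by default in the code in Section 3) sets how much the channel count grows after these convolutions. In that code, the expansion is implemented as a single 1×1×1 convolution whose output (3 × the hidden width) is split into the inputs of the three branches.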
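2. Parallel paths and gating: the input features are processed in two parallel paths. A gating mechanism, implemented as the element-wise product of the features from the two paths, strengthens the non-linear transformation. It lets the network dynamically adjust how much each path contributes, so that it adapts better to different inputs and tasks.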
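3. Feature extraction: the lower path uses a depthwise convolution, which convolves each channel separately and so extracts local features at low computational cost. The upper path uses multi-scale dilated convolutions: two dilated convolutions with dilation rates 2 and 3. A dilated convolution inserts gaps between the kernel elements, which enlarges the receptive field (a 3×3 kernel spans 5×5 at dilation 2 and 7×7 at dilation 3) without adding parameters, so it captures wider context and yields multi-scale features.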
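The three steps above can be sketched as a small 2D PyTorch module. This is an illustrative rewrite, not the author's code: `MSFN2d` and `effective_kernel` are hypothetical names, and it assumes the depth-1 `Conv3d` layers in Section 3 behave like `Conv2d` (with depth 1 and depth padding 1, a 3×3×3 depthwise kernel only ever multiplies data with its central 3×3 slice). The demo perturbs one pixel and checks that the output changes up to 3 pixels away, the reach of the dilation-3 branch, while all three depthwise branches hold the same number of weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MSFN2d(nn.Module):
    """2D sketch of MSFN: expand channels, split into three branches, gate, project back."""

    def __init__(self, dim, ffn_expansion_factor=2.66, bias=False):
        super().__init__()
        hidden = int(dim * ffn_expansion_factor)
        # Channel expansion: one 1x1 conv yields the inputs of all three branches
        self.project_in = nn.Conv2d(dim, hidden * 3, kernel_size=1, bias=bias)
        # Lower path: 3x3 depthwise conv (dilation 1)
        self.dw1 = nn.Conv2d(hidden, hidden, 3, padding=1, dilation=1, groups=hidden, bias=bias)
        # Upper path: 3x3 depthwise dilated convs; padding = dilation keeps H and W unchanged
        self.dw2 = nn.Conv2d(hidden, hidden, 3, padding=2, dilation=2, groups=hidden, bias=bias)
        self.dw3 = nn.Conv2d(hidden, hidden, 3, padding=3, dilation=3, groups=hidden, bias=bias)
        self.project_out = nn.Conv2d(hidden, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        x1, x2, x3 = self.project_in(x).chunk(3, dim=1)
        # Gating: element-wise product of the GELU-activated lower path and both dilated branches
        return self.project_out(F.gelu(self.dw1(x1)) * self.dw2(x2) * self.dw3(x3))


def effective_kernel(k, d):
    # Span of a k x k kernel with dilation d: k + (k - 1) * (d - 1)
    return k + (k - 1) * (d - 1)


torch.manual_seed(0)
msfn = MSFN2d(16).eval()
x = torch.randn(1, 16, 20, 20)
x_perturbed = x.clone()
x_perturbed[0, :, 10, 10] += 1.0  # perturb a single pixel

with torch.no_grad():
    y = msfn(x)
    diff = (msfn(x_perturbed) - y).abs().sum(dim=1)[0]

rows, cols = torch.nonzero(diff > 1e-5, as_tuple=True)
max_offset = max(int((rows - 10).abs().max()), int((cols - 10).abs().max()))
print(tuple(y.shape), max_offset)  # (1, 16, 20, 20) 3
print([effective_kernel(3, d) for d in (1, 2, 3)])  # [3, 5, 7]
# Dilation adds no weights: every depthwise branch holds hidden * 9 parameters
print(msfn.dw1.weight.numel() == msfn.dw2.weight.numel() == msfn.dw3.weight.numel())  # True
```

Because `project_in` and `project_out` are 1×1, all spatial mixing comes from the depthwise branches, and the gated product lets the 3×3, 5×5 and 7×7 contexts interact at every pixel.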
2. Combining YOLOv11 with MSFN
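This article replaces the ffn layer in the C2PSA module of YOLOv11 with MSFN, forming the C2PSA_MSFN module, to remove the single-scale feature aggregation limitation of the original feed-forward network (FFN) in C2PSA.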
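The swap is not free. The sketch below counts the learnable parameters of one stock PSABlock ffn and of one MSFN `FeedForward` from Section 3, assuming ultralytics' `Conv` is a bias-free conv plus `BatchNorm2d` (whose weight and bias are the only learnable BN parameters). The helper names are hypothetical, and the attention sub-layer is left out because it is unchanged. At the yolo11n scale, C2PSA_MSFN is 256 channels wide, so with e = 0.5 each PSA block works on 128 channels.

```python
def stock_ffn_params(c):
    # Stock PSABlock ffn: Conv(c, 2c, 1) -> Conv(2c, c, 1, act=False)
    # Each Conv = bias-free 1x1 conv + BatchNorm2d (learnable weight and bias per channel)
    expand = c * 2 * c + 2 * (2 * c)
    project = 2 * c * c + 2 * c
    return expand + project


def msfn_params(dim, ffn_expansion_factor=2.66):
    # Mirrors the FeedForward class in Section 3 (bias=False everywhere)
    h = int(dim * ffn_expansion_factor)
    project_in = dim * 3 * h       # 1x1x1 Conv3d: dim -> 3h
    dwconv1 = h * 3 * 3 * 3        # depthwise 3x3x3 Conv3d
    dwconv2 = dwconv3 = h * 3 * 3  # depthwise 3x3 Conv2d, dilation 2 and 3
    project_out = h * dim          # 1x1x1 Conv3d: h -> dim
    return project_in + dwconv1 + dwconv2 + dwconv3 + project_out


c = 128  # channels per PSA block in yolo11n (C2PSA_MSFN width 256, e = 0.5)
print(stock_ffn_params(c), msfn_params(c))  # 66304 189380
```

Under these assumptions, each replaced ffn grows from about 66 K to about 189 K parameters (roughly 2.9 times), most of it in `project_in`, which has to produce three hidden groups of 340 channels.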
3. MSFN code
import torch.nn as nn
import torch.nn.functional as F

from ultralytics.nn.modules.block import C2PSA, PSABlock


# Multi-Scale Feed-Forward Network (MSFN)
class FeedForward(nn.Module):
    def __init__(self, dim, ffn_expansion_factor=2.66, bias=False):
        super().__init__()
        hidden_features = int(dim * ffn_expansion_factor)
        # Channel expansion: one 1x1x1 conv produces the inputs of all three branches
        self.project_in = nn.Conv3d(dim, hidden_features * 3, kernel_size=(1, 1, 1), bias=bias)
        # Lower path: 3x3x3 depthwise conv (dilation 1)
        self.dwconv1 = nn.Conv3d(hidden_features, hidden_features, kernel_size=(3, 3, 3), stride=1,
                                 dilation=1, padding=1, groups=hidden_features, bias=bias)
        # Upper path: 3x3 depthwise dilated convs, dilation 2 and 3 (padding = dilation keeps H, W)
        self.dwconv2 = nn.Conv2d(hidden_features, hidden_features, kernel_size=(3, 3), stride=1,
                                 dilation=2, padding=2, groups=hidden_features, bias=bias)
        self.dwconv3 = nn.Conv2d(hidden_features, hidden_features, kernel_size=(3, 3), stride=1,
                                 dilation=3, padding=3, groups=hidden_features, bias=bias)
        self.project_out = nn.Conv3d(hidden_features, dim, kernel_size=(1, 1, 1), bias=bias)

    def forward(self, x):
        # (B, C, H, W) -> (B, C, 1, H, W) so that the Conv3d layers can be applied
        x = self.project_in(x.unsqueeze(2))
        x1, x2, x3 = x.chunk(3, dim=1)
        x1 = self.dwconv1(x1).squeeze(2)
        x2 = self.dwconv2(x2.squeeze(2))
        x3 = self.dwconv3(x3.squeeze(2))
        # Gating: element-wise product of the activated lower path and the two dilated branches
        x = F.gelu(x1) * x2 * x3
        x = self.project_out(x.unsqueeze(2))
        return x.squeeze(2)


class PSABlock_MSFN(PSABlock):
    def __init__(self, c, attn_ratio=0.5, num_heads=4, shortcut=True) -> None:
        """PSABlock whose feed-forward layer is replaced by MSFN."""
        super().__init__(c, attn_ratio=attn_ratio, num_heads=num_heads, shortcut=shortcut)
        self.ffn = FeedForward(c)


class C2PSA_MSFN(C2PSA):
    def __init__(self, c1, c2, n=1, e=0.5):
        """C2PSA with every PSABlock replaced by PSABlock_MSFN."""
        super().__init__(c1, c2, n, e)  # builds cv1/cv2 with the same e and asserts c1 == c2
        # Same attention settings as the stock C2PSA, so only the FFN differs
        self.m = nn.Sequential(
            *(PSABlock_MSFN(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n))
        )
4. Integrating MSFN into YOLOv11
Step 1: Copy the core code from Section 3 into a file under D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\nn, as shown in the figure below.
Step 2: Import the MSFN module (C2PSA_MSFN) in tasks.py.
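Step 3: Add the corresponding code to the model-configuration part of tasks.py (the parse_model function), so that C2PSA_MSFN is handled the same way as C2PSA.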
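Why Step 3 matters: in recent ultralytics versions, `parse_model` only rewrites a layer's YAML arguments into `[c1, c2, n, ...]` for modules it knows about, and C2PSA is one of them. The sketch below is a simplified, hypothetical re-implementation of that logic (`make_divisible` mirrors the ultralytics helper of the same name; `resolve_c2psa_args` is invented for illustration), not the actual ultralytics source. If C2PSA_MSFN is not registered, it is called with the raw YAML arguments, i.e. `C2PSA_MSFN(1024)`, which fails because `c2` is missing.

```python
import math


def make_divisible(x, divisor=8):
    # Round x up to the nearest multiple of divisor
    return math.ceil(x / divisor) * divisor


def resolve_c2psa_args(args, c_in, n, depth, width, max_channels, registered):
    """Hypothetical, simplified version of how parse_model prepares a C2PSA-style layer."""
    n = max(round(n * depth), 1) if n > 1 else n  # scale the YAML repeats by depth
    if not registered:
        # Unregistered modules are called with the raw YAML args
        return list(args), n
    c2 = make_divisible(min(args[0], max_channels) * width, 8)  # scale the YAML width
    resolved = [c_in, c2, *args[1:]]
    resolved.insert(2, n)  # repeats become the module's own n argument...
    return resolved, 1     # ...so the layer itself is instantiated once


# Layer 10 of the YAML at scale n: [-1, 2, C2PSA_MSFN, [1024]], fed by SPPF with 256 channels
print(resolve_c2psa_args([1024], 256, 2, 0.50, 0.25, 1024, registered=True))   # ([256, 256, 1], 1)
print(resolve_c2psa_args([1024], 256, 2, 0.50, 0.25, 1024, registered=False))  # ([1024], 1)
```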
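Step 4: Copy the model configuration below into a YOLOv11 YAML file (the training script in Step 5 loads it as yolo11_msfn.yaml).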
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA_MSFN, [1024]] # 10
# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
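Since layer 10 is declared as `[-1, 2, C2PSA_MSFN, [1024]]`, its actual size depends on the scale. The sketch below estimates, for each scale, the module width, the number of PSA blocks, the channels per block, and the MSFN hidden width. It assumes `parse_model` scales channels as `make_divisible(min(c, max_channels) * width, 8)` and repeats as `max(round(n * depth), 1)`, as recent ultralytics versions do; `c2psa_msfn_sizes` is a hypothetical helper.

```python
import math

# [depth, width, max_channels] for each scale, copied from the YAML above
SCALES = {
    "n": (0.50, 0.25, 1024),
    "s": (0.50, 0.50, 1024),
    "m": (0.50, 1.00, 512),
    "l": (1.00, 1.00, 512),
    "x": (1.00, 1.50, 512),
}


def c2psa_msfn_sizes(scale, yaml_channels=1024, yaml_repeats=2, e=0.5, ffn_expansion_factor=2.66):
    """Return (module width, PSA blocks, channels per block, MSFN hidden width) for layer 10."""
    depth, width, max_channels = SCALES[scale]
    c2 = math.ceil(min(yaml_channels, max_channels) * width / 8) * 8  # make_divisible(..., 8)
    n = max(round(yaml_repeats * depth), 1)
    c = int(c2 * e)  # C2PSA splits its input and runs the PSA blocks on one half
    return c2, n, c, int(c * ffn_expansion_factor)


for s in SCALES:
    print(s, c2psa_msfn_sizes(s))
```

Under these assumptions, scales n, s and m get a single PSABlock_MSFN, while l and x get two, and the MSFN hidden width ranges from 340 (n) to 1021 (x) channels.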
Step 5: Run the training script; it should now start successfully.
from ultralytics import YOLO

if __name__ == "__main__":
    # Build the model from your own YOLOv11 YAML file, load the pretrained weights, and train
    model = YOLO(r"D:\bilibili\model\YOLO11\ultralytics-main\ultralytics\cfg\models\11\yolo11_msfn.yaml").load(
        r"D:\bilibili\model\YOLO11\ultralytics-main\yolo11n.pt"
    )  # build from YAML and transfer weights
    results = model.train(
        data=r"D:\bilibili\model\ultralytics-main\ultralytics\cfg\datasets\VOC_my.yaml",
        epochs=100, imgsz=640, batch=8,
    )