
Weekly Deep Learning Study Summary J5 (DenseNet-121 + SE: Hands-On Practice and Analysis - Monkeypox Recognition)

  • 🍨 This post is a study-log entry for the 🔗 365-Day Deep Learning Training Camp
  • 🍖 Original author: K同学啊 | tutoring and custom projects available

0. Summary

Data import and preprocessing: this project does not use a built-in torchvision dataset, so the raw data must be processed manually, including importing the data, checking the class breakdown, defining transforms, converting data types, and so on.

Splitting the dataset: after carving out the training and test sets, use DataLoader() from torch.utils.data to load each of them and check the batch dimensions.

Model construction: DenseNet-121 + SE module

Setting hyperparameters: beforehand, define the loss function, the learning rate (here a dynamic learning rate), and an optimizer built on that learning rate (e.g., SGD, stochastic gradient descent), which updates the parameters during training to minimize the loss (sketched below).
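
A minimal sketch of this setup, assuming cross-entropy loss and SGD; the concrete learning rate and the StepLR schedule are illustrative assumptions, not values from this post:

loss_fn = nn.CrossEntropyLoss()  # classification loss
learn_rate = 1e-4                # illustrative initial learning rate
opt = torch.optim.SGD(model.parameters(), lr=learn_rate)
# Dynamic learning rate: decay by gamma every step_size epochs (values are assumptions)
scheduler = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.92)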

Defining the training function: it takes four arguments: the prepared DataLoader(), the model, the loss function, and the optimizer. Inside, initialize the loss and accuracy to 0, then loop: draw a batch from the DataLoader(), run it through the model to get predictions, and compute the loss with the loss function. Then backpropagate and let the optimizer update the parameters. Zeroing the gradients may go either before backpropagation or after the optimizer step; by convention it goes before backpropagation. See the sketch below.
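
A minimal sketch of such a training function, under the loss/optimizer assumptions above (the per-epoch accuracy/loss bookkeeping mirrors the description):

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)          # total number of training samples
    num_batches = len(dataloader)           # number of batches per epoch
    train_loss, train_acc = 0, 0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)   # keep data on the same device as the model
        pred = model(X)                     # forward pass
        loss = loss_fn(pred, y)             # batch loss
        optimizer.zero_grad()               # zero gradients (conventionally before backprop)
        loss.backward()                     # backpropagation
        optimizer.step()                    # parameter update
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
    return train_acc / size, train_loss / num_batches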

Defining the test function: it takes one argument fewer than the training function (no optimizer): just the prepared DataLoader(), the model, and the loss function. Apart from dropping the gradient zeroing, backpropagation, and optimizer step when processing each batch, it mirrors the training function; a sketch follows.
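
The matching test function as a sketch, with a torch.no_grad() context in place of the backward/step calls:

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    test_loss, test_acc = 0, 0
    with torch.no_grad():                   # no gradient tracking during evaluation
        for X, y in dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            test_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
    return test_acc / size, test_loss / num_batches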

Training loop: set the number of epochs (each epoch is one full pass over the dataset) and initialize four empty lists to hold the per-epoch training and test accuracy and loss. Call model.train() to enter training mode and run the training function; call model.eval() to switch to evaluation mode and run the test function. Append the resulting accuracies and losses to the lists and print them together, giving the metrics after each full epoch (sketched below).
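
A sketch of that loop under the same assumptions (the epoch count is illustrative; train and test are the sketches above):

epochs = 20  # illustrative epoch count
train_loss, train_acc, test_loss, test_acc = [], [], [], []
for epoch in range(epochs):
    model.train()                 # training mode (enables dropout, BN running stats)
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)

    model.eval()                  # evaluation mode
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)

    scheduler.step()              # advance the dynamic learning rate once per epoch

    train_acc.append(epoch_train_acc); train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc);   test_loss.append(epoch_test_loss)
    print(f'Epoch {epoch+1:2d}, Train_acc: {epoch_train_acc*100:.1f}%, Train_loss: {epoch_train_loss:.3f}, '
          f'Test_acc: {epoch_test_acc*100:.1f}%, Test_loss: {epoch_test_loss:.3f}')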

Result visualization

Saving, loading, and using the model: in PyTorch, model parameters are typically saved with torch.save(model.state_dict(), 'model.pth') and loaded with model.load_state_dict(torch.load('model.pth')). A minimal sketch follows.
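
A minimal sketch, assuming the network is rebuilt with the same constructor arguments before the weights are restored (the keyword arguments shown are this post's DenseNet defaults plus the ones changed later):

torch.save(model.state_dict(), 'model.pth')          # save the parameters only
best_model = DenseNet(num_classes=len(classNames), se_filter_sq=16).to(device)  # same architecture and num_classes
best_model.load_state_dict(torch.load('model.pth'))  # restore the parameters
best_model.eval()                                    # evaluation mode before inference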

Areas to improve: keep the model and the data consistent, both on the GPU or both on the CPU; do not leave num_classes at the default of 1000, set it from the actual dataset, and remember this parameter when instantiating the model; also note that testing the model takes an input of shape (3, 224, 224), where 3 is the channel count, which differs from TensorFlow's (224, 224, 3) ordering, so be careful when porting code.
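
A quick, hypothetical sanity check covering the device, channel-order, and num_classes pitfalls (not part of the original post):

dummy = torch.randn(1, 3, 224, 224).to(device)   # PyTorch is channels-first (N, C, H, W); TensorFlow uses (H, W, C)
out = model(dummy)                               # model and data must sit on the same device
assert out.shape == (1, len(classNames))         # output width should be the real class count, not the default 1000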

import torch
import torch.nn as nn
import torchvision
from torchvision import datasets,transforms
from torch.utils.data import DataLoader
import torchvision.models as models
import torch.nn.functional as F
from collections import OrderedDict 


import os,PIL,pathlib
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore') # suppress warning messages

plt.rcParams['font.sans-serif'] = ['SimHei'] # use SimHei so CJK labels render correctly
plt.rcParams['axes.unicode_minus'] = False   # render minus signs correctly
plt.rcParams['figure.dpi'] = 100 # figure resolution

1. Set Up the GPU

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
device(type='cuda')

2. Data Import and Preprocessing

# Inspect how the data are organized
path_dir = './data/mpox_recognize/'
path_dir = pathlib.Path(path_dir)

paths = list(path_dir.glob('*'))
# classNames = [str(path).split("\\")[-1] for path in paths] # ['Bananaquit', 'Black Skimmer', 'Black Throated Bushtiti', 'Cockatoo']
classNames = [path.parts[-1] for path in paths]
classNames
['Monkeypox', 'Others']
# Define the transforms and preprocess the data
train_transforms = transforms.Compose([
    transforms.Resize([224,224]),      # resize input images to a uniform size
    transforms.RandomHorizontalFlip(), # random horizontal flip
    transforms.ToTensor(),             # convert a PIL Image or numpy.ndarray to a tensor, scaled to [0,1]
    transforms.Normalize(              # standardize --> approximately standard normal (Gaussian), which helps convergence
        mean = [0.485,0.456,0.406],    # mean=[0.485,0.456,0.406] and std=[0.229,0.224,0.225] are the standard ImageNet statistics, computed from random samples of that dataset.
        std = [0.229,0.224,0.225]
    )
])
test_transforms = transforms.Compose([
    transforms.Resize([224,224]),
    transforms.ToTensor(),
    transforms.Normalize(
        mean = [0.485,0.456,0.406],
        std = [0.229,0.224,0.225]
    )
])
total_data = datasets.ImageFolder('./data/mpox_recognize/',transform = train_transforms)
total_data
Dataset ImageFolder
    Number of datapoints: 2142
    Root location: ./data/mpox_recognize/
    StandardTransform
Transform: Compose(
               Resize(size=[224, 224], interpolation=bilinear, max_size=None, antialias=True)
               RandomHorizontalFlip(p=0.5)
               ToTensor()
               Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
           )
total_data.class_to_idx
{'Monkeypox': 0, 'Others': 1}

3. Splitting the Dataset

# Split into training and test sets
train_size = int(len(total_data) * 0.8)
test_size = len(total_data) - train_size

train_dataset,test_dataset = torch.utils.data.random_split(total_data,[train_size,test_size])
train_dataset,test_dataset
(<torch.utils.data.dataset.Subset at 0x18230109120>,
 <torch.utils.data.dataset.Subset at 0x182300d2cb0>)
# Define DataLoaders to load the datasets

batch_size = 32

train_dl = torch.utils.data.DataLoader(
    train_dataset,
    batch_size = batch_size,
    shuffle = True,
    num_workers = 1
)
test_dl = torch.utils.data.DataLoader(
    test_dataset,
    batch_size = batch_size,
    shuffle = True,
    num_workers = 1
)
# Inspect the batch dimensions
for X,y in test_dl:
    print("Shape of X [N,C,H,W]: ",X.shape)
    print("Shape of y: ", y.shape,y.dtype)
    break
Shape of X [N,C,H,W]:  torch.Size([32, 3, 224, 224])
Shape of y:  torch.Size([32]) torch.int64

4. Model Construction

SE Module

Code explanation:

  1. Squeeze: nn.AdaptiveAvgPool2d(1) performs global average pooling, collapsing the spatial dimensions (H x W) to 1x1 while keeping each channel's mean.
  2. Excitation: the pooled output passes through two fully connected layers (fc1 and fc2). The first FC layer outputs filter_sq features, followed by a ReLU; the second FC layer outputs a single value, and a Sigmoid squashes it into [0, 1], representing each channel's weight.
  3. Scale: the input feature map is reweighted channel by channel, producing the weighted feature map.

Running example:

The code creates a SqueezeExcitationLayer instance and tests it with an input tensor of shape (1, 32, 32, 32). The output shape matches the input, because the SE module only reweights channels and does not change the spatial dimensions.

# import torch
# import torch.nn as nn
# import torch.nn.functional as F

# class SqueezeExcitationLayer(nn.Module):
#     def __init__(self, filter_sq):
#         # filter_sq is the output size of the first fully connected layer in Excitation
#         super(SqueezeExcitationLayer, self).__init__()
#         self.filter_sq = filter_sq
#         self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # equivalent to global average pooling
#         self.fc1 = nn.Linear(1, filter_sq)  # input features = 1 (the globally pooled output), output features = filter_sq
#         self.relu = nn.ReLU()
#         self.fc2 = nn.Linear(filter_sq, 1)  # final output size is 1 (a weight per channel)
#         self.sigmoid = nn.Sigmoid()

#     def forward(self, x):
#         # Squeeze stage
#         squeeze = self.global_avg_pool(x)  # Shape: (batch_size, channels, 1, 1)
#         squeeze = squeeze.view(squeeze.size(0), -1)  # flatten to (batch_size, channels)

#         # Excitation stage
#         excitation = self.fc1(squeeze)  # Shape: (batch_size, filter_sq)
#         excitation = self.relu(excitation)
#         excitation = self.fc2(excitation)  # Shape: (batch_size, 1)
#         excitation = self.sigmoid(excitation)  # Shape: (batch_size, 1)

#         # Reshape back to match input dimensions for element-wise multiplication
#         excitation = excitation.view(excitation.size(0), excitation.size(1), 1, 1)  # Shape: (batch_size, channels, 1, 1)

#         # Scale input with excitation weights
#         scale = x * excitation  # Element-wise multiplication

#         return scale

# # Example: create a SqueezeExcitation layer and feed a dummy input through it
# SE = SqueezeExcitationLayer(16)
# inputs = torch.zeros((1, 32, 32, 32))  # input tensor of shape (batch_size, channels, height, width)
# output = SE(inputs)  # run the forward pass
# print(output.shape)  # output shape
class SqueezeExcitationLayer(nn.Module):
    def __init__(self, num_input_features, filter_sq):
        super(SqueezeExcitationLayer, self).__init__()
        self.filter_sq = filter_sq
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # equivalent to global average pooling
        self.fc1 = nn.Linear(num_input_features, filter_sq)  # input features = num_input_features, output features = filter_sq
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(filter_sq, num_input_features)  # final output size equals the input channel count
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Squeeze stage
        squeeze = self.global_avg_pool(x)  # Shape: (batch_size, channels, 1, 1)
        squeeze = squeeze.view(squeeze.size(0), -1)  # flatten to (batch_size, channels)

        # Excitation stage
        excitation = self.fc1(squeeze)  # Shape: (batch_size, filter_sq)
        excitation = self.relu(excitation)
        excitation = self.fc2(excitation)  # Shape: (batch_size, num_input_features)
        excitation = self.sigmoid(excitation)  # Shape: (batch_size, num_input_features)

        # Reshape back to match input dimensions for element-wise multiplication
        excitation = excitation.view(excitation.size(0), excitation.size(1), 1, 1)  # Shape: (batch_size, channels, 1, 1)

        # Scale input with excitation weights
        scale = x * excitation  # Element-wise multiplication

        return scale

# When calling the SE module, make sure the arguments passed in are correct
inputs = torch.zeros((1, 32, 32, 32))  # example input tensor; note where the channel dimension sits
inputs = inputs.permute(0, 3, 1, 2)  # convert the input from (batch_size, height, width, channels) to (batch_size, channels, height, width)

se = SqueezeExcitationLayer(32, 16)  # 32 is the input channel count, 16 is filter_sq
output = se(inputs)
print(output.shape)
torch.Size([1, 32, 32, 32])

The error RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x32 and 1x16) occurs because the input shape and the weight matrix shape of the SE module's fc1 layer do not match. The failure happens at the fully connected layer, usually because the input size does not match what that layer expects.

Root cause:
In SqueezeExcitationLayer, global average pooling over the input yields an output of shape (batch_size, channels, 1, 1). This is then flattened to (batch_size, channels) and passed to the fully connected layer (fc1). However, fc1's input feature count must match the number of input channels; the mismatch in fc1's input size is the root of the error.

Fix:
Make the input shape match fc1's input feature count, i.e., construct fc1 with the correct input size (num_input_features, the channel count of the input tensor).

Key changes:
  1. Input layout: when calling SqueezeExcitationLayer, the input tensor must be laid out as (batch_size, channels, height, width), since PyTorch handles image data channels-first rather than as (batch_size, height, width, channels).
  2. fc1 input: fc1's input feature count must match the input tensor's channel count (num_input_features); the call to SqueezeExcitationLayer now guarantees this.
With these changes the forward pass runs and produces the correct output shape.

Improved DenseNet

To add the SE (Squeeze-and-Excitation) module to the existing DenseNet code, each _DenseLayer is modified so that its output passes through an SE module. The SE module learns channel importance and reweights the channels accordingly; here it is attached to the output of every _DenseLayer.

Concrete modification steps:

  1. Add an SE module inside _DenseLayer, so that every DenseLayer's output is reweighted by it.
  2. In the DenseNet constructor, pass the SE configuration along when instantiating each _DenseLayer.
# class _DenseLayer(nn.Sequential):
#     """Basic unit of DenseBlock (using bottleneck layer) """
#     def __init__(self, num_input_features, growth_rate, bn_size, drop_rate):
#         super(_DenseLayer, self).__init__()
#         self.add_module("norm1", nn.BatchNorm2d(num_input_features))
#         self.add_module("relu1", nn.ReLU(inplace=True))
#         self.add_module("conv1", nn.Conv2d(num_input_features, bn_size*growth_rate,
#                                            kernel_size=1, stride=1, bias=False))
#         self.add_module("norm2", nn.BatchNorm2d(bn_size*growth_rate))
#         self.add_module("relu2", nn.ReLU(inplace=True))
#         self.add_module("conv2", nn.Conv2d(bn_size*growth_rate, growth_rate,
#                                            kernel_size=3, stride=1, padding=1, bias=False))
#         self.drop_rate = drop_rate

#     def forward(self, x):
#         new_features = super(_DenseLayer, self).forward(x)
#         if self.drop_rate > 0:
#             new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
#         return torch.cat([x, new_features], 1)
    
# class _DenseBlock(nn.Sequential):
#     """DenseBlock"""
#     def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate):
#         super(_DenseBlock, self).__init__()
#         for i in range(num_layers):
#             layer = _DenseLayer(num_input_features+i*growth_rate, growth_rate, bn_size,
#                                 drop_rate)
#             self.add_module("denselayer%d" % (i+1,), layer)
            
# class _Transition(nn.Sequential):
#     """Transition layer between two adjacent DenseBlock"""
#     def __init__(self, num_input_feature, num_output_features):
#         super(_Transition, self).__init__()
#         self.add_module("norm", nn.BatchNorm2d(num_input_feature))
#         self.add_module("relu", nn.ReLU(inplace=True))
#         self.add_module("conv", nn.Conv2d(num_input_feature, num_output_features,
#                                           kernel_size=1, stride=1, bias=False))
#         self.add_module("pool", nn.AvgPool2d(2, stride=2))

        
# class DenseNet(nn.Module):
#     "DenseNet-BC model"
#     def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64,
#                  bn_size=4, compression_rate=0.5, drop_rate=0, num_classes=1000):
#         """
#         :param growth_rate: (int) number of filters used in DenseLayer, `k` in the paper
#         :param block_config: (list of 4 ints) number of layers in each DenseBlock
#         :param num_init_features: (int) number of filters in the first Conv2d
#         :param bn_size: (int) the factor using in the bottleneck layer
#         :param compression_rate: (float) the compression rate used in Transition Layer
#         :param drop_rate: (float) the drop rate after each DenseLayer
#         :param num_classes: (int) number of classes for classification
#         """
#         super(DenseNet, self).__init__()
#         # first Conv2d
#         self.features = nn.Sequential(OrderedDict([
#             ("conv0", nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, padding=3, bias=False)),
#             ("norm0", nn.BatchNorm2d(num_init_features)),
#             ("relu0", nn.ReLU(inplace=True)),
#             ("pool0", nn.MaxPool2d(3, stride=2, padding=1))
#         ]))

#         # DenseBlock
#         num_features = num_init_features
#         for i, num_layers in enumerate(block_config):
#             block = _DenseBlock(num_layers, num_features, bn_size, growth_rate, drop_rate)
#             self.features.add_module("denseblock%d" % (i + 1), block)
#             num_features += num_layers*growth_rate
#             if i != len(block_config) - 1:
#                 transition = _Transition(num_features, int(num_features*compression_rate))
#                 self.features.add_module("transition%d" % (i + 1), transition)
#                 num_features = int(num_features * compression_rate)

#         # final bn+ReLU
#         self.features.add_module("norm5", nn.BatchNorm2d(num_features))
#         self.features.add_module("relu5", nn.ReLU(inplace=True))

#         # classification layer
#         self.classifier = nn.Linear(num_features, num_classes)

#         # params initialization
#         for m in self.modules():
#             if isinstance(m, nn.Conv2d):
#                 nn.init.kaiming_normal_(m.weight)
#             elif isinstance(m, nn.BatchNorm2d):
#                 nn.init.constant_(m.bias, 0)
#                 nn.init.constant_(m.weight, 1)
#             elif isinstance(m, nn.Linear):
#                 nn.init.constant_(m.bias, 0)

#     def forward(self, x):
#         features = self.features(x)
#         out = F.avg_pool2d(features, 7, stride=1).view(features.size(0), -1)
#         out = self.classifier(out)
#         return out
import torch
import torch.nn as nn
import torch.nn.functional as F
from collections import OrderedDict

class SqueezeExcitationLayer(nn.Module):
    def __init__(self, num_input_features, filter_sq):
        super(SqueezeExcitationLayer, self).__init__()
        self.filter_sq = filter_sq
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # equivalent to global average pooling
        self.fc1 = nn.Linear(num_input_features, filter_sq)  # input channel count is num_input_features, output is filter_sq
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(filter_sq, num_input_features)  # final output size equals the input channel count
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Squeeze stage
        squeeze = self.global_avg_pool(x)  # Shape: (batch_size, channels, 1, 1)
        squeeze = squeeze.view(squeeze.size(0), -1)  # flatten to (batch_size, channels)

        # Excitation stage
        excitation = self.fc1(squeeze)  # Shape: (batch_size, filter_sq)
        excitation = self.relu(excitation)
        excitation = self.fc2(excitation)  # Shape: (batch_size, num_input_features)
        excitation = self.sigmoid(excitation)  # Shape: (batch_size, num_input_features)

        # Reshape back to match input dimensions for element-wise multiplication
        excitation = excitation.view(excitation.size(0), excitation.size(1), 1, 1)  # Shape: (batch_size, channels, 1, 1)

        # Scale input with excitation weights
        scale = x * excitation  # Element-wise multiplication

        return scale

class _DenseLayer(nn.Sequential):
    """Basic unit of DenseBlock (using bottleneck layer) """
    def __init__(self, num_input_features, growth_rate, bn_size, drop_rate, se_filter_sq=16):
        super(_DenseLayer, self).__init__()
        self.add_module("norm1", nn.BatchNorm2d(num_input_features))
        self.add_module("relu1", nn.ReLU(inplace=True))
        self.add_module("conv1", nn.Conv2d(num_input_features, bn_size*growth_rate,
                                           kernel_size=1, stride=1, bias=False))
        self.add_module("norm2", nn.BatchNorm2d(bn_size*growth_rate))
        self.add_module("relu2", nn.ReLU(inplace=True))
        self.add_module("conv2", nn.Conv2d(bn_size*growth_rate, growth_rate,
                                           kernel_size=3, stride=1, padding=1, bias=False))
        
        # Add the SE module
        self.se = SqueezeExcitationLayer(growth_rate, se_filter_sq)

        self.drop_rate = drop_rate

    def forward(self, x):
        new_features = super(_DenseLayer, self).forward(x)
        new_features = self.se(new_features)  # apply the SE module to the new feature maps
        
        if self.drop_rate > 0:
            new_features = F.dropout(new_features, p=self.drop_rate, training=self.training)
        
        return torch.cat([x, new_features], 1)

class _DenseBlock(nn.Sequential):
    """DenseBlock"""
    def __init__(self, num_layers, num_input_features, bn_size, growth_rate, drop_rate, se_filter_sq=16):
        super(_DenseBlock, self).__init__()
        for i in range(num_layers):
            layer = _DenseLayer(num_input_features+i*growth_rate, growth_rate, bn_size,
                                drop_rate, se_filter_sq)
            self.add_module("denselayer%d" % (i+1,), layer)

class _Transition(nn.Sequential):
    """Transition layer between two adjacent DenseBlock"""
    def __init__(self, num_input_feature, num_output_features):
        super(_Transition, self).__init__()
        self.add_module("norm", nn.BatchNorm2d(num_input_feature))
        self.add_module("relu", nn.ReLU(inplace=True))
        self.add_module("conv", nn.Conv2d(num_input_feature, num_output_features,
                                          kernel_size=1, stride=1, bias=False))
        self.add_module("pool", nn.AvgPool2d(2, stride=2))

class DenseNet(nn.Module):
    "DenseNet-BC model"
    def __init__(self, growth_rate=32, block_config=(6, 12, 24, 16), num_init_features=64,
                 bn_size=4, compression_rate=0.5, drop_rate=0, num_classes=1000, se_filter_sq=16):
        """
        :param growth_rate: (int) number of filters used in DenseLayer, `k` in the paper
        :param block_config: (list of 4 ints) number of layers in each DenseBlock
        :param num_init_features: (int) number of filters in the first Conv2d
        :param bn_size: (int) the factor using in the bottleneck layer
        :param compression_rate: (float) the compression rate used in Transition Layer
        :param drop_rate: (float) the drop rate after each DenseLayer
        :param num_classes: (int) number of classes for classification
        :param se_filter_sq: (int) the number of filters used in SE module's fully connected layer
        """
        super(DenseNet, self).__init__()

        # first Conv2d
        self.features = nn.Sequential(OrderedDict([ 
            ("conv0", nn.Conv2d(3, num_init_features, kernel_size=7, stride=2, padding=3, bias=False)),
            ("norm0", nn.BatchNorm2d(num_init_features)),
            ("relu0", nn.ReLU(inplace=True)),
            ("pool0", nn.MaxPool2d(3, stride=2, padding=1))
        ]))

        # DenseBlock
        num_features = num_init_features
        for i, num_layers in enumerate(block_config):
            block = _DenseBlock(num_layers, num_features, bn_size, growth_rate, drop_rate, se_filter_sq)
            self.features.add_module("denseblock%d" % (i + 1), block)
            num_features += num_layers * growth_rate
            if i != len(block_config) - 1:
                transition = _Transition(num_features, int(num_features * compression_rate))
                self.features.add_module("transition%d" % (i + 1), transition)
                num_features = int(num_features * compression_rate)

        # final bn+ReLU
        self.features.add_module("norm5", nn.BatchNorm2d(num_features))
        self.features.add_module("relu5", nn.ReLU(inplace=True))

        # classification layer
        self.classifier = nn.Linear(num_features, num_classes)

        # params initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.bias, 0)
                nn.init.constant_(m.weight, 1)
            elif isinstance(m, nn.Linear):
                nn.init.constant_(m.bias, 0)

    def forward(self, x):
        features = self.features(x)
        out = F.avg_pool2d(features, 7, stride=1).view(features.size(0), -1)
        out = self.classifier(out)
        return out

Code explanation:

  1. SqueezeExcitationLayer: implements the SE module, which derives a per-channel weight from the input (via global average pooling and two fully connected layers).
  2. _DenseLayer: an SE module is appended to each DenseLayer, and its output multiplies the input feature map, reweighting every channel.
  3. _DenseBlock: every _DenseLayer inside each DenseBlock invokes its SE module.
  4. DenseNet: the DenseNet class takes an se_filter_sq parameter, which controls the hidden size of the SE module's fully connected layers.

With this, the DenseNet can exploit SE modules to improve its representational power.

# # Now, instantiate and use the model
# densenet121 = DenseNet(num_init_features=64, # init_channel=64,
#                        growth_rate=32,
#                        block_config=(6,12,24,16),
#                        num_classes=len(classNames))  

# model = densenet121.to(device)
# model
# Now, instantiate and use the model
se_filter_sq = 16  # adjust the SE module's hidden size as needed

densenet121 = DenseNet(
    num_init_features=64,  # init_channel=64,
    growth_rate=32,
    block_config=(6, 12, 24, 16),
    num_classes=len(classNames),  # set the class count for your classification task
    se_filter_sq=se_filter_sq  # pass the SE module parameter
)

model = densenet121.to(device)  # move the model to the specified device
model
DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer3): _DenseLayer(
        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer4): _DenseLayer(
        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer5): _DenseLayer(
        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer6): _DenseLayer(
        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
    )
    (transition1): _Transition(
      (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)
    )
    (denseblock2): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer3): _DenseLayer(
        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer4): _DenseLayer(
        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer5): _DenseLayer(
        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer6): _DenseLayer(
        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer7): _DenseLayer(
        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer8): _DenseLayer(
        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer9): _DenseLayer(
        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer10): _DenseLayer(
        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer11): _DenseLayer(
        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer12): _DenseLayer(
        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
    )
    (transition2): _Transition(
      (norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)
    )
    (denseblock3): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer3): _DenseLayer(
        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer4): _DenseLayer(
        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer5): _DenseLayer(
        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer6): _DenseLayer(
        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer7): _DenseLayer(
        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer8): _DenseLayer(
        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer9): _DenseLayer(
        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer10): _DenseLayer(
        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer11): _DenseLayer(
        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer12): _DenseLayer(
        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer13): _DenseLayer(
        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer14): _DenseLayer(
        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer15): _DenseLayer(
        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer16): _DenseLayer(
        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer17): _DenseLayer(
        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer18): _DenseLayer(
        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer19): _DenseLayer(
        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer20): _DenseLayer(
        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer21): _DenseLayer(
        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer22): _DenseLayer(
        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer23): _DenseLayer(
        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer24): _DenseLayer(
        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
    )
    (transition3): _Transition(
      (norm): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)
    )
    (denseblock4): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer2): _DenseLayer(
        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer3): _DenseLayer(
        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer4): _DenseLayer(
        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer5): _DenseLayer(
        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer6): _DenseLayer(
        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer7): _DenseLayer(
        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer8): _DenseLayer(
        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer9): _DenseLayer(
        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer10): _DenseLayer(
        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer11): _DenseLayer(
        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer12): _DenseLayer(
        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer13): _DenseLayer(
        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer14): _DenseLayer(
        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer15): _DenseLayer(
        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
      (denselayer16): _DenseLayer(
        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (se): SqueezeExcitationLayer(
          (global_avg_pool): AdaptiveAvgPool2d(output_size=1)
          (fc1): Linear(in_features=32, out_features=16, bias=True)
          (relu): ReLU()
          (fc2): Linear(in_features=16, out_features=32, bias=True)
          (sigmoid): Sigmoid()
        )
      )
    )
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu5): ReLU(inplace=True)
  )
  (classifier): Linear(in_features=1024, out_features=2, bias=True)
)

Explanation:
se_filter_sq: this parameter is threaded through the model and controls the width of the fully connected bottleneck inside each SE module; adjust it as your experiments require.
The remaining hyperparameters (num_init_features, growth_rate, block_config, num_classes, etc.) can likewise be tuned to your task.
With these changes the model correctly inserts an SE module after every dense layer and runs as expected.
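
For reference, here is a minimal sketch of an SE layer consistent with the submodule names in the printout above (global_avg_pool, fc1, relu, fc2, sigmoid). It assumes se_filter_sq is the hidden width of the bottleneck (16 in the printout, against growth_rate=32 input channels); the module actually used in this project may differ in detail.

import torch
import torch.nn as nn

class SqueezeExcitationLayer(nn.Module):
    # Sketch only: assumes se_filter_sq is the hidden width of the FC bottleneck
    # (fc1: channels -> se_filter_sq, fc2: se_filter_sq -> channels).
    def __init__(self, channels, se_filter_sq=16):
        super().__init__()
        self.global_avg_pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (B,C,H,W) -> (B,C,1,1)
        self.fc1 = nn.Linear(channels, se_filter_sq)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(se_filter_sq, channels)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.global_avg_pool(x).view(b, c)              # (B, C)
        w = self.sigmoid(self.fc2(self.relu(self.fc1(w))))  # excitation: per-channel weights in (0, 1)
        return x * w.view(b, c, 1, 1)                       # reweight feature maps channel-wise

# Quick shape check
se = SqueezeExcitationLayer(channels=32, se_filter_sq=16)
print(se(torch.randn(2, 32, 14, 14)).shape)  # torch.Size([2, 32, 14, 14])

Increasing se_filter_sq gives the gating MLP more capacity at a small parameter cost: with channels=32 and se_filter_sq=16, each SE block adds only 528 + 544 = 1,072 parameters, as the summary table below confirms.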

# View the model details (torchsummary expects a channels-first input shape: (3, 224, 224))
import torchsummary as summary
summary.summary(model, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
       BatchNorm2d-5           [-1, 64, 56, 56]             128
              ReLU-6           [-1, 64, 56, 56]               0
            Conv2d-7          [-1, 128, 56, 56]           8,192
       BatchNorm2d-8          [-1, 128, 56, 56]             256
              ReLU-9          [-1, 128, 56, 56]               0
           Conv2d-10           [-1, 32, 56, 56]          36,864
AdaptiveAvgPool2d-11             [-1, 32, 1, 1]               0
           Linear-12                   [-1, 16]             528
             ReLU-13                   [-1, 16]               0
           Linear-14                   [-1, 32]             544
          Sigmoid-15                   [-1, 32]               0
SqueezeExcitationLayer-16           [-1, 32, 56, 56]               0
AdaptiveAvgPool2d-17             [-1, 32, 1, 1]               0
           Linear-18                   [-1, 16]             528
             ReLU-19                   [-1, 16]               0
           Linear-20                   [-1, 32]             544
          Sigmoid-21                   [-1, 32]               0
SqueezeExcitationLayer-22           [-1, 32, 56, 56]               0
      BatchNorm2d-23           [-1, 96, 56, 56]             192
             ReLU-24           [-1, 96, 56, 56]               0
           Conv2d-25          [-1, 128, 56, 56]          12,288
      BatchNorm2d-26          [-1, 128, 56, 56]             256
             ReLU-27          [-1, 128, 56, 56]               0
           Conv2d-28           [-1, 32, 56, 56]          36,864
AdaptiveAvgPool2d-29             [-1, 32, 1, 1]               0
           Linear-30                   [-1, 16]             528
             ReLU-31                   [-1, 16]               0
           Linear-32                   [-1, 32]             544
          Sigmoid-33                   [-1, 32]               0
SqueezeExcitationLayer-34           [-1, 32, 56, 56]               0
AdaptiveAvgPool2d-35             [-1, 32, 1, 1]               0
           Linear-36                   [-1, 16]             528
             ReLU-37                   [-1, 16]               0
           Linear-38                   [-1, 32]             544
          Sigmoid-39                   [-1, 32]               0
SqueezeExcitationLayer-40           [-1, 32, 56, 56]               0
      BatchNorm2d-41          [-1, 128, 56, 56]             256
             ReLU-42          [-1, 128, 56, 56]               0
           Conv2d-43          [-1, 128, 56, 56]          16,384
      BatchNorm2d-44          [-1, 128, 56, 56]             256
             ReLU-45          [-1, 128, 56, 56]               0
           Conv2d-46           [-1, 32, 56, 56]          36,864
AdaptiveAvgPool2d-47             [-1, 32, 1, 1]               0
           Linear-48                   [-1, 16]             528
             ReLU-49                   [-1, 16]               0
           Linear-50                   [-1, 32]             544
          Sigmoid-51                   [-1, 32]               0
SqueezeExcitationLayer-52           [-1, 32, 56, 56]               0
AdaptiveAvgPool2d-53             [-1, 32, 1, 1]               0
           Linear-54                   [-1, 16]             528
             ReLU-55                   [-1, 16]               0
           Linear-56                   [-1, 32]             544
          Sigmoid-57                   [-1, 32]               0
SqueezeExcitationLayer-58           [-1, 32, 56, 56]               0
      BatchNorm2d-59          [-1, 160, 56, 56]             320
             ReLU-60          [-1, 160, 56, 56]               0
           Conv2d-61          [-1, 128, 56, 56]          20,480
      BatchNorm2d-62          [-1, 128, 56, 56]             256
             ReLU-63          [-1, 128, 56, 56]               0
           Conv2d-64           [-1, 32, 56, 56]          36,864
AdaptiveAvgPool2d-65             [-1, 32, 1, 1]               0
           Linear-66                   [-1, 16]             528
             ReLU-67                   [-1, 16]               0
           Linear-68                   [-1, 32]             544
          Sigmoid-69                   [-1, 32]               0
SqueezeExcitationLayer-70           [-1, 32, 56, 56]               0
AdaptiveAvgPool2d-71             [-1, 32, 1, 1]               0
           Linear-72                   [-1, 16]             528
             ReLU-73                   [-1, 16]               0
           Linear-74                   [-1, 32]             544
          Sigmoid-75                   [-1, 32]               0
SqueezeExcitationLayer-76           [-1, 32, 56, 56]               0
      BatchNorm2d-77          [-1, 192, 56, 56]             384
             ReLU-78          [-1, 192, 56, 56]               0
           Conv2d-79          [-1, 128, 56, 56]          24,576
      BatchNorm2d-80          [-1, 128, 56, 56]             256
             ReLU-81          [-1, 128, 56, 56]               0
           Conv2d-82           [-1, 32, 56, 56]          36,864
AdaptiveAvgPool2d-83             [-1, 32, 1, 1]               0
           Linear-84                   [-1, 16]             528
             ReLU-85                   [-1, 16]               0
           Linear-86                   [-1, 32]             544
          Sigmoid-87                   [-1, 32]               0
SqueezeExcitationLayer-88           [-1, 32, 56, 56]               0
AdaptiveAvgPool2d-89             [-1, 32, 1, 1]               0
           Linear-90                   [-1, 16]             528
             ReLU-91                   [-1, 16]               0
           Linear-92                   [-1, 32]             544
          Sigmoid-93                   [-1, 32]               0
SqueezeExcitationLayer-94           [-1, 32, 56, 56]               0
      BatchNorm2d-95          [-1, 224, 56, 56]             448
             ReLU-96          [-1, 224, 56, 56]               0
           Conv2d-97          [-1, 128, 56, 56]          28,672
      BatchNorm2d-98          [-1, 128, 56, 56]             256
             ReLU-99          [-1, 128, 56, 56]               0
          Conv2d-100           [-1, 32, 56, 56]          36,864
AdaptiveAvgPool2d-101             [-1, 32, 1, 1]               0
          Linear-102                   [-1, 16]             528
            ReLU-103                   [-1, 16]               0
          Linear-104                   [-1, 32]             544
         Sigmoid-105                   [-1, 32]               0
SqueezeExcitationLayer-106           [-1, 32, 56, 56]               0
AdaptiveAvgPool2d-107             [-1, 32, 1, 1]               0
          Linear-108                   [-1, 16]             528
            ReLU-109                   [-1, 16]               0
          Linear-110                   [-1, 32]             544
         Sigmoid-111                   [-1, 32]               0
SqueezeExcitationLayer-112           [-1, 32, 56, 56]               0
     BatchNorm2d-113          [-1, 256, 56, 56]             512
            ReLU-114          [-1, 256, 56, 56]               0
          Conv2d-115          [-1, 128, 56, 56]          32,768
       AvgPool2d-116          [-1, 128, 28, 28]               0
     BatchNorm2d-117          [-1, 128, 28, 28]             256
            ReLU-118          [-1, 128, 28, 28]               0
          Conv2d-119          [-1, 128, 28, 28]          16,384
     BatchNorm2d-120          [-1, 128, 28, 28]             256
            ReLU-121          [-1, 128, 28, 28]               0
          Conv2d-122           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-123             [-1, 32, 1, 1]               0
          Linear-124                   [-1, 16]             528
            ReLU-125                   [-1, 16]               0
          Linear-126                   [-1, 32]             544
         Sigmoid-127                   [-1, 32]               0
SqueezeExcitationLayer-128           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-129             [-1, 32, 1, 1]               0
          Linear-130                   [-1, 16]             528
            ReLU-131                   [-1, 16]               0
          Linear-132                   [-1, 32]             544
         Sigmoid-133                   [-1, 32]               0
SqueezeExcitationLayer-134           [-1, 32, 28, 28]               0
     BatchNorm2d-135          [-1, 160, 28, 28]             320
            ReLU-136          [-1, 160, 28, 28]               0
          Conv2d-137          [-1, 128, 28, 28]          20,480
     BatchNorm2d-138          [-1, 128, 28, 28]             256
            ReLU-139          [-1, 128, 28, 28]               0
          Conv2d-140           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-141             [-1, 32, 1, 1]               0
          Linear-142                   [-1, 16]             528
            ReLU-143                   [-1, 16]               0
          Linear-144                   [-1, 32]             544
         Sigmoid-145                   [-1, 32]               0
SqueezeExcitationLayer-146           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-147             [-1, 32, 1, 1]               0
          Linear-148                   [-1, 16]             528
            ReLU-149                   [-1, 16]               0
          Linear-150                   [-1, 32]             544
         Sigmoid-151                   [-1, 32]               0
SqueezeExcitationLayer-152           [-1, 32, 28, 28]               0
     BatchNorm2d-153          [-1, 192, 28, 28]             384
            ReLU-154          [-1, 192, 28, 28]               0
          Conv2d-155          [-1, 128, 28, 28]          24,576
     BatchNorm2d-156          [-1, 128, 28, 28]             256
            ReLU-157          [-1, 128, 28, 28]               0
          Conv2d-158           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-159             [-1, 32, 1, 1]               0
          Linear-160                   [-1, 16]             528
            ReLU-161                   [-1, 16]               0
          Linear-162                   [-1, 32]             544
         Sigmoid-163                   [-1, 32]               0
SqueezeExcitationLayer-164           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-165             [-1, 32, 1, 1]               0
          Linear-166                   [-1, 16]             528
            ReLU-167                   [-1, 16]               0
          Linear-168                   [-1, 32]             544
         Sigmoid-169                   [-1, 32]               0
SqueezeExcitationLayer-170           [-1, 32, 28, 28]               0
     BatchNorm2d-171          [-1, 224, 28, 28]             448
            ReLU-172          [-1, 224, 28, 28]               0
          Conv2d-173          [-1, 128, 28, 28]          28,672
     BatchNorm2d-174          [-1, 128, 28, 28]             256
            ReLU-175          [-1, 128, 28, 28]               0
          Conv2d-176           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-177             [-1, 32, 1, 1]               0
          Linear-178                   [-1, 16]             528
            ReLU-179                   [-1, 16]               0
          Linear-180                   [-1, 32]             544
         Sigmoid-181                   [-1, 32]               0
SqueezeExcitationLayer-182           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-183             [-1, 32, 1, 1]               0
          Linear-184                   [-1, 16]             528
            ReLU-185                   [-1, 16]               0
          Linear-186                   [-1, 32]             544
         Sigmoid-187                   [-1, 32]               0
SqueezeExcitationLayer-188           [-1, 32, 28, 28]               0
     BatchNorm2d-189          [-1, 256, 28, 28]             512
            ReLU-190          [-1, 256, 28, 28]               0
          Conv2d-191          [-1, 128, 28, 28]          32,768
     BatchNorm2d-192          [-1, 128, 28, 28]             256
            ReLU-193          [-1, 128, 28, 28]               0
          Conv2d-194           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-195             [-1, 32, 1, 1]               0
          Linear-196                   [-1, 16]             528
            ReLU-197                   [-1, 16]               0
          Linear-198                   [-1, 32]             544
         Sigmoid-199                   [-1, 32]               0
SqueezeExcitationLayer-200           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-201             [-1, 32, 1, 1]               0
          Linear-202                   [-1, 16]             528
            ReLU-203                   [-1, 16]               0
          Linear-204                   [-1, 32]             544
         Sigmoid-205                   [-1, 32]               0
SqueezeExcitationLayer-206           [-1, 32, 28, 28]               0
     BatchNorm2d-207          [-1, 288, 28, 28]             576
            ReLU-208          [-1, 288, 28, 28]               0
          Conv2d-209          [-1, 128, 28, 28]          36,864
     BatchNorm2d-210          [-1, 128, 28, 28]             256
            ReLU-211          [-1, 128, 28, 28]               0
          Conv2d-212           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-213             [-1, 32, 1, 1]               0
          Linear-214                   [-1, 16]             528
            ReLU-215                   [-1, 16]               0
          Linear-216                   [-1, 32]             544
         Sigmoid-217                   [-1, 32]               0
SqueezeExcitationLayer-218           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-219             [-1, 32, 1, 1]               0
          Linear-220                   [-1, 16]             528
            ReLU-221                   [-1, 16]               0
          Linear-222                   [-1, 32]             544
         Sigmoid-223                   [-1, 32]               0
SqueezeExcitationLayer-224           [-1, 32, 28, 28]               0
     BatchNorm2d-225          [-1, 320, 28, 28]             640
            ReLU-226          [-1, 320, 28, 28]               0
          Conv2d-227          [-1, 128, 28, 28]          40,960
     BatchNorm2d-228          [-1, 128, 28, 28]             256
            ReLU-229          [-1, 128, 28, 28]               0
          Conv2d-230           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-231             [-1, 32, 1, 1]               0
          Linear-232                   [-1, 16]             528
            ReLU-233                   [-1, 16]               0
          Linear-234                   [-1, 32]             544
         Sigmoid-235                   [-1, 32]               0
SqueezeExcitationLayer-236           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-237             [-1, 32, 1, 1]               0
          Linear-238                   [-1, 16]             528
            ReLU-239                   [-1, 16]               0
          Linear-240                   [-1, 32]             544
         Sigmoid-241                   [-1, 32]               0
SqueezeExcitationLayer-242           [-1, 32, 28, 28]               0
     BatchNorm2d-243          [-1, 352, 28, 28]             704
            ReLU-244          [-1, 352, 28, 28]               0
          Conv2d-245          [-1, 128, 28, 28]          45,056
     BatchNorm2d-246          [-1, 128, 28, 28]             256
            ReLU-247          [-1, 128, 28, 28]               0
          Conv2d-248           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-249             [-1, 32, 1, 1]               0
          Linear-250                   [-1, 16]             528
            ReLU-251                   [-1, 16]               0
          Linear-252                   [-1, 32]             544
         Sigmoid-253                   [-1, 32]               0
SqueezeExcitationLayer-254           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-255             [-1, 32, 1, 1]               0
          Linear-256                   [-1, 16]             528
            ReLU-257                   [-1, 16]               0
          Linear-258                   [-1, 32]             544
         Sigmoid-259                   [-1, 32]               0
SqueezeExcitationLayer-260           [-1, 32, 28, 28]               0
     BatchNorm2d-261          [-1, 384, 28, 28]             768
            ReLU-262          [-1, 384, 28, 28]               0
          Conv2d-263          [-1, 128, 28, 28]          49,152
     BatchNorm2d-264          [-1, 128, 28, 28]             256
            ReLU-265          [-1, 128, 28, 28]               0
          Conv2d-266           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-267             [-1, 32, 1, 1]               0
          Linear-268                   [-1, 16]             528
            ReLU-269                   [-1, 16]               0
          Linear-270                   [-1, 32]             544
         Sigmoid-271                   [-1, 32]               0
SqueezeExcitationLayer-272           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-273             [-1, 32, 1, 1]               0
          Linear-274                   [-1, 16]             528
            ReLU-275                   [-1, 16]               0
          Linear-276                   [-1, 32]             544
         Sigmoid-277                   [-1, 32]               0
SqueezeExcitationLayer-278           [-1, 32, 28, 28]               0
     BatchNorm2d-279          [-1, 416, 28, 28]             832
            ReLU-280          [-1, 416, 28, 28]               0
          Conv2d-281          [-1, 128, 28, 28]          53,248
     BatchNorm2d-282          [-1, 128, 28, 28]             256
            ReLU-283          [-1, 128, 28, 28]               0
          Conv2d-284           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-285             [-1, 32, 1, 1]               0
          Linear-286                   [-1, 16]             528
            ReLU-287                   [-1, 16]               0
          Linear-288                   [-1, 32]             544
         Sigmoid-289                   [-1, 32]               0
SqueezeExcitationLayer-290           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-291             [-1, 32, 1, 1]               0
          Linear-292                   [-1, 16]             528
            ReLU-293                   [-1, 16]               0
          Linear-294                   [-1, 32]             544
         Sigmoid-295                   [-1, 32]               0
SqueezeExcitationLayer-296           [-1, 32, 28, 28]               0
     BatchNorm2d-297          [-1, 448, 28, 28]             896
            ReLU-298          [-1, 448, 28, 28]               0
          Conv2d-299          [-1, 128, 28, 28]          57,344
     BatchNorm2d-300          [-1, 128, 28, 28]             256
            ReLU-301          [-1, 128, 28, 28]               0
          Conv2d-302           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-303             [-1, 32, 1, 1]               0
          Linear-304                   [-1, 16]             528
            ReLU-305                   [-1, 16]               0
          Linear-306                   [-1, 32]             544
         Sigmoid-307                   [-1, 32]               0
SqueezeExcitationLayer-308           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-309             [-1, 32, 1, 1]               0
          Linear-310                   [-1, 16]             528
            ReLU-311                   [-1, 16]               0
          Linear-312                   [-1, 32]             544
         Sigmoid-313                   [-1, 32]               0
SqueezeExcitationLayer-314           [-1, 32, 28, 28]               0
     BatchNorm2d-315          [-1, 480, 28, 28]             960
            ReLU-316          [-1, 480, 28, 28]               0
          Conv2d-317          [-1, 128, 28, 28]          61,440
     BatchNorm2d-318          [-1, 128, 28, 28]             256
            ReLU-319          [-1, 128, 28, 28]               0
          Conv2d-320           [-1, 32, 28, 28]          36,864
AdaptiveAvgPool2d-321             [-1, 32, 1, 1]               0
          Linear-322                   [-1, 16]             528
            ReLU-323                   [-1, 16]               0
          Linear-324                   [-1, 32]             544
         Sigmoid-325                   [-1, 32]               0
SqueezeExcitationLayer-326           [-1, 32, 28, 28]               0
AdaptiveAvgPool2d-327             [-1, 32, 1, 1]               0
          Linear-328                   [-1, 16]             528
            ReLU-329                   [-1, 16]               0
          Linear-330                   [-1, 32]             544
         Sigmoid-331                   [-1, 32]               0
SqueezeExcitationLayer-332           [-1, 32, 28, 28]               0
     BatchNorm2d-333          [-1, 512, 28, 28]           1,024
            ReLU-334          [-1, 512, 28, 28]               0
          Conv2d-335          [-1, 256, 28, 28]         131,072
       AvgPool2d-336          [-1, 256, 14, 14]               0
     BatchNorm2d-337          [-1, 256, 14, 14]             512
            ReLU-338          [-1, 256, 14, 14]               0
          Conv2d-339          [-1, 128, 14, 14]          32,768
     BatchNorm2d-340          [-1, 128, 14, 14]             256
            ReLU-341          [-1, 128, 14, 14]               0
          Conv2d-342           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-343             [-1, 32, 1, 1]               0
          Linear-344                   [-1, 16]             528
            ReLU-345                   [-1, 16]               0
          Linear-346                   [-1, 32]             544
         Sigmoid-347                   [-1, 32]               0
SqueezeExcitationLayer-348           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-349             [-1, 32, 1, 1]               0
          Linear-350                   [-1, 16]             528
            ReLU-351                   [-1, 16]               0
          Linear-352                   [-1, 32]             544
         Sigmoid-353                   [-1, 32]               0
SqueezeExcitationLayer-354           [-1, 32, 14, 14]               0
     BatchNorm2d-355          [-1, 288, 14, 14]             576
            ReLU-356          [-1, 288, 14, 14]               0
          Conv2d-357          [-1, 128, 14, 14]          36,864
     BatchNorm2d-358          [-1, 128, 14, 14]             256
            ReLU-359          [-1, 128, 14, 14]               0
          Conv2d-360           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-361             [-1, 32, 1, 1]               0
          Linear-362                   [-1, 16]             528
            ReLU-363                   [-1, 16]               0
          Linear-364                   [-1, 32]             544
         Sigmoid-365                   [-1, 32]               0
SqueezeExcitationLayer-366           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-367             [-1, 32, 1, 1]               0
          Linear-368                   [-1, 16]             528
            ReLU-369                   [-1, 16]               0
          Linear-370                   [-1, 32]             544
         Sigmoid-371                   [-1, 32]               0
SqueezeExcitationLayer-372           [-1, 32, 14, 14]               0
     BatchNorm2d-373          [-1, 320, 14, 14]             640
            ReLU-374          [-1, 320, 14, 14]               0
          Conv2d-375          [-1, 128, 14, 14]          40,960
     BatchNorm2d-376          [-1, 128, 14, 14]             256
            ReLU-377          [-1, 128, 14, 14]               0
          Conv2d-378           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-379             [-1, 32, 1, 1]               0
          Linear-380                   [-1, 16]             528
            ReLU-381                   [-1, 16]               0
          Linear-382                   [-1, 32]             544
         Sigmoid-383                   [-1, 32]               0
SqueezeExcitationLayer-384           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-385             [-1, 32, 1, 1]               0
          Linear-386                   [-1, 16]             528
            ReLU-387                   [-1, 16]               0
          Linear-388                   [-1, 32]             544
         Sigmoid-389                   [-1, 32]               0
SqueezeExcitationLayer-390           [-1, 32, 14, 14]               0
     BatchNorm2d-391          [-1, 352, 14, 14]             704
            ReLU-392          [-1, 352, 14, 14]               0
          Conv2d-393          [-1, 128, 14, 14]          45,056
     BatchNorm2d-394          [-1, 128, 14, 14]             256
            ReLU-395          [-1, 128, 14, 14]               0
          Conv2d-396           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-397             [-1, 32, 1, 1]               0
          Linear-398                   [-1, 16]             528
            ReLU-399                   [-1, 16]               0
          Linear-400                   [-1, 32]             544
         Sigmoid-401                   [-1, 32]               0
SqueezeExcitationLayer-402           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-403             [-1, 32, 1, 1]               0
          Linear-404                   [-1, 16]             528
            ReLU-405                   [-1, 16]               0
          Linear-406                   [-1, 32]             544
         Sigmoid-407                   [-1, 32]               0
SqueezeExcitationLayer-408           [-1, 32, 14, 14]               0
     BatchNorm2d-409          [-1, 384, 14, 14]             768
            ReLU-410          [-1, 384, 14, 14]               0
          Conv2d-411          [-1, 128, 14, 14]          49,152
     BatchNorm2d-412          [-1, 128, 14, 14]             256
            ReLU-413          [-1, 128, 14, 14]               0
          Conv2d-414           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-415             [-1, 32, 1, 1]               0
          Linear-416                   [-1, 16]             528
            ReLU-417                   [-1, 16]               0
          Linear-418                   [-1, 32]             544
         Sigmoid-419                   [-1, 32]               0
SqueezeExcitationLayer-420           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-421             [-1, 32, 1, 1]               0
          Linear-422                   [-1, 16]             528
            ReLU-423                   [-1, 16]               0
          Linear-424                   [-1, 32]             544
         Sigmoid-425                   [-1, 32]               0
SqueezeExcitationLayer-426           [-1, 32, 14, 14]               0
     BatchNorm2d-427          [-1, 416, 14, 14]             832
            ReLU-428          [-1, 416, 14, 14]               0
          Conv2d-429          [-1, 128, 14, 14]          53,248
     BatchNorm2d-430          [-1, 128, 14, 14]             256
            ReLU-431          [-1, 128, 14, 14]               0
          Conv2d-432           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-433             [-1, 32, 1, 1]               0
          Linear-434                   [-1, 16]             528
            ReLU-435                   [-1, 16]               0
          Linear-436                   [-1, 32]             544
         Sigmoid-437                   [-1, 32]               0
SqueezeExcitationLayer-438           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-439             [-1, 32, 1, 1]               0
          Linear-440                   [-1, 16]             528
            ReLU-441                   [-1, 16]               0
          Linear-442                   [-1, 32]             544
         Sigmoid-443                   [-1, 32]               0
SqueezeExcitationLayer-444           [-1, 32, 14, 14]               0
     BatchNorm2d-445          [-1, 448, 14, 14]             896
            ReLU-446          [-1, 448, 14, 14]               0
          Conv2d-447          [-1, 128, 14, 14]          57,344
     BatchNorm2d-448          [-1, 128, 14, 14]             256
            ReLU-449          [-1, 128, 14, 14]               0
          Conv2d-450           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-451             [-1, 32, 1, 1]               0
          Linear-452                   [-1, 16]             528
            ReLU-453                   [-1, 16]               0
          Linear-454                   [-1, 32]             544
         Sigmoid-455                   [-1, 32]               0
SqueezeExcitationLayer-456           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-457             [-1, 32, 1, 1]               0
          Linear-458                   [-1, 16]             528
            ReLU-459                   [-1, 16]               0
          Linear-460                   [-1, 32]             544
         Sigmoid-461                   [-1, 32]               0
SqueezeExcitationLayer-462           [-1, 32, 14, 14]               0
     BatchNorm2d-463          [-1, 480, 14, 14]             960
            ReLU-464          [-1, 480, 14, 14]               0
          Conv2d-465          [-1, 128, 14, 14]          61,440
     BatchNorm2d-466          [-1, 128, 14, 14]             256
            ReLU-467          [-1, 128, 14, 14]               0
          Conv2d-468           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-469             [-1, 32, 1, 1]               0
          Linear-470                   [-1, 16]             528
            ReLU-471                   [-1, 16]               0
          Linear-472                   [-1, 32]             544
         Sigmoid-473                   [-1, 32]               0
SqueezeExcitationLayer-474           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-475             [-1, 32, 1, 1]               0
          Linear-476                   [-1, 16]             528
            ReLU-477                   [-1, 16]               0
          Linear-478                   [-1, 32]             544
         Sigmoid-479                   [-1, 32]               0
SqueezeExcitationLayer-480           [-1, 32, 14, 14]               0
     BatchNorm2d-481          [-1, 512, 14, 14]           1,024
            ReLU-482          [-1, 512, 14, 14]               0
          Conv2d-483          [-1, 128, 14, 14]          65,536
     BatchNorm2d-484          [-1, 128, 14, 14]             256
            ReLU-485          [-1, 128, 14, 14]               0
          Conv2d-486           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-487             [-1, 32, 1, 1]               0
          Linear-488                   [-1, 16]             528
            ReLU-489                   [-1, 16]               0
          Linear-490                   [-1, 32]             544
         Sigmoid-491                   [-1, 32]               0
SqueezeExcitationLayer-492           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-493             [-1, 32, 1, 1]               0
          Linear-494                   [-1, 16]             528
            ReLU-495                   [-1, 16]               0
          Linear-496                   [-1, 32]             544
         Sigmoid-497                   [-1, 32]               0
SqueezeExcitationLayer-498           [-1, 32, 14, 14]               0
     BatchNorm2d-499          [-1, 544, 14, 14]           1,088
            ReLU-500          [-1, 544, 14, 14]               0
          Conv2d-501          [-1, 128, 14, 14]          69,632
     BatchNorm2d-502          [-1, 128, 14, 14]             256
            ReLU-503          [-1, 128, 14, 14]               0
          Conv2d-504           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-505             [-1, 32, 1, 1]               0
          Linear-506                   [-1, 16]             528
            ReLU-507                   [-1, 16]               0
          Linear-508                   [-1, 32]             544
         Sigmoid-509                   [-1, 32]               0
SqueezeExcitationLayer-510           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-511             [-1, 32, 1, 1]               0
          Linear-512                   [-1, 16]             528
            ReLU-513                   [-1, 16]               0
          Linear-514                   [-1, 32]             544
         Sigmoid-515                   [-1, 32]               0
SqueezeExcitationLayer-516           [-1, 32, 14, 14]               0
     BatchNorm2d-517          [-1, 576, 14, 14]           1,152
            ReLU-518          [-1, 576, 14, 14]               0
          Conv2d-519          [-1, 128, 14, 14]          73,728
     BatchNorm2d-520          [-1, 128, 14, 14]             256
            ReLU-521          [-1, 128, 14, 14]               0
          Conv2d-522           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-523             [-1, 32, 1, 1]               0
          Linear-524                   [-1, 16]             528
            ReLU-525                   [-1, 16]               0
          Linear-526                   [-1, 32]             544
         Sigmoid-527                   [-1, 32]               0
SqueezeExcitationLayer-528           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-529             [-1, 32, 1, 1]               0
          Linear-530                   [-1, 16]             528
            ReLU-531                   [-1, 16]               0
          Linear-532                   [-1, 32]             544
         Sigmoid-533                   [-1, 32]               0
SqueezeExcitationLayer-534           [-1, 32, 14, 14]               0
     BatchNorm2d-535          [-1, 608, 14, 14]           1,216
            ReLU-536          [-1, 608, 14, 14]               0
          Conv2d-537          [-1, 128, 14, 14]          77,824
     BatchNorm2d-538          [-1, 128, 14, 14]             256
            ReLU-539          [-1, 128, 14, 14]               0
          Conv2d-540           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-541             [-1, 32, 1, 1]               0
          Linear-542                   [-1, 16]             528
            ReLU-543                   [-1, 16]               0
          Linear-544                   [-1, 32]             544
         Sigmoid-545                   [-1, 32]               0
SqueezeExcitationLayer-546           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-547             [-1, 32, 1, 1]               0
          Linear-548                   [-1, 16]             528
            ReLU-549                   [-1, 16]               0
          Linear-550                   [-1, 32]             544
         Sigmoid-551                   [-1, 32]               0
SqueezeExcitationLayer-552           [-1, 32, 14, 14]               0
     BatchNorm2d-553          [-1, 640, 14, 14]           1,280
            ReLU-554          [-1, 640, 14, 14]               0
          Conv2d-555          [-1, 128, 14, 14]          81,920
     BatchNorm2d-556          [-1, 128, 14, 14]             256
            ReLU-557          [-1, 128, 14, 14]               0
          Conv2d-558           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-559             [-1, 32, 1, 1]               0
          Linear-560                   [-1, 16]             528
            ReLU-561                   [-1, 16]               0
          Linear-562                   [-1, 32]             544
         Sigmoid-563                   [-1, 32]               0
SqueezeExcitationLayer-564           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-565             [-1, 32, 1, 1]               0
          Linear-566                   [-1, 16]             528
            ReLU-567                   [-1, 16]               0
          Linear-568                   [-1, 32]             544
         Sigmoid-569                   [-1, 32]               0
SqueezeExcitationLayer-570           [-1, 32, 14, 14]               0
     BatchNorm2d-571          [-1, 672, 14, 14]           1,344
            ReLU-572          [-1, 672, 14, 14]               0
          Conv2d-573          [-1, 128, 14, 14]          86,016
     BatchNorm2d-574          [-1, 128, 14, 14]             256
            ReLU-575          [-1, 128, 14, 14]               0
          Conv2d-576           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-577             [-1, 32, 1, 1]               0
          Linear-578                   [-1, 16]             528
            ReLU-579                   [-1, 16]               0
          Linear-580                   [-1, 32]             544
         Sigmoid-581                   [-1, 32]               0
SqueezeExcitationLayer-582           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-583             [-1, 32, 1, 1]               0
          Linear-584                   [-1, 16]             528
            ReLU-585                   [-1, 16]               0
          Linear-586                   [-1, 32]             544
         Sigmoid-587                   [-1, 32]               0
SqueezeExcitationLayer-588           [-1, 32, 14, 14]               0
     BatchNorm2d-589          [-1, 704, 14, 14]           1,408
            ReLU-590          [-1, 704, 14, 14]               0
          Conv2d-591          [-1, 128, 14, 14]          90,112
     BatchNorm2d-592          [-1, 128, 14, 14]             256
            ReLU-593          [-1, 128, 14, 14]               0
          Conv2d-594           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-595             [-1, 32, 1, 1]               0
          Linear-596                   [-1, 16]             528
            ReLU-597                   [-1, 16]               0
          Linear-598                   [-1, 32]             544
         Sigmoid-599                   [-1, 32]               0
SqueezeExcitationLayer-600           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-601             [-1, 32, 1, 1]               0
          Linear-602                   [-1, 16]             528
            ReLU-603                   [-1, 16]               0
          Linear-604                   [-1, 32]             544
         Sigmoid-605                   [-1, 32]               0
SqueezeExcitationLayer-606           [-1, 32, 14, 14]               0
     BatchNorm2d-607          [-1, 736, 14, 14]           1,472
            ReLU-608          [-1, 736, 14, 14]               0
          Conv2d-609          [-1, 128, 14, 14]          94,208
     BatchNorm2d-610          [-1, 128, 14, 14]             256
            ReLU-611          [-1, 128, 14, 14]               0
          Conv2d-612           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-613             [-1, 32, 1, 1]               0
          Linear-614                   [-1, 16]             528
            ReLU-615                   [-1, 16]               0
          Linear-616                   [-1, 32]             544
         Sigmoid-617                   [-1, 32]               0
SqueezeExcitationLayer-618           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-619             [-1, 32, 1, 1]               0
          Linear-620                   [-1, 16]             528
            ReLU-621                   [-1, 16]               0
          Linear-622                   [-1, 32]             544
         Sigmoid-623                   [-1, 32]               0
SqueezeExcitationLayer-624           [-1, 32, 14, 14]               0
     BatchNorm2d-625          [-1, 768, 14, 14]           1,536
            ReLU-626          [-1, 768, 14, 14]               0
          Conv2d-627          [-1, 128, 14, 14]          98,304
     BatchNorm2d-628          [-1, 128, 14, 14]             256
            ReLU-629          [-1, 128, 14, 14]               0
          Conv2d-630           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-631             [-1, 32, 1, 1]               0
          Linear-632                   [-1, 16]             528
            ReLU-633                   [-1, 16]               0
          Linear-634                   [-1, 32]             544
         Sigmoid-635                   [-1, 32]               0
SqueezeExcitationLayer-636           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-637             [-1, 32, 1, 1]               0
          Linear-638                   [-1, 16]             528
            ReLU-639                   [-1, 16]               0
          Linear-640                   [-1, 32]             544
         Sigmoid-641                   [-1, 32]               0
SqueezeExcitationLayer-642           [-1, 32, 14, 14]               0
     BatchNorm2d-643          [-1, 800, 14, 14]           1,600
            ReLU-644          [-1, 800, 14, 14]               0
          Conv2d-645          [-1, 128, 14, 14]         102,400
     BatchNorm2d-646          [-1, 128, 14, 14]             256
            ReLU-647          [-1, 128, 14, 14]               0
          Conv2d-648           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-649             [-1, 32, 1, 1]               0
          Linear-650                   [-1, 16]             528
            ReLU-651                   [-1, 16]               0
          Linear-652                   [-1, 32]             544
         Sigmoid-653                   [-1, 32]               0
SqueezeExcitationLayer-654           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-655             [-1, 32, 1, 1]               0
          Linear-656                   [-1, 16]             528
            ReLU-657                   [-1, 16]               0
          Linear-658                   [-1, 32]             544
         Sigmoid-659                   [-1, 32]               0
SqueezeExcitationLayer-660           [-1, 32, 14, 14]               0
     BatchNorm2d-661          [-1, 832, 14, 14]           1,664
            ReLU-662          [-1, 832, 14, 14]               0
          Conv2d-663          [-1, 128, 14, 14]         106,496
     BatchNorm2d-664          [-1, 128, 14, 14]             256
            ReLU-665          [-1, 128, 14, 14]               0
          Conv2d-666           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-667             [-1, 32, 1, 1]               0
          Linear-668                   [-1, 16]             528
            ReLU-669                   [-1, 16]               0
          Linear-670                   [-1, 32]             544
         Sigmoid-671                   [-1, 32]               0
SqueezeExcitationLayer-672           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-673             [-1, 32, 1, 1]               0
          Linear-674                   [-1, 16]             528
            ReLU-675                   [-1, 16]               0
          Linear-676                   [-1, 32]             544
         Sigmoid-677                   [-1, 32]               0
SqueezeExcitationLayer-678           [-1, 32, 14, 14]               0
     BatchNorm2d-679          [-1, 864, 14, 14]           1,728
            ReLU-680          [-1, 864, 14, 14]               0
          Conv2d-681          [-1, 128, 14, 14]         110,592
     BatchNorm2d-682          [-1, 128, 14, 14]             256
            ReLU-683          [-1, 128, 14, 14]               0
          Conv2d-684           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-685             [-1, 32, 1, 1]               0
          Linear-686                   [-1, 16]             528
            ReLU-687                   [-1, 16]               0
          Linear-688                   [-1, 32]             544
         Sigmoid-689                   [-1, 32]               0
SqueezeExcitationLayer-690           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-691             [-1, 32, 1, 1]               0
          Linear-692                   [-1, 16]             528
            ReLU-693                   [-1, 16]               0
          Linear-694                   [-1, 32]             544
         Sigmoid-695                   [-1, 32]               0
SqueezeExcitationLayer-696           [-1, 32, 14, 14]               0
     BatchNorm2d-697          [-1, 896, 14, 14]           1,792
            ReLU-698          [-1, 896, 14, 14]               0
          Conv2d-699          [-1, 128, 14, 14]         114,688
     BatchNorm2d-700          [-1, 128, 14, 14]             256
            ReLU-701          [-1, 128, 14, 14]               0
          Conv2d-702           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-703             [-1, 32, 1, 1]               0
          Linear-704                   [-1, 16]             528
            ReLU-705                   [-1, 16]               0
          Linear-706                   [-1, 32]             544
         Sigmoid-707                   [-1, 32]               0
SqueezeExcitationLayer-708           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-709             [-1, 32, 1, 1]               0
          Linear-710                   [-1, 16]             528
            ReLU-711                   [-1, 16]               0
          Linear-712                   [-1, 32]             544
         Sigmoid-713                   [-1, 32]               0
SqueezeExcitationLayer-714           [-1, 32, 14, 14]               0
     BatchNorm2d-715          [-1, 928, 14, 14]           1,856
            ReLU-716          [-1, 928, 14, 14]               0
          Conv2d-717          [-1, 128, 14, 14]         118,784
     BatchNorm2d-718          [-1, 128, 14, 14]             256
            ReLU-719          [-1, 128, 14, 14]               0
          Conv2d-720           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-721             [-1, 32, 1, 1]               0
          Linear-722                   [-1, 16]             528
            ReLU-723                   [-1, 16]               0
          Linear-724                   [-1, 32]             544
         Sigmoid-725                   [-1, 32]               0
SqueezeExcitationLayer-726           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-727             [-1, 32, 1, 1]               0
          Linear-728                   [-1, 16]             528
            ReLU-729                   [-1, 16]               0
          Linear-730                   [-1, 32]             544
         Sigmoid-731                   [-1, 32]               0
SqueezeExcitationLayer-732           [-1, 32, 14, 14]               0
     BatchNorm2d-733          [-1, 960, 14, 14]           1,920
            ReLU-734          [-1, 960, 14, 14]               0
          Conv2d-735          [-1, 128, 14, 14]         122,880
     BatchNorm2d-736          [-1, 128, 14, 14]             256
            ReLU-737          [-1, 128, 14, 14]               0
          Conv2d-738           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-739             [-1, 32, 1, 1]               0
          Linear-740                   [-1, 16]             528
            ReLU-741                   [-1, 16]               0
          Linear-742                   [-1, 32]             544
         Sigmoid-743                   [-1, 32]               0
SqueezeExcitationLayer-744           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-745             [-1, 32, 1, 1]               0
          Linear-746                   [-1, 16]             528
            ReLU-747                   [-1, 16]               0
          Linear-748                   [-1, 32]             544
         Sigmoid-749                   [-1, 32]               0
SqueezeExcitationLayer-750           [-1, 32, 14, 14]               0
     BatchNorm2d-751          [-1, 992, 14, 14]           1,984
            ReLU-752          [-1, 992, 14, 14]               0
          Conv2d-753          [-1, 128, 14, 14]         126,976
     BatchNorm2d-754          [-1, 128, 14, 14]             256
            ReLU-755          [-1, 128, 14, 14]               0
          Conv2d-756           [-1, 32, 14, 14]          36,864
AdaptiveAvgPool2d-757             [-1, 32, 1, 1]               0
          Linear-758                   [-1, 16]             528
            ReLU-759                   [-1, 16]               0
          Linear-760                   [-1, 32]             544
         Sigmoid-761                   [-1, 32]               0
SqueezeExcitationLayer-762           [-1, 32, 14, 14]               0
AdaptiveAvgPool2d-763             [-1, 32, 1, 1]               0
          Linear-764                   [-1, 16]             528
            ReLU-765                   [-1, 16]               0
          Linear-766                   [-1, 32]             544
         Sigmoid-767                   [-1, 32]               0
SqueezeExcitationLayer-768           [-1, 32, 14, 14]               0
     BatchNorm2d-769         [-1, 1024, 14, 14]           2,048
            ReLU-770         [-1, 1024, 14, 14]               0
          Conv2d-771          [-1, 512, 14, 14]         524,288
       AvgPool2d-772            [-1, 512, 7, 7]               0
     BatchNorm2d-773            [-1, 512, 7, 7]           1,024
            ReLU-774            [-1, 512, 7, 7]               0
          Conv2d-775            [-1, 128, 7, 7]          65,536
     BatchNorm2d-776            [-1, 128, 7, 7]             256
            ReLU-777            [-1, 128, 7, 7]               0
          Conv2d-778             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-779             [-1, 32, 1, 1]               0
          Linear-780                   [-1, 16]             528
            ReLU-781                   [-1, 16]               0
          Linear-782                   [-1, 32]             544
         Sigmoid-783                   [-1, 32]               0
SqueezeExcitationLayer-784             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-785             [-1, 32, 1, 1]               0
          Linear-786                   [-1, 16]             528
            ReLU-787                   [-1, 16]               0
          Linear-788                   [-1, 32]             544
         Sigmoid-789                   [-1, 32]               0
SqueezeExcitationLayer-790             [-1, 32, 7, 7]               0
     BatchNorm2d-791            [-1, 544, 7, 7]           1,088
            ReLU-792            [-1, 544, 7, 7]               0
          Conv2d-793            [-1, 128, 7, 7]          69,632
     BatchNorm2d-794            [-1, 128, 7, 7]             256
            ReLU-795            [-1, 128, 7, 7]               0
          Conv2d-796             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-797             [-1, 32, 1, 1]               0
          Linear-798                   [-1, 16]             528
            ReLU-799                   [-1, 16]               0
          Linear-800                   [-1, 32]             544
         Sigmoid-801                   [-1, 32]               0
SqueezeExcitationLayer-802             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-803             [-1, 32, 1, 1]               0
          Linear-804                   [-1, 16]             528
            ReLU-805                   [-1, 16]               0
          Linear-806                   [-1, 32]             544
         Sigmoid-807                   [-1, 32]               0
SqueezeExcitationLayer-808             [-1, 32, 7, 7]               0
     BatchNorm2d-809            [-1, 576, 7, 7]           1,152
            ReLU-810            [-1, 576, 7, 7]               0
          Conv2d-811            [-1, 128, 7, 7]          73,728
     BatchNorm2d-812            [-1, 128, 7, 7]             256
            ReLU-813            [-1, 128, 7, 7]               0
          Conv2d-814             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-815             [-1, 32, 1, 1]               0
          Linear-816                   [-1, 16]             528
            ReLU-817                   [-1, 16]               0
          Linear-818                   [-1, 32]             544
         Sigmoid-819                   [-1, 32]               0
SqueezeExcitationLayer-820             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-821             [-1, 32, 1, 1]               0
          Linear-822                   [-1, 16]             528
            ReLU-823                   [-1, 16]               0
          Linear-824                   [-1, 32]             544
         Sigmoid-825                   [-1, 32]               0
SqueezeExcitationLayer-826             [-1, 32, 7, 7]               0
     BatchNorm2d-827            [-1, 608, 7, 7]           1,216
            ReLU-828            [-1, 608, 7, 7]               0
          Conv2d-829            [-1, 128, 7, 7]          77,824
     BatchNorm2d-830            [-1, 128, 7, 7]             256
            ReLU-831            [-1, 128, 7, 7]               0
          Conv2d-832             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-833             [-1, 32, 1, 1]               0
          Linear-834                   [-1, 16]             528
            ReLU-835                   [-1, 16]               0
          Linear-836                   [-1, 32]             544
         Sigmoid-837                   [-1, 32]               0
SqueezeExcitationLayer-838             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-839             [-1, 32, 1, 1]               0
          Linear-840                   [-1, 16]             528
            ReLU-841                   [-1, 16]               0
          Linear-842                   [-1, 32]             544
         Sigmoid-843                   [-1, 32]               0
SqueezeExcitationLayer-844             [-1, 32, 7, 7]               0
     BatchNorm2d-845            [-1, 640, 7, 7]           1,280
            ReLU-846            [-1, 640, 7, 7]               0
          Conv2d-847            [-1, 128, 7, 7]          81,920
     BatchNorm2d-848            [-1, 128, 7, 7]             256
            ReLU-849            [-1, 128, 7, 7]               0
          Conv2d-850             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-851             [-1, 32, 1, 1]               0
          Linear-852                   [-1, 16]             528
            ReLU-853                   [-1, 16]               0
          Linear-854                   [-1, 32]             544
         Sigmoid-855                   [-1, 32]               0
SqueezeExcitationLayer-856             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-857             [-1, 32, 1, 1]               0
          Linear-858                   [-1, 16]             528
            ReLU-859                   [-1, 16]               0
          Linear-860                   [-1, 32]             544
         Sigmoid-861                   [-1, 32]               0
SqueezeExcitationLayer-862             [-1, 32, 7, 7]               0
     BatchNorm2d-863            [-1, 672, 7, 7]           1,344
            ReLU-864            [-1, 672, 7, 7]               0
          Conv2d-865            [-1, 128, 7, 7]          86,016
     BatchNorm2d-866            [-1, 128, 7, 7]             256
            ReLU-867            [-1, 128, 7, 7]               0
          Conv2d-868             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-869             [-1, 32, 1, 1]               0
          Linear-870                   [-1, 16]             528
            ReLU-871                   [-1, 16]               0
          Linear-872                   [-1, 32]             544
         Sigmoid-873                   [-1, 32]               0
SqueezeExcitationLayer-874             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-875             [-1, 32, 1, 1]               0
          Linear-876                   [-1, 16]             528
            ReLU-877                   [-1, 16]               0
          Linear-878                   [-1, 32]             544
         Sigmoid-879                   [-1, 32]               0
SqueezeExcitationLayer-880             [-1, 32, 7, 7]               0
     BatchNorm2d-881            [-1, 704, 7, 7]           1,408
            ReLU-882            [-1, 704, 7, 7]               0
          Conv2d-883            [-1, 128, 7, 7]          90,112
     BatchNorm2d-884            [-1, 128, 7, 7]             256
            ReLU-885            [-1, 128, 7, 7]               0
          Conv2d-886             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-887             [-1, 32, 1, 1]               0
          Linear-888                   [-1, 16]             528
            ReLU-889                   [-1, 16]               0
          Linear-890                   [-1, 32]             544
         Sigmoid-891                   [-1, 32]               0
SqueezeExcitationLayer-892             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-893             [-1, 32, 1, 1]               0
          Linear-894                   [-1, 16]             528
            ReLU-895                   [-1, 16]               0
          Linear-896                   [-1, 32]             544
         Sigmoid-897                   [-1, 32]               0
SqueezeExcitationLayer-898             [-1, 32, 7, 7]               0
     BatchNorm2d-899            [-1, 736, 7, 7]           1,472
            ReLU-900            [-1, 736, 7, 7]               0
          Conv2d-901            [-1, 128, 7, 7]          94,208
     BatchNorm2d-902            [-1, 128, 7, 7]             256
            ReLU-903            [-1, 128, 7, 7]               0
          Conv2d-904             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-905             [-1, 32, 1, 1]               0
          Linear-906                   [-1, 16]             528
            ReLU-907                   [-1, 16]               0
          Linear-908                   [-1, 32]             544
         Sigmoid-909                   [-1, 32]               0
SqueezeExcitationLayer-910             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-911             [-1, 32, 1, 1]               0
          Linear-912                   [-1, 16]             528
            ReLU-913                   [-1, 16]               0
          Linear-914                   [-1, 32]             544
         Sigmoid-915                   [-1, 32]               0
SqueezeExcitationLayer-916             [-1, 32, 7, 7]               0
     BatchNorm2d-917            [-1, 768, 7, 7]           1,536
            ReLU-918            [-1, 768, 7, 7]               0
          Conv2d-919            [-1, 128, 7, 7]          98,304
     BatchNorm2d-920            [-1, 128, 7, 7]             256
            ReLU-921            [-1, 128, 7, 7]               0
          Conv2d-922             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-923             [-1, 32, 1, 1]               0
          Linear-924                   [-1, 16]             528
            ReLU-925                   [-1, 16]               0
          Linear-926                   [-1, 32]             544
         Sigmoid-927                   [-1, 32]               0
SqueezeExcitationLayer-928             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-929             [-1, 32, 1, 1]               0
          Linear-930                   [-1, 16]             528
            ReLU-931                   [-1, 16]               0
          Linear-932                   [-1, 32]             544
         Sigmoid-933                   [-1, 32]               0
SqueezeExcitationLayer-934             [-1, 32, 7, 7]               0
     BatchNorm2d-935            [-1, 800, 7, 7]           1,600
            ReLU-936            [-1, 800, 7, 7]               0
          Conv2d-937            [-1, 128, 7, 7]         102,400
     BatchNorm2d-938            [-1, 128, 7, 7]             256
            ReLU-939            [-1, 128, 7, 7]               0
          Conv2d-940             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-941             [-1, 32, 1, 1]               0
          Linear-942                   [-1, 16]             528
            ReLU-943                   [-1, 16]               0
          Linear-944                   [-1, 32]             544
         Sigmoid-945                   [-1, 32]               0
SqueezeExcitationLayer-946             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-947             [-1, 32, 1, 1]               0
          Linear-948                   [-1, 16]             528
            ReLU-949                   [-1, 16]               0
          Linear-950                   [-1, 32]             544
         Sigmoid-951                   [-1, 32]               0
SqueezeExcitationLayer-952             [-1, 32, 7, 7]               0
     BatchNorm2d-953            [-1, 832, 7, 7]           1,664
            ReLU-954            [-1, 832, 7, 7]               0
          Conv2d-955            [-1, 128, 7, 7]         106,496
     BatchNorm2d-956            [-1, 128, 7, 7]             256
            ReLU-957            [-1, 128, 7, 7]               0
          Conv2d-958             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-959             [-1, 32, 1, 1]               0
          Linear-960                   [-1, 16]             528
            ReLU-961                   [-1, 16]               0
          Linear-962                   [-1, 32]             544
         Sigmoid-963                   [-1, 32]               0
SqueezeExcitationLayer-964             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-965             [-1, 32, 1, 1]               0
          Linear-966                   [-1, 16]             528
            ReLU-967                   [-1, 16]               0
          Linear-968                   [-1, 32]             544
         Sigmoid-969                   [-1, 32]               0
SqueezeExcitationLayer-970             [-1, 32, 7, 7]               0
     BatchNorm2d-971            [-1, 864, 7, 7]           1,728
            ReLU-972            [-1, 864, 7, 7]               0
          Conv2d-973            [-1, 128, 7, 7]         110,592
     BatchNorm2d-974            [-1, 128, 7, 7]             256
            ReLU-975            [-1, 128, 7, 7]               0
          Conv2d-976             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-977             [-1, 32, 1, 1]               0
          Linear-978                   [-1, 16]             528
            ReLU-979                   [-1, 16]               0
          Linear-980                   [-1, 32]             544
         Sigmoid-981                   [-1, 32]               0
SqueezeExcitationLayer-982             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-983             [-1, 32, 1, 1]               0
          Linear-984                   [-1, 16]             528
            ReLU-985                   [-1, 16]               0
          Linear-986                   [-1, 32]             544
         Sigmoid-987                   [-1, 32]               0
SqueezeExcitationLayer-988             [-1, 32, 7, 7]               0
     BatchNorm2d-989            [-1, 896, 7, 7]           1,792
            ReLU-990            [-1, 896, 7, 7]               0
          Conv2d-991            [-1, 128, 7, 7]         114,688
     BatchNorm2d-992            [-1, 128, 7, 7]             256
            ReLU-993            [-1, 128, 7, 7]               0
          Conv2d-994             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-995             [-1, 32, 1, 1]               0
          Linear-996                   [-1, 16]             528
            ReLU-997                   [-1, 16]               0
          Linear-998                   [-1, 32]             544
         Sigmoid-999                   [-1, 32]               0
SqueezeExcitationLayer-1000             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-1001             [-1, 32, 1, 1]               0
         Linear-1002                   [-1, 16]             528
           ReLU-1003                   [-1, 16]               0
         Linear-1004                   [-1, 32]             544
        Sigmoid-1005                   [-1, 32]               0
SqueezeExcitationLayer-1006             [-1, 32, 7, 7]               0
    BatchNorm2d-1007            [-1, 928, 7, 7]           1,856
           ReLU-1008            [-1, 928, 7, 7]               0
         Conv2d-1009            [-1, 128, 7, 7]         118,784
    BatchNorm2d-1010            [-1, 128, 7, 7]             256
           ReLU-1011            [-1, 128, 7, 7]               0
         Conv2d-1012             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-1013             [-1, 32, 1, 1]               0
         Linear-1014                   [-1, 16]             528
           ReLU-1015                   [-1, 16]               0
         Linear-1016                   [-1, 32]             544
        Sigmoid-1017                   [-1, 32]               0
SqueezeExcitationLayer-1018             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-1019             [-1, 32, 1, 1]               0
         Linear-1020                   [-1, 16]             528
           ReLU-1021                   [-1, 16]               0
         Linear-1022                   [-1, 32]             544
        Sigmoid-1023                   [-1, 32]               0
SqueezeExcitationLayer-1024             [-1, 32, 7, 7]               0
    BatchNorm2d-1025            [-1, 960, 7, 7]           1,920
           ReLU-1026            [-1, 960, 7, 7]               0
         Conv2d-1027            [-1, 128, 7, 7]         122,880
    BatchNorm2d-1028            [-1, 128, 7, 7]             256
           ReLU-1029            [-1, 128, 7, 7]               0
         Conv2d-1030             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-1031             [-1, 32, 1, 1]               0
         Linear-1032                   [-1, 16]             528
           ReLU-1033                   [-1, 16]               0
         Linear-1034                   [-1, 32]             544
        Sigmoid-1035                   [-1, 32]               0
SqueezeExcitationLayer-1036             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-1037             [-1, 32, 1, 1]               0
         Linear-1038                   [-1, 16]             528
           ReLU-1039                   [-1, 16]               0
         Linear-1040                   [-1, 32]             544
        Sigmoid-1041                   [-1, 32]               0
SqueezeExcitationLayer-1042             [-1, 32, 7, 7]               0
    BatchNorm2d-1043            [-1, 992, 7, 7]           1,984
           ReLU-1044            [-1, 992, 7, 7]               0
         Conv2d-1045            [-1, 128, 7, 7]         126,976
    BatchNorm2d-1046            [-1, 128, 7, 7]             256
           ReLU-1047            [-1, 128, 7, 7]               0
         Conv2d-1048             [-1, 32, 7, 7]          36,864
AdaptiveAvgPool2d-1049             [-1, 32, 1, 1]               0
         Linear-1050                   [-1, 16]             528
           ReLU-1051                   [-1, 16]               0
         Linear-1052                   [-1, 32]             544
        Sigmoid-1053                   [-1, 32]               0
SqueezeExcitationLayer-1054             [-1, 32, 7, 7]               0
AdaptiveAvgPool2d-1055             [-1, 32, 1, 1]               0
         Linear-1056                   [-1, 16]             528
           ReLU-1057                   [-1, 16]               0
         Linear-1058                   [-1, 32]             544
        Sigmoid-1059                   [-1, 32]               0
SqueezeExcitationLayer-1060             [-1, 32, 7, 7]               0
    BatchNorm2d-1061           [-1, 1024, 7, 7]           2,048
           ReLU-1062           [-1, 1024, 7, 7]               0
         Linear-1063                    [-1, 2]           2,050
================================================================
Total params: 7,080,258
Trainable params: 7,080,258
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 311.15
Params size (MB): 27.01
Estimated Total Size (MB): 338.73
----------------------------------------------------------------

5. Set the hyperparameters: define the loss function and learning rate, and build the optimizer from that learning rate

# loss_fn = nn.CrossEntropyLoss() # create the loss function

# learn_rate = 1e-3 # initial learning rate
# def adjust_learning_rate(optimizer, epoch, start_lr):
#     # decay the rate to 0.92 of its value every two epochs
#     lr = start_lr * (0.92 ** (epoch // 2))
#     for param_group in optimizer.param_groups:
#         param_group['lr'] = lr

# optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)
# Equivalent setup using the official scheduler API:
loss_fn = nn.CrossEntropyLoss()

learn_rate = 1e-4
lambda1 = lambda epoch: 0.92 ** (epoch // 2)

optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1) # choose the scheduling policy
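
As a quick sanity check (an illustrative sketch, not part of the training pipeline), printing the first few values of the decay factor shows the step-wise schedule: the rate holds for two scheduler steps, then shrinks by a factor of 0.92. Note that the training log below reads the rate after calling scheduler.step(), so its Lr column is shifted one step ahead of this preview.

# Preview the LambdaLR schedule: constant for two steps, then multiplied by 0.92
for step in range(6):
    print(f'after step {step}: lr = {learn_rate * 0.92 ** (step // 2):.2E}')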

6. Training function

# Training function
def train(dataloader,model,loss_fn,optimizer):
    size = len(dataloader.dataset) # size of the training set
    num_batches = len(dataloader)  # number of batches
    
    train_loss,train_acc = 0,0
    
    for X,y in dataloader:
        X,y = X.to(device),y.to(device)
        
        # Compute the prediction error
        pred = model(X)
        loss = loss_fn(pred,y)
        
        # Backpropagation
        optimizer.zero_grad() # clear stale gradients before the backward pass
        loss.backward()
        optimizer.step()
        
        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1)==y).type(torch.float).sum().item()
        train_loss += loss.item()
        
    train_acc /= size
    train_loss /= num_batches
    
    return train_acc,train_loss

7. Test function

# Test function
def test(dataloader,model,loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    
    test_acc,test_loss = 0,0
    
    with torch.no_grad(): # no gradient tracking needed during evaluation
        for X,y in dataloader:
            X,y = X.to(device),y.to(device)
            
            # Compute the loss
            pred = model(X)
            loss = loss_fn(pred,y)
            
            test_acc += (pred.argmax(1)==y).type(torch.float).sum().item()
            test_loss += loss.item()
            
    test_acc /= size
    test_loss /= num_batches
    
    return test_acc,test_loss
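
Because it only takes a DataLoader, a model and the loss function, the test function can also be called on its own to spot-check a checkpoint outside the training loop. A minimal sketch, assuming model, test_dl and loss_fn are already defined as above:

# Standalone evaluation sketch
model.eval() # evaluation mode: fixes BatchNorm statistics, disables Dropout
acc, loss = test(test_dl, model, loss_fn)
print(f'Test_acc:{acc*100:.1f}%, Test_loss:{loss:.3f}')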

8. Run the training

import copy

epochs = 40

train_acc = []
train_loss = []
test_acc = []
test_loss = []

best_acc = 0.0

for epoch in range(epochs):
    
    # Update the learning rate (only needed with the hand-rolled scheduler)
    # adjust_learning_rate(optimizer,epoch,learn_rate)
    
    model.train()
    epoch_train_acc,epoch_train_loss = train(train_dl,model,loss_fn,optimizer)
    scheduler.step() # update the learning rate (used with the official scheduler)
    
    model.eval()
    epoch_test_acc,epoch_test_loss = test(test_dl,model,loss_fn)
    
    # Keep a snapshot of the best model so far in best_model
    if epoch_test_acc > best_acc:
        best_acc = epoch_test_acc
        best_model = copy.deepcopy(model)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    # Read the current learning rate
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    
    template = ('Epoch:{:2d},Train_acc:{:.1f}%,Train_loss:{:.3f},Test_acc:{:.1f}%,Test_loss:{:.3f},Lr:{:.2E}')
    print(template.format(epoch+1,epoch_train_acc*100,epoch_train_loss,epoch_test_acc*100,epoch_test_loss,lr))

print('Done')
Epoch: 1,Train_acc:65.3%,Train_loss:0.633,Test_acc:68.5%,Test_loss:0.595,Lr:1.00E-04
Epoch: 2,Train_acc:70.1%,Train_loss:0.578,Test_acc:71.3%,Test_loss:0.562,Lr:9.20E-05
Epoch: 3,Train_acc:72.3%,Train_loss:0.542,Test_acc:75.5%,Test_loss:0.510,Lr:9.20E-05
Epoch: 4,Train_acc:74.0%,Train_loss:0.502,Test_acc:80.0%,Test_loss:0.457,Lr:8.46E-05
Epoch: 5,Train_acc:76.6%,Train_loss:0.474,Test_acc:76.7%,Test_loss:0.488,Lr:8.46E-05
Epoch: 6,Train_acc:79.3%,Train_loss:0.434,Test_acc:79.0%,Test_loss:0.440,Lr:7.79E-05
Epoch: 7,Train_acc:80.9%,Train_loss:0.423,Test_acc:83.0%,Test_loss:0.397,Lr:7.79E-05
Epoch: 8,Train_acc:83.3%,Train_loss:0.375,Test_acc:78.1%,Test_loss:0.433,Lr:7.16E-05
Epoch: 9,Train_acc:83.6%,Train_loss:0.360,Test_acc:82.5%,Test_loss:0.374,Lr:7.16E-05
Epoch:10,Train_acc:84.8%,Train_loss:0.333,Test_acc:88.3%,Test_loss:0.320,Lr:6.59E-05
Epoch:11,Train_acc:88.1%,Train_loss:0.294,Test_acc:87.4%,Test_loss:0.337,Lr:6.59E-05
Epoch:12,Train_acc:87.3%,Train_loss:0.293,Test_acc:84.6%,Test_loss:0.364,Lr:6.06E-05
Epoch:13,Train_acc:89.1%,Train_loss:0.257,Test_acc:88.6%,Test_loss:0.269,Lr:6.06E-05
Epoch:14,Train_acc:90.3%,Train_loss:0.238,Test_acc:84.6%,Test_loss:0.356,Lr:5.58E-05
Epoch:15,Train_acc:91.2%,Train_loss:0.210,Test_acc:84.4%,Test_loss:0.328,Lr:5.58E-05
Epoch:16,Train_acc:91.8%,Train_loss:0.202,Test_acc:89.3%,Test_loss:0.279,Lr:5.13E-05
Epoch:17,Train_acc:93.3%,Train_loss:0.165,Test_acc:89.3%,Test_loss:0.277,Lr:5.13E-05
Epoch:18,Train_acc:93.5%,Train_loss:0.168,Test_acc:89.5%,Test_loss:0.324,Lr:4.72E-05
Epoch:19,Train_acc:93.7%,Train_loss:0.173,Test_acc:87.9%,Test_loss:0.293,Lr:4.72E-05
Epoch:20,Train_acc:93.8%,Train_loss:0.156,Test_acc:90.7%,Test_loss:0.249,Lr:4.34E-05
Epoch:21,Train_acc:95.2%,Train_loss:0.122,Test_acc:89.3%,Test_loss:0.266,Lr:4.34E-05
Epoch:22,Train_acc:96.2%,Train_loss:0.123,Test_acc:90.7%,Test_loss:0.270,Lr:4.00E-05
Epoch:23,Train_acc:95.9%,Train_loss:0.124,Test_acc:89.5%,Test_loss:0.290,Lr:4.00E-05
Epoch:24,Train_acc:96.0%,Train_loss:0.118,Test_acc:91.4%,Test_loss:0.296,Lr:3.68E-05
Epoch:25,Train_acc:95.2%,Train_loss:0.131,Test_acc:91.4%,Test_loss:0.248,Lr:3.68E-05
Epoch:26,Train_acc:95.7%,Train_loss:0.113,Test_acc:90.4%,Test_loss:0.306,Lr:3.38E-05
Epoch:27,Train_acc:97.6%,Train_loss:0.077,Test_acc:93.7%,Test_loss:0.226,Lr:3.38E-05
Epoch:28,Train_acc:96.6%,Train_loss:0.089,Test_acc:91.8%,Test_loss:0.286,Lr:3.11E-05
Epoch:29,Train_acc:97.3%,Train_loss:0.084,Test_acc:92.8%,Test_loss:0.243,Lr:3.11E-05
Epoch:30,Train_acc:96.6%,Train_loss:0.093,Test_acc:91.8%,Test_loss:0.227,Lr:2.86E-05
Epoch:31,Train_acc:97.4%,Train_loss:0.075,Test_acc:93.7%,Test_loss:0.236,Lr:2.86E-05
Epoch:32,Train_acc:97.6%,Train_loss:0.073,Test_acc:92.1%,Test_loss:0.246,Lr:2.63E-05
Epoch:33,Train_acc:97.8%,Train_loss:0.066,Test_acc:93.0%,Test_loss:0.223,Lr:2.63E-05
Epoch:34,Train_acc:98.4%,Train_loss:0.053,Test_acc:92.1%,Test_loss:0.265,Lr:2.42E-05
Epoch:35,Train_acc:98.4%,Train_loss:0.056,Test_acc:91.6%,Test_loss:0.250,Lr:2.42E-05
Epoch:36,Train_acc:98.2%,Train_loss:0.062,Test_acc:92.5%,Test_loss:0.301,Lr:2.23E-05
Epoch:37,Train_acc:97.6%,Train_loss:0.068,Test_acc:93.5%,Test_loss:0.236,Lr:2.23E-05
Epoch:38,Train_acc:98.1%,Train_loss:0.049,Test_acc:91.8%,Test_loss:0.244,Lr:2.05E-05
Epoch:39,Train_acc:98.9%,Train_loss:0.043,Test_acc:94.2%,Test_loss:0.216,Lr:2.05E-05
Epoch:40,Train_acc:98.7%,Train_loss:0.045,Test_acc:92.8%,Test_loss:0.245,Lr:1.89E-05
Done
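
Since the per-epoch metrics were appended to lists, the best epoch can be recovered programmatically instead of scanning the log. A small convenience sketch:

# Locate the epoch with the highest test accuracy in the recorded history
best_epoch = max(range(epochs), key=lambda i: test_acc[i])
print(f'Best test accuracy {test_acc[best_epoch]*100:.1f}% at epoch {best_epoch+1}')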

9. Visualize the results

epochs_range = range(epochs)

plt.figure(figsize = (12,3))

plt.subplot(1,2,1)
plt.plot(epochs_range,train_acc,label = 'Training Accuracy')
plt.plot(epochs_range,test_acc,label = 'Test Accuracy')
plt.legend(loc = 'lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1,2,2)
plt.plot(epochs_range,train_loss,label = 'Training Loss')
plt.plot(epochs_range,test_loss,label = 'Test Loss')
plt.legend(loc = 'lower right')
plt.title('Training and Validation Loss')
plt.show()

[Figure: training/test accuracy (left) and loss (right) curves over the 40 epochs]

10. Save the model

# Save the custom model
# Save the state dict only
torch.save(model.state_dict(),'./模型参数/J5_densenet121&SE_model_state_dict.pth') # save the parameters, not the full model object
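
Note that the call above stores the last-epoch weights, while the loop in section 8 kept the highest-accuracy snapshot in best_model; since the next cell reuses that variable name, it may be worth persisting the snapshot first. A sketch (the file name here is an illustrative assumption):

# Persist the best-accuracy snapshot before the name best_model is reused below
torch.save(best_model.state_dict(), './模型参数/J5_densenet121&SE_best_state_dict.pth')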

# Re-instantiate the model so the saved parameters can be loaded into it
best_model = DenseNet(
    num_init_features=64,  # init_channel=64
    growth_rate=32,
    block_config=(6, 12, 24, 16),
    num_classes=len(classNames),  # set the class count from the dataset, not the default 1000
    se_filter_sq=se_filter_sq     # pass the SE module's squeeze parameter
).to(device)

best_model.load_state_dict(torch.load('./模型参数/J5_densenet121&SE_model_state_dict.pth')) # load the state dict into the model
<All keys matched successfully>

11. Predict with the trained model

# Predict a single image from a given path
from PIL import Image
import torchvision.transforms as transforms

classes = list(total_data.class_to_idx)

def predict_one_image(image_path,model,transform,classes):
    
    test_img = Image.open(image_path).convert('RGB')
    # plt.imshow(test_img) # display the image to be predicted
    
    test_img = transform(test_img)
    img = test_img.to(device).unsqueeze(0) # add a batch dimension: (3,224,224) -> (1,3,224,224)
    
    model.eval()
    output = model(img)
    print(output) # inspect the raw model output (logits)
    
    _,pred = torch.max(output,1)
    pred_class = classes[pred]
    print(f'Prediction: {pred_class}')
# Predict one image from the training set
predict_one_image(image_path='./data/mpox_recognize/Monkeypox/M01_01_04.jpg',
                 model = model,
                 transform = test_transforms,
                 classes = classes
                 )
tensor([[ 2.6228, -3.6656]], device='cuda:0', grad_fn=<AddmmBackward0>)
Prediction: Monkeypox
classes
['Monkeypox', 'Others']
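
The raw output printed above is a pair of unnormalized logits; passing them through softmax turns them into class probabilities, which are easier to read. A minimal sketch, where output stands for the [1, 2] logits tensor the model returns inside predict_one_image:

# Convert logits to class probabilities and report a confidence score
probs = F.softmax(output, dim=1) # approx. tensor([[0.9981, 0.0019]]) for the logits above
conf, pred = torch.max(probs, 1)
print(f'Prediction: {classes[pred.item()]} (confidence {conf.item():.2%})')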

