
[Deep Learning] PyTorch: Implementing a Residual Network from Scratch

article 2025/3/1 0:18:14

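ResNet (Residual Network) is a groundbreaking deep learning architecture proposed by Kaiming He et al. in the 2015 paper "Deep Residual Learning for Image Recognition". It won the ILSVRC 2015 classification task and addressed the degradation problem of deep neural networks, making it possible to train networks with hundreds or even thousands of layers.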

Residual Networks

Core Concept

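The core idea of ResNet is the identity shortcut, also called a skip connection. Instead of asking a stack of layers to fit the desired underlying mapping $H(x)$ directly, ResNet lets it learn the residual mapping $F(x) = H(x) - x$ and recovers the original mapping as $F(x) + x$. This makes optimization easier: if the best mapping is close to the identity, the layers only need to drive $F(x)$ toward zero rather than reproduce the identity through a stack of nonlinear layers.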

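A single ResNet building block can be written as:

$y = F(x, \{W_i\}) + x$

where:

x: the input to the block
F(x, {W_i}): the residual mapping learned by the block's layers, with weights W_i
+x: the skip connection, which adds the input back unchanged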
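The formula above maps directly onto a small wrapper module. The sketch below is a minimal illustration, not part of the original code: `Residual` is a hypothetical helper that wraps any shape-preserving sub-network `fn` and returns `fn(x) + x`. When the residual branch outputs zero, the block reduces to the identity, which is exactly the property that makes near-identity mappings easy to learn.

```python
import torch
import torch.nn as nn


class Residual(nn.Module):
    """Hypothetical wrapper computing y = F(x) + x for a shape-preserving F."""

    def __init__(self, fn: nn.Module):
        super().__init__()
        self.fn = fn  # the residual branch F(x, {W_i})

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fn(x) + x  # residual branch plus identity shortcut


# Residual branch: a linear layer whose weights and bias start at zero,
# so F(x) = 0 and the block initially behaves as the identity mapping.
branch = nn.Linear(8, 8)
nn.init.zeros_(branch.weight)
nn.init.zeros_(branch.bias)
block = Residual(branch)

x = torch.randn(4, 8)
y = block(x)
print(torch.equal(y, x))  # True: with F(x) = 0, y = x
```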

Advantages

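In a plain deep network (one without shortcuts), adding layers does not reliably improve performance. Vanishing and exploding gradients make very deep stacks hard to train, and even once those are largely tamed by careful initialization and batch normalization, accuracy saturates and then degrades as depth grows. The ResNet paper shows that this degradation appears as higher training error, so it is an optimization difficulty rather than overfitting. Residual connections give both the forward signal and the backward gradient an identity path around every block, which keeps training stable even for very deep networks.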
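One way to see why the shortcut helps gradient flow: differentiating $y = F(x) + x$ gives $\partial y / \partial x = \partial F / \partial x + I$, so even when the residual branch's Jacobian is tiny, the identity term still passes the upstream gradient through. The sketch below (an illustration, not from the original article) checks this with autograd using a deliberately "dead" branch whose weights are all zero: the plain path passes back no gradient, while the residual path passes back the upstream gradient unchanged.

```python
import torch
import torch.nn as nn

# A "dead" branch: all-zero weights, so its Jacobian with respect to the input is zero.
branch = nn.Linear(8, 8)
nn.init.zeros_(branch.weight)
nn.init.zeros_(branch.bias)

# Plain connection: y = F(x)
x_plain = torch.randn(4, 8, requires_grad=True)
branch(x_plain).sum().backward()

# Residual connection: y = F(x) + x
x_res = torch.randn(4, 8, requires_grad=True)
(branch(x_res) + x_res).sum().backward()

print(x_plain.grad.abs().max().item())  # 0.0: no gradient reaches the input
print(x_res.grad.abs().max().item())    # 1.0: the identity path passes the gradient through
```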

PyTorch Implementation

We will implement ResNet from scratch in PyTorch, with an emphasis on flexibility and code clarity.

Importing Libraries

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models  # not needed by the code below; useful for comparing with torchvision's reference ResNets

Residual Block

The basic building block of ResNet (used in the shallower variants, ResNet-18 and ResNet-34) is defined as follows:

class BasicBlock(nn.Module):
    expansion = 1  # expansion factor: the block outputs out_channels * expansion channels

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        # First conv: 3x3, maps in_channels -> out_channels; stride controls spatial downsampling
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)  # batch normalization after the first conv
        # Second conv: 3x3 with stride 1, preserves the spatial size
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)  # batch normalization after the second conv
        self.downsample = downsample  # optional module that makes the shortcut's shape match the output

    def forward(self, x):
        identity = x  # keep the input for the residual (shortcut) connection
        if self.downsample is not None:
            identity = self.downsample(x)  # project the input when channels or spatial size change

        # First conv + batch norm + ReLU
        out = self.conv1(x)
        out = self.bn1(out)  # batch norm stabilizes training
        out = F.relu(out)  # ReLU non-linearity

        # Second conv + batch norm (no ReLU before the addition)
        out = self.conv2(out)
        out = self.bn2(out)

        out += identity  # add the shortcut to the residual branch: y = F(x) + x
        out = F.relu(out)  # apply ReLU after the addition

        return out  # final output of the block
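The `downsample` argument matters whenever the residual branch changes the tensor shape. The sketch below (an illustration with arbitrary sizes, not taken from the original) uses a stride-2 3×3 convolution from 64 to 128 channels: its output no longer matches the input, so adding the raw input fails, while a 1×1 convolution with the same stride followed by batch norm (the same kind of projection that `ResNet._make_layer` builds) brings the shortcut to the matching shape.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)  # input: 64 channels, 56x56

# Residual branch that halves the resolution and doubles the channels
branch = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
out = branch(x)
print(out.shape)  # torch.Size([1, 128, 28, 28]), which no longer matches x

try:
    out + x  # 128 vs 64 channels cannot be broadcast together
except RuntimeError as err:
    print("shape mismatch:", err)

# Projection shortcut: 1x1 conv with the same stride, plus batch norm
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
identity = downsample(x)
print(identity.shape)  # torch.Size([1, 128, 28, 28])

y = out + identity  # shapes now agree, so the residual addition is valid
print(y.shape)  # torch.Size([1, 128, 28, 28])
```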

ResNet Architecture

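We define a generic ResNet class parameterized by the block type and the number of blocks in each stage; combined with BasicBlock, it yields ResNet-18 and ResNet-34.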

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64  # number of channels entering the first residual stage

        # Stem: 7x7 conv, stride 2, padding 3, for initial feature extraction
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)  # batch norm on the stem output
        self.relu = nn.ReLU(inplace=True)  # ReLU; inplace=True saves memory
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)  # max pooling halves the spatial size

        # Four residual stages
        self.layer1 = self._make_layer(block, 64, layers[0])  # stage 1: 64 channels, stride 1
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)  # stage 2: 128 channels, stride 2
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # stage 3: 256 channels, stride 2
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  # stage 4: 512 channels, stride 2

        # Global average pooling and fully connected classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling, output shape (N, C, 1, 1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)  # outputs num_classes logits

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        # A projection shortcut is needed when the stride or the channel count changes
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion,
                          kernel_size=1, stride=stride, bias=False),  # 1x1 conv matches channels and stride
                nn.BatchNorm2d(out_channels * block.expansion),  # batch norm
            )

        layers = []  # blocks that make up this stage
        # The first block may change stride/channels, so it receives the downsample module
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion  # input channel count for the following blocks
        # The remaining blocks keep the shape, so no downsampling is needed
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)  # chain the blocks into one sequential module

    def forward(self, x):
        # Stem
        x = self.conv1(x)  # initial convolution
        x = self.bn1(x)  # batch norm
        x = self.relu(x)  # ReLU
        x = self.maxpool(x)  # max pooling

        # Four residual stages
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # Global average pooling and classifier
        x = self.avgpool(x)  # (N, C, H, W) -> (N, C, 1, 1)
        x = torch.flatten(x, 1)  # flatten to (N, C)
        x = self.fc(x)  # class logits

        return x  # classification scores (logits)
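To see how the stem and the four stages shrink the feature map, it helps to apply the standard output-size formula for convolution and pooling, $\lfloor (H + 2p - k) / s \rfloor + 1$, layer by layer. The pure-Python sketch below assumes a 224×224 input (the usual ImageNet setting; the article itself does not fix an input size) and follows the first, possibly strided, 3×3 convolution of each stage.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv/pool layer: floor((H + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224  # assumed input resolution
trace = []

size = out_size(size, kernel=7, stride=2, padding=3)  # stem: 7x7 conv, stride 2
trace.append(("conv1", size))
size = out_size(size, kernel=3, stride=2, padding=1)  # stem: 3x3 max pool, stride 2
trace.append(("maxpool", size))

# First 3x3 conv of each stage: stride 1 for layer1, stride 2 for layers 2-4
for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
    size = out_size(size, kernel=3, stride=stride, padding=1)
    trace.append((name, size))

for name, s in trace:
    print(f"{name}: {s}x{s}")
# conv1: 112x112, maxpool: 56x56, layer1: 56x56,
# layer2: 28x28, layer3: 14x14, layer4: 7x7
```

The 7×7 map left after `layer4` is then reduced to 1×1 by `AdaptiveAvgPool2d((1, 1))`, which is why `fc` only needs `512 * block.expansion` inputs.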

Instantiating ResNet-18/34

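def resnet18(num_classes=1000):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)  # ResNet-18: 2 BasicBlocks per stage

def resnet34(num_classes=1000):
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes)  # ResNet-34: [3, 4, 6, 3] BasicBlocks per stage

# Use the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Instantiate the model and move it to the device
model = resnet18().to(device)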

Training and Evaluation

Below is a simple training loop for the ResNet model.

import torch.optim as optim

# Hyperparameters
num_epochs = 10  # number of training epochs
learning_rate = 0.001  # learning rate

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()  # cross-entropy loss for classification
optimizer = optim.Adam(model.parameters(), lr=learning_rate)  # Adam optimizer

# Training loop
for epoch in range(num_epochs):
    model.train()  # training mode: BatchNorm uses per-batch statistics
    running_loss = 0.0  # accumulated loss for this epoch

    for inputs, labels in train_loader:  # assumes train_loader has already been defined
        inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the device

        # Reset gradients from the previous step
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and parameter update
        loss.backward()
        optimizer.step()

        running_loss += loss.item()  # accumulate the batch loss

    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}")

print("Training finished.")
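The loop above assumes `train_loader` already exists. For a quick smoke test before wiring up a real dataset, a stand-in loader can be built from random tensors with `TensorDataset` and `DataLoader`. The sizes below (64 RGB images of 224×224, 10 classes, batch size 16) are arbitrary assumptions, and since the labels are random the loss will not meaningfully decrease. With 10 classes, the model should be built with a matching classifier, e.g. `ResNet(BasicBlock, [2, 2, 2, 2], num_classes=10)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)  # reproducible random data

num_samples, num_classes = 64, 10  # assumed sizes for a smoke test
images = torch.randn(num_samples, 3, 224, 224)  # random "images" in NCHW layout
targets = torch.randint(0, num_classes, (num_samples,))  # random class indices

train_loader = DataLoader(TensorDataset(images, targets), batch_size=16, shuffle=True)

inputs, labels = next(iter(train_loader))
print(len(train_loader))  # 4 batches per epoch
print(inputs.shape, labels.shape)  # torch.Size([16, 3, 224, 224]) torch.Size([16])
```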

Implementing Deeper ResNets

`BottleneckBlock`

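BottleneckBlock is the building block of ResNet-50 and deeper models. It uses a 1×1 convolution to reduce the channel count, a 3×3 convolution on the reduced channels, and a 1×1 convolution that expands the channels again by a factor of `expansion = 4`. In this version the stride sits on the 3×3 convolution, as in torchvision's implementation (sometimes called ResNet v1.5); the original implementation applies it in the first 1×1 convolution. The structure is as follows: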

class BottleneckBlock(nn.Module):
    expansion = 4  # expansion factor: the block outputs out_channels * 4 channels

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BottleneckBlock, self).__init__()
        # First conv: 1x1, reduces the channel count to out_channels
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # Second conv: 3x3, stride controls spatial downsampling
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Third conv: 1x1, expands the channel count to out_channels * expansion
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample  # optional projection that matches the shortcut to the output shape

    def forward(self, x):
        identity = x  # keep the input for the residual connection
        if self.downsample is not None:
            identity = self.downsample(x)  # project the input when its shape differs from the output

        # First conv + batch norm + ReLU
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        # Second conv + batch norm + ReLU
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        # Third conv + batch norm (no ReLU before the addition)
        out = self.conv3(out)
        out = self.bn3(out)

        out += identity  # residual connection
        out = self.relu(out)  # final activation after the addition

        return out
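The bottleneck design makes the expensive 3×3 convolution cheaper by running it on fewer channels. The sketch below (an illustration, not from the original) follows the three convolutions of a bottleneck with 256 input channels and `out_channels=64` (so 64 × 4 = 256 output channels, as in the first stage of ResNet-50) and compares their weight count with a single 3×3 convolution applied directly at 256 channels.

```python
import torch
import torch.nn as nn


def num_weights(module: nn.Module) -> int:
    """Number of parameters in a module (these convs have no bias)."""
    return sum(p.numel() for p in module.parameters())


# The three convolutions of a bottleneck with in_channels=256, out_channels=64, expansion=4
reduce = nn.Conv2d(256, 64, kernel_size=1, bias=False)             # 1x1: 256 -> 64
conv3x3 = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)  # 3x3 at 64 channels
expand = nn.Conv2d(64, 256, kernel_size=1, bias=False)             # 1x1: 64 -> 256

x = torch.randn(1, 256, 56, 56)
h1 = reduce(x)
h2 = conv3x3(h1)
h3 = expand(h2)
print(h1.shape, h2.shape, h3.shape)
# torch.Size([1, 64, 56, 56]) torch.Size([1, 64, 56, 56]) torch.Size([1, 256, 56, 56])

bottleneck_weights = num_weights(reduce) + num_weights(conv3x3) + num_weights(expand)
plain_weights = num_weights(nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False))
print(bottleneck_weights, plain_weights)  # 69632 589824
```

Because the output has the same shape as the input here, the identity shortcut can be added without a `downsample` module.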

Modifying the ResNet Model Definition

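By choosing the block type (BasicBlock or BottleneckBlock) and the number of blocks per stage, the same ResNet class produces the different ResNet variants. The standard configurations are: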

| Model | Block type | Blocks per stage (`layers`) |
| --- | --- | --- |
| ResNet-18 | `BasicBlock` | `[2, 2, 2, 2]` |
| ResNet-34 | `BasicBlock` | `[3, 4, 6, 3]` |
| ResNet-50 | `BottleneckBlock` | `[3, 4, 6, 3]` |
| ResNet-101 | `BottleneckBlock` | `[3, 4, 23, 3]` |
| ResNet-152 | `BottleneckBlock` | `[3, 8, 36, 3]` |

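Pick the variant that fits your compute budget and accuracy needs; for example, `ResNet(BottleneckBlock, [3, 4, 6, 3])` builds ResNet-50.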
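The number in each model's name counts its weighted layers: the 7×7 stem convolution, the final fully connected layer, and the convolutions on the main path of the residual blocks (2 per `BasicBlock`, 3 per `BottleneckBlock`; the 1×1 shortcut projections are not counted). The short check below, a sketch using the configurations from the table, recovers each name from its layer configuration.

```python
# Convolutions on the main path of each block type
CONVS_PER_BLOCK = {"BasicBlock": 2, "BottleneckBlock": 3}

configs = {
    "ResNet-18": ("BasicBlock", [2, 2, 2, 2]),
    "ResNet-34": ("BasicBlock", [3, 4, 6, 3]),
    "ResNet-50": ("BottleneckBlock", [3, 4, 6, 3]),
    "ResNet-101": ("BottleneckBlock", [3, 4, 23, 3]),
    "ResNet-152": ("BottleneckBlock", [3, 8, 36, 3]),
}


def depth(block: str, layers: list[int]) -> int:
    # stem conv + convolutions inside the blocks + final fully connected layer
    return 1 + CONVS_PER_BLOCK[block] * sum(layers) + 1


for name, (block, layers) in configs.items():
    print(name, depth(block, layers))
# ResNet-18 18, ResNet-34 34, ResNet-50 50, ResNet-101 101, ResNet-152 152
```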

Accuracy Validation

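When training a deep learning model, tracking accuracy on held-out data matters as much as monitoring the loss. The function below evaluates the model on a validation set and reports the average loss and the accuracy; it can be called after each epoch of the training loop.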

# Validation loop
def validate_model(model, val_loader, criterion, device):
    model.eval()  # evaluation mode: BatchNorm uses its running statistics
    correct = 0  # number of correctly classified samples
    total = 0  # total number of samples
    val_loss = 0.0  # accumulated validation loss

    with torch.no_grad():  # gradients are not needed for evaluation
        for inputs, labels in val_loader:  # assumes val_loader has already been defined
            inputs, labels = inputs.to(device), labels.to(device)  # move the batch to the device

            # Forward pass
            outputs = model(inputs)
            loss = criterion(outputs, labels)  # batch loss
            val_loss += loss.item()  # accumulate the loss

            # Predicted class = index of the largest logit
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)  # update the sample count
            correct += (predicted == labels).sum().item()  # update the correct-prediction count

    # Average loss per batch and accuracy in percent
    avg_loss = val_loss / len(val_loader)
    accuracy = 100 * correct / total

    print(f"Validation Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")
    return avg_loss, accuracy
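The accuracy computation in `validate_model` boils down to taking the arg-max of each row of logits and comparing it with the labels. The toy tensors below are made up purely to show that step in isolation:

```python
import torch

# Made-up logits for 4 samples and 3 classes (one row per sample)
logits = torch.tensor([
    [2.0, 0.5, 0.1],   # predicted class 0
    [0.2, 1.5, 0.3],   # predicted class 1
    [0.9, 0.1, 0.4],   # predicted class 0
    [0.0, 0.2, 3.0],   # predicted class 2
])
labels = torch.tensor([0, 1, 2, 2])

predicted = logits.argmax(dim=1)  # index of the largest logit per row
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), f"{accuracy:.2f}%")  # [0, 1, 0, 2] 75.00%
```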