
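Fine-Tuning a Deep Learning Model with PyTorch

After training a model on public data, you sometimes need to fine-tune it on your own data. I did exactly that today, so here is a record of the method. It is fairly easy to do in PyTorch.

I found many approaches online, and ChatGPT suggested several more, but none of them were concise or easy to follow.

The overall steps are:

1. Load the trained model.

2. Freeze the layers you do not want to fine-tune and mark the layers you do want to train. (You can either create a new layer to replace an existing one, or fine-tune the existing layer directly.)

3. Train.

1. Load a model

Here I use a SqueezeNet model that I trained earlier; the same approach applies to any model.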

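## Load the model to fine-tune
# The model's class definitions must be importable in this environment for torch.load to work
import torch
from Model.main_SqueezeNet import SqueezeNet, Fire

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# The file holds the whole pickled model, not just a state_dict. On recent PyTorch releases,
# where torch.load defaults to weights_only=True, pass weights_only=False (trusted files only).
model = torch.load("Model/SqueezeNet.pth").to(device)
print(model)
# Output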
SqueezeNet(
  (stem): Sequential(
    (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (fire2): Fire(
    (squeeze): Sequential(
      (0): Conv2d(8, 4, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (expand_1x1): Sequential(
      (0): Conv2d(4, 8, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (expand_3x3): Sequential(
      (0): Conv2d(4, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (fire3): Fire(
    (squeeze): Sequential(
      (0): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (expand_1x1): Sequential(
      (0): Conv2d(8, 8, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (expand_3x3): Sequential(
      (0): Conv2d(8, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (fire4): Fire(
    (squeeze): Sequential(
      (0): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (expand_1x1): Sequential(
      (0): Conv2d(8, 8, kernel_size=(1, 1), stride=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
    (expand_3x3): Sequential(
      (0): Conv2d(8, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU(inplace=True)
    )
  )
  (conv10): Conv2d(16, 2, kernel_size=(1, 1), stride=(1, 1))
  (avg): AdaptiveAvgPool2d(output_size=1)
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
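As the comment in the loading code notes, torch.load on a whole pickled model only works when the original class definitions can be imported. A common alternative is to save and load only the state_dict. The following is a minimal sketch of that alternative, not what this article does: `make_net` is a hypothetical stand-in architecture (not the SqueezeNet above), and the file lives in a temporary directory.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_net():
    # Hypothetical stand-in architecture; the article's SqueezeNet is not reproduced here.
    return nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 2, kernel_size=1),
    )


trained = make_net()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "net_state.pth")
    # Save only the tensors (weights and buffers), not the pickled Python object.
    torch.save(trained.state_dict(), path)

    # Rebuild the architecture in code, then copy the saved tensors into it.
    restored = make_net()
    restored.load_state_dict(torch.load(path, map_location="cpu"))

same = all(
    torch.equal(a, b)
    for a, b in zip(trained.state_dict().values(), restored.state_dict().values())
)
print("weights restored:", same)
```

With this layout the checkpoint no longer depends on where the class was defined, at the cost of having to construct the architecture yourself before loading.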

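print(model) shows the name of every layer in the model. Here I want to fine-tune conv10, because it is the last layer that has trainable parameters. Likewise, if a model ends in a fully connected layer, fine-tuning that final layer is the recommended choice.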
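The layer names that print(model) displays can also be listed in code, which makes it easy to see which layers own parameters at all. This is only a sketch: `TinyNet` is a hypothetical stand-in that borrows the names stem, conv10 and avg, not the real SqueezeNet.

```python
import torch.nn as nn


class TinyNet(nn.Module):
    # Hypothetical stand-in that only borrows the layer names "stem", "conv10" and "avg".
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.conv10 = nn.Conv2d(8, 2, kernel_size=1)
        self.avg = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        return self.avg(self.conv10(self.stem(x))).flatten(1)


model = TinyNet()

# Count parameters per top-level child; layers with zero parameters cannot be fine-tuned.
param_counts = {
    name: sum(p.numel() for p in child.parameters())
    for name, child in model.named_children()
}
for name, count in param_counts.items():
    print(f"{name}: {count} parameters")

# The last top-level child that owns parameters is a natural candidate for fine-tuning.
last_trainable = [name for name, count in param_counts.items() if count > 0][-1]
print("last layer with parameters:", last_trainable)
```

Pooling layers such as avg report zero parameters, which is why conv10 rather than the final module is the one to fine-tune.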

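2. Freeze the layers you do not want to train

There are two ways to do this: create a new conv10 layer to replace the original one, or fine-tune the original layer directly without creating a new one.

Creating a new layer: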

model.conv10 = nn.Conv2d(model.conv10.in_channels, model.conv10.out_channels, model.conv10.kernel_size, model.conv10.stride)
print(model)

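The hyperparameters of the original layer can be read directly from attributes such as model.conv10.in_channels. This defines a new conv10 layer with freshly initialized weights and swaps it into the model.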
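A short check makes the effect of the replacement concrete: the new layer has the same shape as the old one, but its weights are re-initialized, so whatever that layer had learned is discarded. In this sketch `old_layer` is a hypothetical stand-in for the trained conv10; passing padding and bias as well is an addition beyond the code above, in case the original layer used non-default values.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the trained conv10 layer (16 -> 2 channels, 1x1 kernel).
old_layer = nn.Conv2d(16, 2, kernel_size=1)

# Rebuild the layer from the old layer's own attributes.
new_layer = nn.Conv2d(
    old_layer.in_channels,
    old_layer.out_channels,
    old_layer.kernel_size,
    old_layer.stride,
    padding=old_layer.padding,        # carried over in case it is non-zero
    bias=old_layer.bias is not None,  # keep or drop the bias like the original
)

same_shape = new_layer.weight.shape == old_layer.weight.shape
same_values = torch.equal(new_layer.weight, old_layer.weight)
print("same shape:", same_shape)
print("same values:", same_values)
```

If the goal is to keep the learned weights as a starting point, fine-tuning the existing layer without replacing it is the option to choose.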

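Next, freeze all layers (requires_grad = False), then unfreeze conv10 (requires_grad = True).

# Freeze all layers first
for param in model.parameters():
    param.requires_grad = False

# Fine-tune only conv10. If the new conv10 layer is created after freezing, these two lines
# can be omitted, because the parameters of a new layer require gradients by default
for param in model.conv10.parameters():
    param.requires_grad = True

If you are not creating a new layer, simply skip the model.conv10 = nn.Conv2d line and go straight to freezing.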
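To confirm which parameters will actually be trained, the requires_grad flags can be listed by name. The sketch below uses a hypothetical two-layer stand-in model and creates the new conv10 after freezing, which is the case where the explicit unfreezing loop is not needed.

```python
import torch.nn as nn

# Hypothetical stand-in model; only the attribute name "conv10" mirrors the article.
model = nn.Sequential()
model.add_module("stem", nn.Conv2d(1, 8, 3, padding=1))
model.add_module("conv10", nn.Conv2d(8, 2, kernel_size=1))

# Freeze everything that currently exists in the model.
for param in model.parameters():
    param.requires_grad = False

# A layer created after freezing starts with requires_grad=True by default.
model.conv10 = nn.Conv2d(8, 2, kernel_size=1)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)
```

If the layer is replaced before the freezing loop instead, as in the order shown above, the new layer is frozen along with everything else and the explicit unfreezing loop becomes necessary.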

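3. Train

One thing to watch here: pass model.conv10.parameters() to the optimizer rather than model.parameters(). This tells the optimizer which parameters it is allowed to update.

optimizer = optim.SGD(model.conv10.parameters(), lr=1e-2)

Although the unwanted parameters are already frozen above, it is still best to pass model.conv10.parameters() explicitly. Feel free to try it with model.parameters() and see whether that works too.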
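When several layers are unfrozen, naming each one for the optimizer gets tedious. One option is to pass every parameter that still requires gradients. This is a sketch under assumptions: the two Linear layers are hypothetical stand-ins for a frozen backbone and a trainable head, and the data is random. As far as I know, in recent PyTorch versions a frozen parameter never receives a .grad and SGD skips parameters whose .grad is None, so passing model.parameters() should generally work as well, but being explicit keeps the intent clear.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in: a frozen backbone followed by a trainable head.
backbone = nn.Linear(4, 4)
head = nn.Linear(4, 2)
model = nn.Sequential(backbone, head)
for param in backbone.parameters():
    param.requires_grad = False

# Hand the optimizer only the parameters that still require gradients.
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable_params, lr=1e-2)

backbone_before = backbone.weight.detach().clone()
head_before = head.weight.detach().clone()

# One training step on random data.
x = torch.randn(8, 4)
y = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_changed = not torch.equal(backbone.weight, backbone_before)
head_changed = not torch.equal(head.weight, head_before)
print("backbone changed:", backbone_changed)
print("head changed:", head_changed)
print("backbone grad is None:", backbone.weight.grad is None)
```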

# Use cross-entropy loss
criterion = nn.CrossEntropyLoss()
# Optimize only the parameters of conv10
optimizer = optim.SGD(model.conv10.parameters(), lr=1e-2)
# Move the model to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Put the model in training mode
model.train()

num_epochs = 10
for epoch in range(num_epochs):
    # model.train()
    running_loss = 0.0
    correct = 0
    for x_train, y_train in data_loader:
        x_train, y_train = x_train.to(device), y_train.to(device)
        print(x_train.shape, y_train.shape)

        # Forward pass
        outputs = model(x_train)
        loss = criterion(outputs, y_train)

        # Backward pass and optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * x_train.size(0)

        # Accumulate training-set accuracy
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == y_train).sum().item()

    # Compute the training loss and accuracy for this epoch
    epoch_loss = running_loss / len(dataset)
    epoch_accuracy = 100 * correct / len(dataset)

    # if epoch % 5 == 0 or epoch == num_epochs-1 :
    print(f'Epoch [{epoch+1}/{num_epochs}]')
    print(f'Train Loss: {epoch_loss:.4f}, Train Accuracy: {epoch_accuracy:.2f}%')

A falling loss in the output shows that the model is learning. Watching the accuracy go from 0 to 100% is very satisfying! Admittedly, I fine-tuned with just a single sample here, haha.

Epoch [1/10]
Train Loss: 0.8185, Train Accuracy: 0.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [2/10]
Train Loss: 0.7063, Train Accuracy: 0.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [3/10]
Train Loss: 0.6141, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [4/10]
Train Loss: 0.5385, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [5/10]
Train Loss: 0.4761, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [6/10]
Train Loss: 0.4244, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [7/10]
Train Loss: 0.3812, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [8/10]
Train Loss: 0.3449, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [9/10]
Train Loss: 0.3140, Train Accuracy: 100.00%
torch.Size([1, 1, 32, 16]) torch.Size([1])
Epoch [10/10]
Train Loss: 0.2876, Train Accuracy: 100.00%
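One caveat about the training loop: model.train() also puts the BatchNorm2d layers of the frozen stem and Fire blocks in training mode. Their running_mean and running_var are buffers rather than parameters, so requires_grad = False does not stop them from being updated on every forward pass. The sketch below illustrates this on a hypothetical small block (not the author's model) and shows one possible way to keep the statistics fixed, namely switching only the BatchNorm modules to eval mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a frozen backbone block that contains BatchNorm.
block = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.BatchNorm2d(4))
for param in block.parameters():
    param.requires_grad = False

x = torch.randn(2, 1, 32, 16)
bn = block[1]

# In train mode the running statistics are updated by every forward pass,
# even though requires_grad is False for the layer's weight and bias.
block.train()
mean_before = bn.running_mean.clone()
block(x)
changed_in_train = not torch.equal(bn.running_mean, mean_before)

# Switching the BatchNorm modules to eval mode keeps their statistics fixed.
for module in block.modules():
    if isinstance(module, nn.BatchNorm2d):
        module.eval()
mean_before = bn.running_mean.clone()
block(x)
changed_in_eval = not torch.equal(bn.running_mean, mean_before)

print("running_mean changed in train mode:", changed_in_train)
print("running_mean changed in eval mode:", changed_in_eval)
```

Any later call to model.train() puts the BatchNorm layers back in training mode, so the eval() switch would have to be repeated after it. Whether to freeze the statistics at all is a design choice that depends on how different the new data is from the original training data.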

4. Verify that only this layer's parameters changed and the other layers' did not

Before training, look at this layer's parameters:

raw_parm = model.conv10.weight
print(raw_parm)
# Partial output:
Parameter containing:
tensor([[[[-0.1621]],

         [[ 0.0288]],

         [[ 0.1275]],

         [[ 0.1584]],

         [[ 0.0248]],

         [[-0.2013]],

         [[-0.2086]],

         [[ 0.1460]],

         [[ 0.0566]],

         [[ 0.2897]],

         [[ 0.2898]],

         [[ 0.0610]],

         [[ 0.2172]],

         [[ 0.0860]],

         [[ 0.2730]],

         [[-0.1053]]],

After training, print this layer's parameters again:

## Inspect the parameters of the fine-tuned model
tuned_parm = model.conv10.weight
print(tuned_parm)
# Partial output:
Parameter containing:
tensor([[[[-0.1446]],

         [[ 0.0365]],

         [[ 0.1490]],

         [[ 0.1783]],

         [[ 0.0424]],

         [[-0.1826]],

         [[-0.1903]],

         [[ 0.1636]],

         [[ 0.0755]],

         [[ 0.3092]],

         [[ 0.3093]],

         [[ 0.0833]],

         [[ 0.2405]],

         [[ 0.1049]],

         [[ 0.2925]],

         [[-0.0866]]],

The parameters of this layer have indeed changed.

Now check any other layer:

Before training:

# Before training
raw_parm = model.stem[0].weight
print(raw_parm)
# Partial output:
Parameter containing:
tensor([[[[-0.0723, -0.2151,  0.1123],
          [-0.2114,  0.0173, -0.1322],
          [-0.0819,  0.0748, -0.2790]]],


        [[[-0.0918, -0.2783, -0.3193],
          [ 0.0359,  0.2993, -0.3422],
          [ 0.1979,  0.2499, -0.0528]]],

After training:

## Inspect the parameters of the fine-tuned model
tuned_parm = model.stem[0].weight
print(tuned_parm)
# Partial output:
Parameter containing:
tensor([[[[-0.0723, -0.2151,  0.1123],
          [-0.2114,  0.0173, -0.1322],
          [-0.0819,  0.0748, -0.2790]]],


        [[[-0.0918, -0.2783, -0.3193],
          [ 0.0359,  0.2993, -0.3422],
          [ 0.1979,  0.2499, -0.0528]]],

The parameters are unchanged, which shows that this layer was not trained.
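Comparing printed tensors by eye works for a spot check, but the same verification can be done for every parameter at once. Note that an assignment like raw_parm = model.conv10.weight stores a reference to the live parameter, so it only helps if it is printed before training; for a later comparison a copy is needed. The sketch below assumes a hypothetical small stand-in model and random inputs, with all labels set to class 0 as in the article's single-sample setup.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-in model; "stem" and "conv10" only mirror the article's layer names.
model = nn.Sequential()
model.add_module("stem", nn.Conv2d(1, 8, 3, padding=1))
model.add_module("conv10", nn.Conv2d(8, 2, kernel_size=1))
model.add_module("avg", nn.AdaptiveAvgPool2d(1))
model.add_module("flatten", nn.Flatten())

for param in model.parameters():
    param.requires_grad = False
for param in model.conv10.parameters():
    param.requires_grad = True

# clone() takes a real copy; a plain reference would follow the in-place updates.
snapshot = {name: p.detach().clone() for name, p in model.named_parameters()}

# One training step on random inputs.
optimizer = optim.SGD(model.conv10.parameters(), lr=1e-2)
x = torch.randn(4, 1, 32, 16)
y = torch.zeros(4, dtype=torch.long)
loss = nn.CrossEntropyLoss()(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

changed = [
    name for name, p in model.named_parameters()
    if not torch.equal(p.detach(), snapshot[name])
]
print("changed parameters:", changed)
```

The same comparison over model.state_dict() instead of named_parameters() would also cover buffers such as BatchNorm running statistics.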

5. To make everything easier to follow as a whole, the complete code is below.

import torch
import numpy as np
import torch.optim as optim
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

## Load the fine-tuning data
feats = np.load("feats_jn105.npy")
labels = np.array([0])
print(feats.shape)
print(labels.shape)

# Convert the data and labels to PyTorch tensors (input data does not need requires_grad)
data_tensor = torch.tensor(feats, dtype=torch.float32)
labels_tensor = torch.tensor(labels, dtype=torch.long)

# Add a channel dimension
# data_tensor = data_tensor.unsqueeze(1)  # becomes (num, 1, 32, 16)
batch_size = 15

# Create a TensorDataset
dataset = TensorDataset(data_tensor, labels_tensor)
data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
inputs, label = next(iter(data_loader))
print(inputs.shape, label.shape)
# jupyter nbconvert --to script ./Model/main_SqueezeNet.ipynb  # run in a terminal to convert the ipynb to a py file

## Load the model to fine-tune
# The model's class definitions must be importable in this environment for torch.load to work
from Model.main_SqueezeNet import SqueezeNet, Fire

model = torch.load("Model/SqueezeNet.pth").to(device)
print(model)

# Create a new layer for the model
# model.fc = nn.Linear(in_features = model.fc.in_features, out_features = model.fc.out_features)
model.conv10 = nn.Conv2d(model.conv10.in_channels, model.conv10.out_channels, model.conv10.kernel_size, model.conv10.stride)
print(model)

# Freeze all layers first
for param in model.parameters():
    param.requires_grad = False

# Fine-tune only conv10. If the new conv10 layer is created after freezing, these two lines
# can be omitted, because the parameters of a new layer require gradients by default
for param in model.conv10.parameters():
    param.requires_grad = True

raw_parm = model.stem[0].weight
print(raw_parm)
for name, param in model.named_parameters():
    print(name, param.requires_grad)

# Use cross-entropy loss
criterion = nn.CrossEntropyLoss()

# Optimize only the parameters of conv10
optimizer = optim.SGD(model.conv10.parameters(), lr=1e-2)

# Move the model (including the new conv10 layer) to the GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Put the model in training mode
model.train()

num_epochs = 10
for epoch in range(num_epochs):
    # model.train()
    running_loss = 0.0
    correct = 0
    for x_train, y_train in data_loader:
        x_train, y_train = x_train.to(device), y_train.to(device)
        print(x_train.shape, y_train.shape)

        # Forward pass
        outputs = model(x_train)
        loss = criterion(outputs, y_train)

        # Backward pass and optimization step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * x_train.size(0)

        # Accumulate training-set accuracy
        _, predicted = torch.max(outputs, 1)
        correct += (predicted == y_train).sum().item()

    # Compute the training loss and accuracy for this epoch
    epoch_loss = running_loss / len(dataset)
    epoch_accuracy = 100 * correct / len(dataset)

    # if epoch % 5 == 0 or epoch == num_epochs-1 :
    print(f'Epoch [{epoch+1}/{num_epochs}]')
    print(f'Train Loss: {epoch_loss:.4f}, Train Accuracy: {epoch_accuracy:.2f}%')

## Inspect the parameters of the fine-tuned model
tuned_parm = model.stem[0].weight
print(tuned_parm)


If you know a better approach, please share it!


http://www.kler.cn/a/410555.html
