
Large Model Development (6): LoRA Project: An Intelligent Classification and Information Extraction System for New-Media Comments

article 2025/3/20 10:02:19

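LoRA Project: An Intelligent Classification and Information Extraction System for New-Media Comments

0 Preface
1 Project Overview
- 1.1 Project Features
- 1.2 Technical Principle
- 1.3 Hardware and Software Environment
- 1.4 Project Structure
2 Data Overview and Processing
- 2.1 Dataset
- 2.2 Data Processing
- 2.3 Data Loader
3 Model Training
- 3.1 Configuration File
- 3.2 Utility Functions
- 3.3 Model Training
- 3.4 Model Evaluation
4 Model Inference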

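0 Preface

Among fine-tuning methods, LoRA is the most widely used; it is a parameter-efficient fine-tuning approach.

1 Project Overview

1.1 Project Features

This project covers two tasks: text classification and information extraction. Classification was covered earlier in this series, so the focus here is on information extraction:
[Image]
The goal of information extraction is to build a knowledge graph, i.e. entities and the relations between them. In the figure, circles denote entities and connecting lines denote relations.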

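1.2 Technical Principle

[Image]
The left side of the figure shows the pre-training process, which is covered later together with data processing.

LoRA fine-tuning adds a side branch to the linear layers of the pre-trained model. Where a layer originally computed y = w*x + b, it now computes y = w*x + b + Δw*x, where Δw is the weight of the side branch. During training the parameters of the original model are frozen and only the branch is updated. The branch first projects down to a low dimension and then back up, so Δw is the product of two small matrices.
[Image]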
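To make the formula above concrete, here is a minimal pure-Python sketch of a LoRA-style forward pass. It is an illustration under assumptions, not the peft implementation: the helper names (matvec, lora_forward), the toy dimensions, and the alpha / r scaling with a zero-initialized up-projection are chosen here to mirror the usual LoRA setup.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, w, b, lora_a, lora_b, alpha, r):
    """Compute y = w*x + b + (alpha / r) * B(A x); w and b stay frozen."""
    base = [wx + bi for wx, bi in zip(matvec(w, x), b)]
    down = matvec(lora_a, x)      # project d_in -> r (down-projection)
    up = matvec(lora_b, down)     # project r -> d_out (up-projection)
    scale = alpha / r
    return [y + scale * u for y, u in zip(base, up)]


random.seed(0)
d_in, d_out, r, alpha = 6, 4, 2, 4   # toy sizes, hypothetical
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
b = [0.1] * d_out
lora_a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
lora_b = [[0.0] * r for _ in range(d_out)]   # B starts at zero, so the branch adds nothing yet
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

y_base = [wx + bi for wx, bi in zip(matvec(w, x), b)]
y_lora = lora_forward(x, w, b, lora_a, lora_b, alpha, r)

full_params = d_in * d_out          # parameters in the frozen weight
lora_params = r * (d_in + d_out)    # parameters in the trainable branch
print(y_base == y_lora, full_params, lora_params)
```

Because the up-projection starts at zero, the model with the branch initially behaves exactly like the pre-trained model. With realistic sizes (for example 4096 x 12288 and r = 8) the branch is a tiny fraction of the full weight, which is where the parameter efficiency comes from.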

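1.3 Hardware and Software Environment

The hardware environment is as follows:

GPU: 1 card
Memory per GPU: 80 GB
CPU: 8 cores
RAM: 48 GB
Temporary storage: 50 GB

Note that the 80 GB refers to the memory of a single card; two 40 GB cards cannot be combined to substitute for it. If you do not have such a GPU locally, consider running the project on 驱动云 (a cloud GPU platform). For how to use 驱动云, see this article.

The software environment is the same as in the previous parts, but the following packages must match these versions:

transformers==4.27.1
peft==0.11.1
cpm_kernels

Install any other missing packages as needed.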

1.4 Project Structure

The overall structure is the same as in the earlier BERT project, so it is not described in detail here.
[Image]

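2 Data Overview and Processing

2.1 Dataset

The dataset contains both classification and information-extraction samples, i.e. the data for the two tasks is mixed together. The fine-tuning data lives in two files, mixed_train_dataset.jsonl and mixed_dev_dataset.jsonl. The training set has 902 samples and the test set has 122.

An information-extraction sample looks like this:
[Image]

Here context is the prompt and target is the expected output. Printed with Python:
[Image]

Instruction is the system prompt, Input is the user input, Answer cues the model to give its answer, and target is the ground truth.

As the prompt shows, the model is asked to extract subject-predicate-object triples from the input text, with the keys subject, predicate and object. (One thing puzzled me: 喜剧之王 looks like the grammatical object of the sentence, yet it is labeled as the subject. A likely explanation is that in SPO triple annotation the subject is the head entity of the relation rather than the grammatical subject of the sentence.)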
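As a small illustration of how such triples map onto a knowledge graph, the sketch below parses a JSON list of triples and groups the edges by head entity. The answer string is hypothetical (it is not taken from the dataset), and the function name triples_to_graph is introduced here only for illustration; only the subject / predicate / object keys come from the text above.

```python
import json
from collections import defaultdict

# Hypothetical model answer: a JSON list of SPO triples (illustrative data only)
raw_answer = (
    '[{"subject": "喜剧之王", "predicate": "主演", "object": "周星驰"},'
    ' {"subject": "喜剧之王", "predicate": "导演", "object": "周星驰"}]'
)


def triples_to_graph(answer_text):
    """Group (predicate, object) edges by their subject (head entity)."""
    graph = defaultdict(list)
    for triple in json.loads(answer_text):
        graph[triple["subject"]].append((triple["predicate"], triple["object"]))
    return dict(graph)


graph = triples_to_graph(raw_answer)
for head, edges in graph.items():
    for predicate, tail in edges:
        # Each printed line is one edge of the knowledge graph
        print(f"{head} --{predicate}--> {tail}")
```

Each subject and object becomes a node (a circle in the figure) and each predicate becomes a labeled edge between them.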

A classification sample looks like this:

{"context": "Instruction: 你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。\nInput: 下面句子中描述的是一个什么？用列表的方式回答。\n\n价格便宜送货快，质量挺好的\nAnswer: ", "target": "[\"平板\"]"}

Printed out, it looks like this:

context
Instruction: 你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。
Input: 下面句子中描述的是一个什么？用列表的方式回答。

价格便宜送货快，质量挺好的
Answer: 
--------------------------------------------------------------------------------
target
["平板"]
--------------------------------------------------------------------------------
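The printout above can be reproduced with a few lines of standard-library Python. This is a sketch of one way to do it, not necessarily the script the author used; the variable names are chosen here. It also shows that target is itself a JSON-encoded list, so it needs a second json.loads to become a Python list.

```python
import json

# One line of the JSONL file, copied from the classification sample above
line = r'{"context": "Instruction: 你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。\nInput: 下面句子中描述的是一个什么？用列表的方式回答。\n\n价格便宜送货快，质量挺好的\nAnswer: ", "target": "[\"平板\"]"}'

sample = json.loads(line)
for key in ("context", "target"):
    print(key)
    print(sample[key])
    print("-" * 80)

# The target is a JSON string, so decode it once more to get the label list
labels = json.loads(sample["target"])
print(labels)
```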

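2.2 Data Processing

Note that the data processing here relies on ChatGLM-6B, so the CUDA build of PyTorch must be installed.

The data-processing code is in lora_chatglm/data_handle/data_preprocess.py and consists of two functions. The core one is convert_example. The other, get_max_length, reports the maximum, average and median token length of the prompts and targets; these statistics serve as a reference when choosing the maximum prompt and target lengths.

The code is as follows: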

import json
# traceback.format_exc() returns a string with the details of the current exception
import traceback
import numpy as np
from tqdm import tqdm
from datasets import load_dataset
from transformers import AutoTokenizer
from functools import partial


def convert_example(examples: dict, tokenizer, max_source_seq_len: int, max_target_seq_len: int):
    """
    Convert raw samples into the inputs expected by the model.

    Args:
        examples (dict): training samples, e.g. -> {
                                                "text": [
                                                            '{"context": "年基准利率4.35%。从实际看...", "target": "2017年银行贷款基准利率"}',
                                                            ...
                                                ]
                                            }
        max_source_seq_len (int): maximum prompt length
        max_target_seq_len (int): maximum answer length

    Returns:
        dict (str: np.array) -> tokenized_output = {
                            'input_ids': [[1525, 10, ...], [758, 2345, ...]],
                            'labels': [[822, 10, ...], [125, 58...]]
                        }
    """
    # Dict of lists that collects the processed samples
    tokenized_output = {
        'input_ids': [],
        'labels': []
    }

    max_seq_length = max_source_seq_len + max_target_seq_len

    # Iterate over the samples
    for example in examples['text']:
        try:
            # Parse the prompt and the target
            example = json.loads(example)
            context = example["context"]
            target = example["target"]

            # Prompt text -> token ids
            prompts_ids = tokenizer.encode(
                text=context,
                add_special_tokens=False
            )

            # Target text -> token ids
            target_ids = tokenizer.encode(
                text=target,
                add_special_tokens=False
            )

            # Truncate the prompt if it exceeds the limit
            if len(prompts_ids) >= max_source_seq_len:
                prompts_ids = prompts_ids[:max_source_seq_len - 1]  # leave one slot at the end of the source for [gMASK]

            # Truncate the target if it exceeds the limit
            if len(target_ids) >= max_target_seq_len - 1:
                target_ids = target_ids[:max_target_seq_len - 2]    # leave one slot for <sop> at the start and one for <eop> at the end

            # Build the model input in the form
            # source_ids + [gMASK] + <sop> (i.e. <bos>) + target_ids + <eop>
            input_ids = tokenizer.build_inputs_with_special_tokens(prompts_ids, target_ids)

            # <bos> is the first token of the target part, so its index equals the context length
            context_length = input_ids.index(tokenizer.bos_token_id)
            # print(f'context_length-->{context_length}')

            # Build the labels
            mask_position = context_length - 1    # [gMASK] is the last source token; everything after it is to be generated
            labels = [-100] * context_length + input_ids[mask_position + 1:]  # everything from <bos> to <eop> is a label
            # -100 marks positions that are excluded from the loss

            # Pad the input and the labels to a fixed length
            pad_len = max_seq_length - len(input_ids)
            input_ids = input_ids + [tokenizer.pad_token_id] * pad_len
            labels = labels + [-100] * pad_len      # padded positions are also excluded from the loss

            # Collect the input and the labels
            tokenized_output['input_ids'].append(input_ids)
            tokenized_output['labels'].append(labels)
        except Exception:
            # Skip malformed samples but report what went wrong
            print(f'"{example}" -> {traceback.format_exc()}')
            continue

    for k, v in tokenized_output.items():
        tokenized_output[k] = np.array(v)

    return tokenized_output


def get_max_length(tokenizer, dataset_file: str):
    """
    Report the maximum, average and median number of input/output tokens in a dataset.

    Args:
        tokenizer: tokenizer used to count tokens
        dataset_file (str): path to the JSONL dataset file
    """
    source_seq_len_list = []
    target_seq_len_list = []
    with open(dataset_file, 'r', encoding='utf-8') as f:
        for line in tqdm(f.readlines()):
            line = json.loads(line)

            # encode() is called with its default arguments here, so any special
            # tokens the tokenizer adds by default are included in the counts
            source_ids = tokenizer.encode(line['context'])
            source_seq_len_list.append(len(source_ids))

            target_ids = tokenizer.encode(line['target'])
            target_seq_len_list.append(len(target_ids))

    print(dataset_file)
    print(f"【Source Sequence】 Max: {max(source_seq_len_list)}, Avg: {int(sum(source_seq_len_list) / len(source_seq_len_list))}, Middle: {sorted(source_seq_len_list)[int(len(source_seq_len_list) / 2)]}.")
    print(f"【Target Sequence】 Max: {max(target_seq_len_list)}, Avg: {int(sum(target_seq_len_list) / len(target_seq_len_list))}, Middle: {sorted(target_seq_len_list)[int(len(target_seq_len_list) / 2)]}.")




if __name__ == '__main__':
    train_path = '../data/mixed_train_dataset.jsonl'
    pre_model = r'../../chatglm-6b'

    # Load the data
    train_dataset = load_dataset('text', data_files={'train': train_path})

    # Create the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(pre_model, trust_remote_code=True)

    # Process the data
    tokenized_output = convert_example(examples=train_dataset['train'],
                                       tokenizer=tokenizer,
                                       max_source_seq_len=30,
                                       max_target_seq_len=20)
    print(len(tokenized_output["input_ids"][0]))
    print(len(tokenized_output["labels"][0]))

    # Print length statistics (max, average, median) for prompts and targets
    get_max_length(tokenizer, train_path)

Output:

50
50
100%|████████████████████████████████████████████████████████████████████████████| 902/902 [00:01<00:00, 524.41it/s]
../data/mixed_train_dataset.jsonl
【Source Sequence】 Max: 908, Avg: 72, Middle: 65.
【Target Sequence】 Max: 402, Avg: 50, Middle: 45.
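The label construction in convert_example is easier to follow on a toy example. The sketch below mimics the same truncate, concatenate, mask and pad steps with plain lists. It is only an illustration: the special-token ids (GMASK, BOS, EOP, PAD) are made-up values, build_example is a hypothetical helper, and the real layout is produced by the ChatGLM tokenizer's build_inputs_with_special_tokens.

```python
IGNORE_INDEX = -100
# Made-up special-token ids, for illustration only
GMASK, BOS, EOP, PAD = 9001, 9002, 9003, 0


def build_example(prompt_ids, target_ids, max_source_len, max_target_len):
    """Toy version of the truncate / concatenate / mask / pad steps."""
    prompt_ids = prompt_ids[:max_source_len - 1]   # leave room for [gMASK]
    target_ids = target_ids[:max_target_len - 2]   # leave room for <sop> and <eop>
    input_ids = prompt_ids + [GMASK, BOS] + target_ids + [EOP]

    # Everything before <sop> is context and is excluded from the loss
    context_length = input_ids.index(BOS)
    labels = [IGNORE_INDEX] * context_length + input_ids[context_length:]

    # Pad both sequences to the same fixed length
    pad_len = max_source_len + max_target_len - len(input_ids)
    input_ids = input_ids + [PAD] * pad_len
    labels = labels + [IGNORE_INDEX] * pad_len
    return input_ids, labels


input_ids, labels = build_example([11, 12, 13], [21, 22], max_source_len=5, max_target_len=5)
print(input_ids)
print(labels)
```

Only the positions from the start-of-answer token through the end-of-answer token carry real label ids, so the loss is computed on the answer alone and never on the prompt or the padding.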

2.3 Data Loader

The data loader is built in much the same way as in the earlier BERT project, so it is not explained in detail. The code is in data_handle/data_loader.py:

# coding:utf-8
from torch.utils.data import DataLoader
from transformers import default_data_collator, AutoTokenizer

import sys
sys.path.append('../')

from data_handle.data_preprocess import *


def get_data(tokenizer, train_path, dev_path, batch_size):
    # Load the datasets
    dataset = load_dataset('text', data_files={'train': train_path, 'dev': dev_path})

    # Bind the fixed arguments with functools.partial.
    # Note: the sequence lengths are hard-coded here; the max_*_seq_len
    # values in glm_config.py are not used by this function.
    new_func = partial(convert_example,
                       tokenizer=tokenizer,
                       max_source_seq_len=100,
                       max_target_seq_len=100)

    # Apply the conversion in batches
    dataset = dataset.map(new_func, batched=True)

    # Create the data loaders
    train_dataset = dataset["train"]
    dev_dataset = dataset["dev"]
    train_dataloader = DataLoader(train_dataset,
                                  shuffle=True,
                                  collate_fn=default_data_collator,
                                  batch_size=batch_size)
    dev_dataloader = DataLoader(dev_dataset,
                                collate_fn=default_data_collator,
                                batch_size=batch_size)
    return train_dataloader, dev_dataloader


if __name__ == '__main__':
    # Configuration
    pre_model = r'../../chatglm-6b'
    train_path = '../data/mixed_train_dataset.jsonl'
    dev_path = '../data/mixed_dev_dataset.jsonl'
    batch_size = 4

    # Create the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(pre_model, trust_remote_code=True)

    # Create the data loaders
    train_dataloader, dev_dataloader = get_data(tokenizer, train_path, dev_path, batch_size)
    print(len(train_dataloader))
    print(len(dev_dataloader))


    for i, value in enumerate(train_dataloader):
        print(value['input_ids'].shape)
        print(value['labels'].shape)
        print(value['input_ids'][:, :20])
        break

Output:

226
31
torch.Size([4, 200])
torch.Size([4, 200])
tensor([[ 37010,     12,      5,  76331,  83362,  92831, 103593,  64464,      6,
          77115,  65077,  72863,  63891,  66207,  63823,      4,   3430,     12,
          68327,  63914],
        [ 37010,     12,      5,  76331,  83362,  92831, 103593,  64464,      6,
          77115,  65077,  72863,  63891,  66207,  63823,      4,   3430,     12,
              5,  71694],
        [ 37010,     12,      5,  76331,  83362,  92831, 103593,  64464,      6,
          77115,  65077,  72863,  63891,  66207,  63823,      4,   3430,     12,
          68327,  63914],
        [ 37010,     12,      5,  76331,  83362,  92831, 103593,  64464,      6,
          77115,  65077,  72863,  63891,  66207,  63823,      4,   3430,     12,
          68327,  74351]])

As shown, the prompt of every sample we built starts with token id 37010. Let us check what 37010 is:
[Image]

Token id 37010 corresponds to the vocabulary entry ▁Instruction.

3 Model Training

3.1 Configuration File

The code is in glm_config.py:

# -*- coding:utf-8 -*-
import torch


class ProjectConfig(object):
    def __init__(self):
        self.device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
        # self.device = 'cpu'
        self.pre_model = r'../chatglm-6b'
        self.train_path = r'data/mixed_train_dataset.jsonl'
        self.dev_path = r'data/mixed_dev_dataset.jsonl'
        self.use_lora = True
        self.use_ptuning = False    # whether to use prefix tuning
        self.lora_rank = 8
        self.batch_size = 8
        self.epochs = 5
        self.learning_rate = 3e-5
        self.weight_decay = 0
        self.warmup_ratio = 0.06
        self.max_source_seq_len = 400
        self.max_target_seq_len = 300
        self.logging_steps = 10
        self.evaluate_freq = 10         # evaluate every 10 batches
        self.pre_seq_len = 128
        self.prefix_projection = False # False: learn the prefix embeddings directly; True: pass them through an MLP projection
        self.save_dir = r'checkpoints/ptune'


if __name__ == '__main__':
    pc = ProjectConfig()
    print(pc.save_dir)

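Nothing here needs much explanation.

3.2 Utility Functions

The utility functions are in utils/common_utils.py: the CastOutputToFloat class, the second2time function and the save_model function. Their roles are described below. The code is as follows: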

import copy
import torch
import torch.nn as nn

import sys
sys.path.append("../")
from glm_config import ProjectConfig


pc = ProjectConfig()

class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)


def second2time(seconds: int):
    """
    Convert a number of seconds to an hh:mm:ss string.

    Args:
        seconds (int): duration in seconds
    """
    m, s = divmod(seconds, 60)
    h, m = divmod(m, 60)
    return "%02d:%02d:%02d" % (h, m, s)


def save_model(
        model,
        cur_save_dir: str
    ):
    """
    Save the current model.

    Args:
        model: the model to save
        cur_save_dir (str): directory to save to
    """
    if pc.use_lora:                       # merge lora params with origin model
        merged_model = copy.deepcopy(model)
        # Saving the PEFT model directly would store only the adapter, i.e. the LoRA weights
        merged_model = merged_model.merge_and_unload()  # merge the LoRA branch into the base weights and remove the branch
        merged_model.save_pretrained(cur_save_dir)
    else:
        model.save_pretrained(cur_save_dir)

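CastOutputToFloat inherits from nn.Sequential, which is a container that can hold multiple submodules. Although CastOutputToFloat defines no explicit __init__, it can be initialized by passing submodules to the nn.Sequential constructor. In other words, CastOutputToFloat uses the constructor of nn.Sequential.

The following code makes this clearer:

linear_layer = nn.Linear(in_features=10, out_features=20)
new_linear = CastOutputToFloat(linear_layer)	# calls the nn.Sequential constructor, equivalent to nn.Sequential(linear_layer) plus the float32 cast in forward

3.3 Model Training

The training code is in train.py: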

import os
import time
import peft
import torch
from functools import partial
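# autocast is PyTorch's mixed-precision context manager; it speeds up training and
# reduces GPU memory use while keeping numerically sensitive operations in float32.
# This CUDA variant has no effect when running on CPU.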
from torch.cuda.amp import autocast as autocast
from transformers import AutoTokenizer, AutoConfig, AutoModel, get_scheduler
from utils.common_utils import CastOutputToFloat, save_model, second2time
from data_handle.data_loader import get_data
from glm_config import ProjectConfig

pc = ProjectConfig()


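def model2train():
    # Create the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(pc.pre_model, trust_remote_code=True)

    # Load the configuration of the pre-trained model
    config = AutoConfig.from_pretrained(pc.pre_model, trust_remote_code=True)

    # ChatGLM's own modeling code supports prefix tuning through these config fields;
    # they only need to be set when prefix tuning is used
    if pc.use_ptuning:
        config.pre_seq_len = pc.pre_seq_len
        config.prefix_projection = pc.prefix_projection

    # Create the model
    model = AutoModel.from_pretrained(pc.pre_model, config=config, trust_remote_code=True)

    # Cast the parameters to float32; the checkpoint was probably saved after
    # model.half() to reduce its size
    model = model.float()

    # Print the model structure
    print('*' * 80)
    print(model)

    # Enable gradient checkpointing to trade compute for memory:
    # during the forward pass only part of the activations is kept, and the
    # rest are recomputed from the checkpointed inputs during the backward pass.
    # Total compute goes up, but memory use drops significantly.
    model.gradient_checkpointing_enable()

    # Make the output of the input embeddings require gradients.
    # With the base weights frozen and gradient checkpointing enabled, the
    # checkpointed blocks could otherwise receive inputs without requires_grad,
    # and no gradient would reach the LoRA weights inside them.
    model.enable_input_require_grads()

    # Disable the key/value cache to save memory (it is only useful for generation)
    model.config.use_cache = False

    # For prefix tuning, the prefix_encoder parameters must also be cast to float32
    if pc.use_ptuning:
        model.transformer.prefix_encoder.float()

    # Inspect the output layer before wrapping it
    print('-' * 80)
    print(f'model.lm_head-->{model.lm_head}')

    # Add the LoRA branch
    if pc.use_lora:
        # Wrap the output layer in CastOutputToFloat so that its output is cast to torch.float32
        model.lm_head = CastOutputToFloat(model.lm_head)  # effectively nn.Sequential(model.lm_head) plus the cast
        # Branch settings
        peft_config = peft.LoraConfig(
            task_type=peft.TaskType.CAUSAL_LM,  # task type (causal language modeling)
            inference_mode=False,  # set to True for inference; False keeps the adapter trainable
            r=pc.lora_rank,  # rank of the low-rank matrices
            lora_alpha=32,  # scaling factor; the branch output is scaled by lora_alpha / r before being added
            lora_dropout=0.1,  # dropout applied to the input of the LoRA branch
        )
        # Combine the base model with the LoRA branch
        model = peft.get_peft_model(model, peft_config)  # from here on only the branch is trained, so far fewer parameters are updated

    print('&' * 80)
    print(f'model2-->{model}')

    model = model.to(pc.device)

    # Print the number of trainable parameters.
    # print_trainable_parameters() prints by itself and returns None,
    # which is why the log also shows "None" after the label.
    print('%' * 80)
    print('模型可训练参数', model.print_trainable_parameters())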

    # Parameters that should not receive weight decay (L2 regularization)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {
            "params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
            "weight_decay": pc.weight_decay,
        },
        {
            "params": [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]

    # Create the optimizer
    optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=pc.learning_rate)

    # Create the data loaders
    train_dataloader, dev_dataloader = get_data(tokenizer, pc.train_path, pc.dev_path, pc.batch_size)

    # Number of batches per epoch, needed by the scheduler to adjust the learning rate
    num_update_steps_per_epoch = len(train_dataloader)

    # Total number of training steps; the scheduler uses it to shape the learning-rate curve
    max_train_steps = pc.epochs * num_update_steps_per_epoch

    # Number of warmup steps
    warm_steps = int(pc.warmup_ratio * max_train_steps)

    # Create the learning-rate scheduler
    lr_scheduler = get_scheduler(
        name='linear',
        optimizer=optimizer,
        num_warmup_steps=warm_steps,
        num_training_steps=max_train_steps,
    )

    # Training
    loss_list = []
    tic_train = time.time()
    global_step, best_eval_loss = 0, float('inf')
    print()
    for epoch in range(1, pc.epochs + 1):
        print("Epoch {} 开始训练".format(epoch))
        for batch in train_dataloader:
            if pc.use_lora:
                # torch.cuda.amp.autocast enables mixed precision (only effective on GPU)
                with autocast():
                    # Under mixed precision, PyTorch selectively runs operations in lower
                    # precision (e.g. float16) or in float32 to speed up computation.
                    # Some weights and inputs are therefore cast to fp16, which makes the output fp16 too,
                    # but the output layer is wrapped in CastOutputToFloat, so the logits come out as fp32
                    loss = model(
                        input_ids=batch['input_ids'].to(dtype=torch.long, device=pc.device),
                        labels=batch['labels'].to(dtype=torch.long, device=pc.device)
                    ).loss
            else:
                loss = model(
                    input_ids=batch['input_ids'].to(dtype=torch.long, device=pc.device),
                    labels=batch['labels'].to(dtype=torch.long, device=pc.device)
                ).loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            lr_scheduler.step()

            # Record the loss
            loss_list.append(float(loss.cpu().detach()))

            # Log progress
            global_step += 1
            if global_step % pc.logging_steps == 0:
                time_diff = time.time() - tic_train
                loss_avg = sum(loss_list) / len(loss_list)
                print("global step %d ( %02.2f%% ) , epoch: %d, loss: %.5f, speed: %.2f step/s, ETA: %s"
                      % (global_step,
                         global_step / max_train_steps * 100,
                         epoch,
                         loss_avg,
                         pc.logging_steps / time_diff,
                         second2time(int(max_train_steps - global_step) / (pc.logging_steps / time_diff))
                         ))
                tic_train = time.time()

            # Evaluate and keep the best model
            if global_step % pc.evaluate_freq == 0:
                # cur_save_dir = os.path.join(pc.save_dir, "model_%d" % global_step)
                # save_model(model, cur_save_dir)
                # tokenizer.save_pretrained(cur_save_dir)
                # print(f'Model has saved at {cur_save_dir}.')

                # Evaluate the model
                eval_loss = evaluate_model(model, dev_dataloader)
                print("Evaluation Loss: %.5f" % (eval_loss))

                # Save the best model
                if eval_loss < best_eval_loss:
                    print(f"Global step {global_step}")
                    print(f"Min eval loss has been updated: {best_eval_loss:.5f} --> {eval_loss:.5f}")
                    best_eval_loss = eval_loss
                    cur_save_dir = os.path.join(pc.save_dir, "model_best")
                    save_model(model, cur_save_dir)
                    # Save the tokenizer alongside the weights so that the checkpoint directory
                    # is self-contained and can be loaded with AutoTokenizer.from_pretrained
                    tokenizer.save_pretrained(cur_save_dir)
                    print(f'Best model has saved at {cur_save_dir}.')
                    print()
                tic_train = time.time()

        print("Epoch {} 结束训练".format(epoch))
        print('-' * 80)
        print()


if __name__ == '__main__':
    model2train()

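Three statements above deserve extra explanation:

model.gradient_checkpointing_enable()
model.enable_input_require_grads()
model.config.use_cache = False

The first enables gradient checkpointing. When training neural networks, especially deep models such as Transformers, GPU memory is a major constraint. Backpropagation needs the intermediate activations produced during the forward pass (the output of every layer) in order to compute gradients, and for very deep networks these activations can take a large amount of memory. Gradient checkpointing keeps only part of the activations and recomputes the others when they are needed. This costs some extra compute time, but it reduces memory use significantly, which makes it possible to train larger models or use larger batch sizes.

The second makes the output of the input embedding layer require gradients. With LoRA the base weights, including the embeddings, are frozen, so the tensors entering the checkpointed Transformer blocks would otherwise not require gradients. Combined with gradient checkpointing, that can leave the LoRA weights inside those blocks without any gradient. Enabling gradients on the embedding output keeps the backward pass connected.

The third disables the model's cache. In Transformer models the cache stores intermediate results, namely the key/value pairs of self-attention, so that decoding is faster when generating a sequence. The cache costs extra GPU memory, and setting model.config.use_cache to False turns it off. This is appropriate for training, where every step recomputes these values anyway and the cache is never reused.

One more point is with autocast(). It is the PyTorch context manager for automatic mixed precision (AMP) training. Inside it, PyTorch selectively uses lower precision (such as float16) or standard precision (float32) to speed up computation and reduce GPU memory use: autocast() decides which operations run in float16 and which must stay in float32 for numerical stability.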
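The training script also builds a linear learning-rate schedule with warmup from warmup_ratio and the total number of steps. The sketch below reimplements that schedule in plain Python to show the shape of the curve and where the step counts come from. It is an approximation for illustration, assuming the usual "linear warmup, then linear decay to zero" behavior; linear_schedule_with_warmup is a name introduced here, and the constants are taken from the configuration and dataset sizes given earlier.

```python
import math


def linear_schedule_with_warmup(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


# Values from glm_config.py and the dataset description
train_samples, batch_size, epochs = 902, 8, 5
base_lr, warmup_ratio = 3e-5, 0.06

steps_per_epoch = math.ceil(train_samples / batch_size)   # batches per epoch
total_steps = epochs * steps_per_epoch
warmup_steps = int(warmup_ratio * total_steps)

print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, 10, warmup_steps, total_steps // 2, total_steps):
    lr = linear_schedule_with_warmup(step, base_lr, warmup_steps, total_steps)
    print(step, f"{lr:.2e}")
```

With 902 samples and a batch size of 8 there are 113 steps per epoch and 565 steps in total, which agrees with the training log in this section, where step 10 is reported as 1.77% of training.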

Only the best model is saved because the model weights take about 12 GB; saving after every evaluation would run out of disk space.
Output:

(torch) root@eggbowwwilybkhdf-glow-578bf79bbb-sfb5h:/data/coding/lora_chatglm#  cd /data/coding/lora_chatglm ; /usr/bin/env /data/miniconda/envs/torch/bin/python /data/.code-server/ms-python.debugpy-2025.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher 46117 -- /data/coding/lora_chatglm/train.py 
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 8/8 [00:10<00:00,  1.26s/it]
********************************************************************************
ChatGLMForConditionalGeneration(
  (transformer): ChatGLMModel(
    (word_embeddings): Embedding(130528, 4096)
    (layers): ModuleList(
      (0-27): 28 x GLMBlock(
        (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (attention): SelfAttention(
          (rotary_emb): RotaryEmbedding()
          (query_key_value): Linear(in_features=4096, out_features=12288, bias=True)
          (dense): Linear(in_features=4096, out_features=4096, bias=True)
        )
        (post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
        (mlp): GLU(
          (dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
          (dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
        )
      )
    )
    (final_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=4096, out_features=130528, bias=False)
)
--------------------------------------------------------------------------------
model.lm_head-->Linear(in_features=4096, out_features=130528, bias=False)
&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
model2-->PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): ChatGLMForConditionalGeneration(
      (transformer): ChatGLMModel(
        (word_embeddings): Embedding(130528, 4096)
        (layers): ModuleList(
          (0-27): 28 x GLMBlock(
            (input_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
            (attention): SelfAttention(
              (rotary_emb): RotaryEmbedding()
              (query_key_value): lora.Linear(
                (base_layer): Linear(in_features=4096, out_features=12288, bias=True)
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.1, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=4096, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=12288, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear(in_features=4096, out_features=4096, bias=True)
            )
            (post_attention_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
            (mlp): GLU(
              (dense_h_to_4h): Linear(in_features=4096, out_features=16384, bias=True)
              (dense_4h_to_h): Linear(in_features=16384, out_features=4096, bias=True)
            )
          )
        )
        (final_layernorm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): CastOutputToFloat(
        (0): Linear(in_features=4096, out_features=130528, bias=False)
      )
    )
  )
)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
trainable params: 3,670,016 || all params: 6,176,956,416 || trainable%: 0.0594
模型可训练参数 None

Epoch 1 开始训练
global step 10 ( 1.77% ) , epoch: 1, loss: 2.61709, speed: 1.91 step/s, ETA: 00:04:50
Evaluation Loss: 6.00277
Global step 10
Min eval loss has been updated: inf --> 6.00277
Best model has saved at checkpoints/ptune/model_best.

/data/miniconda/envs/torch/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
global step 20 ( 3.54% ) , epoch: 1, loss: 2.63276, speed: 1.58 step/s, ETA: 00:05:44
Evaluation Loss: 5.65405
Global step 20
Min eval loss has been updated: 6.00277 --> 5.65405
Best model has saved at checkpoints/ptune/model_best.

global step 30 ( 5.31% ) , epoch: 1, loss: 2.62342, speed: 1.58 step/s, ETA: 00:05:39
Evaluation Loss: 5.04809
Global step 30
Min eval loss has been updated: 5.65405 --> 5.04809
Best model has saved at checkpoints/ptune/model_best.

......

global step 110 ( 19.47% ) , epoch: 1, loss: 1.45005, speed: 1.58 step/s, ETA: 00:04:47
Evaluation Loss: 0.80976
Global step 110
Min eval loss has been updated: 0.96355 --> 0.80976
Best model has saved at checkpoints/ptune/model_best.

Epoch 1 结束训练
--------------------------------------------------------------------------------

......

global step 560 ( 99.12% ) , epoch: 5, loss: 0.39793, speed: 1.59 step/s, ETA: 00:00:03
Evaluation Loss: 0.14574
Global step 560
Min eval loss has been updated: 0.14616 --> 0.14574
Best model has saved at checkpoints/ptune/model_best.

Epoch 5 结束训练
--------------------------------------------------------------------------------

The code also calls the evaluation function evaluate_model, which is described next.
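The trainable-parameter count in the log can be checked by hand from the printed model structure: LoRA is attached only to query_key_value in each of the 28 blocks, with lora_A of shape 4096 x 8 and lora_B of shape 8 x 12288, both without bias. A short sketch of the arithmetic (variable names are chosen here for illustration):

```python
# Shapes read off the printed model structure in the log
hidden_size = 4096      # in_features of query_key_value
qkv_out = 12288         # out_features of query_key_value
rank = 8                # LoRA rank (lora_rank in the config)
num_layers = 28         # number of GLMBlock layers

per_layer = hidden_size * rank + rank * qkv_out   # lora_A + lora_B, no bias
trainable = num_layers * per_layer

total = 6_176_956_416   # "all params" as reported in the log
percent = f"{100 * trainable / total:.4f}"

print(f"{trainable:,}")
print(percent)
```

This gives 3,670,016 trainable parameters, about 0.0594% of all parameters, matching the line printed by print_trainable_parameters().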

3.4 Model Evaluation

train.py also contains a function named evaluate_model:

def evaluate_model(model, dev_dataloader):
    """
    Evaluate the current model on the test set.

    Args:
        model: the current model
        dev_dataloader: dataloader of the test set
    """
    model.eval()
    loss_list = []
    with torch.no_grad():
        for batch in dev_dataloader:
            if pc.use_lora:
                with autocast():
                    loss = model(
                        input_ids=batch['input_ids'].to(dtype=torch.long, device=pc.device),
                        labels=batch['labels'].to(dtype=torch.long, device=pc.device)
                    ).loss
            else:
                loss = model(
                    input_ids=batch['input_ids'].to(dtype=torch.long, device=pc.device),
                    labels=batch['labels'].to(dtype=torch.long, device=pc.device)
                ).loss
            loss_list.append(float(loss.cpu().detach()))
    model.train()
    return sum(loss_list) / len(loss_list)

4 Model Inference

The inference code is in inference.py:

import time
import torch

from transformers import AutoTokenizer, AutoModel
# torch.set_default_tensor_type(torch.cuda.HalfTensor)


def inference(
        model,
        tokenizer,
        instuction: str,
        sentence: str
    ):
    """
    Run inference with the model.

    Args:
        instuction (str): the instruction (system prompt)
        sentence (str): the user input; may be empty

    Returns:
        str: the text generated after 'Answer: '
    """
    with torch.no_grad():
        # Build the prompt in the same format as the training data
        input_text = f"Instruction: {instuction}\n"
        print(f'input_text1-->{input_text}')
        if sentence:
            input_text += f"Input: {sentence}\n"
        print(f'input_text2-->{input_text}')
        input_text += "Answer: "
        print(f'input_text3-->{input_text}')

        # Convert the prompt to model input
        batch = tokenizer(input_text, return_tensors="pt")
        print(f'batch--->{batch["input_ids"].shape}')

        # Generate (device and max_new_tokens are module-level globals set under __main__)
        out = model.generate(
            input_ids=batch["input_ids"].to(device),
            max_new_tokens=max_new_tokens,
            temperature=0
        )
        # print(f'out-->{out}')

        # Post-process the generated text
        out_text = tokenizer.decode(out[0])
        print(f'out_text-->{out_text}')
        answer = out_text.split('Answer: ')[-1]  # keep only the answer part
        return answer


if __name__ == '__main__':
    from rich import print

    device = 'cuda:0'
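    max_new_tokens = 300        # maximum number of newly generated tokens
    # max_new_tokens counts only generated tokens (the prompt is excluded), so it directly caps the length of the text after 'Answer: '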
    model_path = "checkpoints/model_1800"

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
        model_path,
        trust_remote_code=True
    )

    # Load the model
    model = AutoModel.from_pretrained(
        model_path,
        trust_remote_code=True
    ).half().to(device)

    # Test samples: one for information extraction, one for classification
    samples = [
        {
            'instruction': "现在你是一个非常厉害的SPO抽取器。",
            "input": "下面这句中包含了哪些三元组，用json列表的形式回答，不要输出除json外的其他答案。\n\n73获奖记录人物评价：黄磊是一个特别幸运的演员，拍第一部戏就碰到了导演陈凯歌，而且在他的下一部电影《夜半歌声》中演对手戏的张国荣、吴倩莲、黎明等都是著名的港台演员。",
        },
        {
            'instruction': "你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。",
            "input": "下面子中的主语是什么类别，输出成列表形式。\n\n第N次入住了，就是方便去客户那里哈哈。还有啥说的"
        }
    ]

    # Run inference
    start = time.time()
    for i, sample in enumerate(samples):
        print(f'sample-->{sample}')
        res = inference(
            model,
            tokenizer,
            sample['instruction'],
            sample['input']
        )
        print(f'res {i}: ')
        print(res)
    print(f'Used {round(time.time() - start, 2)}s.')

Output:

Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|█████████████████████████████████████████| 3/3 [00:19<00:00,  6.41s/it]
Some weights of the model checkpoint at checkpoints/ptune/model_best were not used when initializing ChatGLMForConditionalGeneration: ['lm_head.0.weight']
- This IS expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ChatGLMForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ChatGLMForConditionalGeneration were not initialized from the model checkpoint at checkpoints/ptune/model_best and are newly initialized: ['lm_head.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
sample-->{'instruction': '现在你是一个非常厉害的SPO抽取器。', 'input': 
'下面这句中包含了哪些三元组，用json列表的形式回答，不要输出除json外的其他答案。\n\n73获奖记录人物评价：
黄磊是一个特别幸运的演员，拍第一部戏就碰到了导演陈凯歌，而且在他的下一部电影《夜半歌声》中演对手戏的张
国荣、吴倩莲、黎明等都是著名的港台演员。'}
input_text1-->Instruction: 现在你是一个非常厉害的SPO抽取器。

input_text2--》Instruction: 现在你是一个非常厉害的SPO抽取器。
Input: 下面这句中包含了哪些三元组，用json列表的形式回答，不要输出除json外的其他答案。

73获奖记录人物评价：黄磊是一个特别幸运的演员，拍第一部戏就碰到了导演陈凯歌，而且在他的下一部电影《夜半
歌声》中演对手戏的张国荣、吴倩莲、黎明等都是著名的港台演员。

input_text3--》Instruction: 现在你是一个非常厉害的SPO抽取器。
Input: 下面这句中包含了哪些三元组，用json列表的形式回答，不要输出除json外的其他答案。

73获奖记录人物评价：黄磊是一个特别幸运的演员，拍第一部戏就碰到了导演陈凯歌，而且在他的下一部电影《夜半
歌声》中演对手戏的张国荣、吴倩莲、黎明等都是著名的港台演员。
Answer: 
batch--->torch.Size([1, 91])
/data/miniconda/envs/torch/lib/python3.10/site-packages/transformers/generation/utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
The dtype of attention mask (torch.int64) is not bool
out_text-->Instruction: 现在你是一个非常厉害的SPO抽取器。
Input: 下面这句中包含了哪些三元组,用json列表的形式回答,不要输出除json外的其他答案。

73获奖记录人物评价:黄磊是一个特别幸运的演员,拍第一部戏就碰到了导演陈凯歌,而且在他的下一部电影《夜半歌声
》中演对手戏的张国荣、吴倩莲、黎明等都是著名的港台演员。
Answer: ```json
[{"predicate": "导演", "object_type": "人物", "subject_type": "影视作品", "object": "陈凯歌", 
"subject": "夜半歌声"}, {"predicate": "主演", "object_type": "人物", "subject_type": "影视作品", 
"object": "张国荣", "subject": "夜半歌声"}, {"predicate": "主演", "object_type": "人物", 
"subject_type": "
res 0: 
```json
[{"predicate": "导演", "object_type": "人物", "subject_type": "影视作品", "object": "陈凯歌", 
"subject": "夜半歌声"}, {"predicate": "主演", "object_type": "人物", "subject_type": "影视作品", 
"object": "张国荣", "subject": "夜半歌声"}, {"predicate": "主演", "object_type": "人物", 
"subject_type": "
sample-->{'instruction': '你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。', 'input': 
'下面子中的主语是什么类别，输出成列表形式。\n\n第N次入住了，就是方便去客户那里哈哈。还有啥说的'}
input_text1-->Instruction: 你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。

input_text2--》Instruction: 你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。
Input: 下面子中的主语是什么类别，输出成列表形式。

第N次入住了，就是方便去客户那里哈哈。还有啥说的

input_text3--》Instruction: 你现在是一个很厉害的阅读理解器，严格按照人类指令进行回答。
Input: 下面子中的主语是什么类别，输出成列表形式。

第N次入住了，就是方便去客户那里哈哈。还有啥说的
Answer: 
batch--->torch.Size([1, 53])
out_text-->Instruction: 你现在是一个很厉害的阅读理解器,严格按照人类指令进行回答。
Input: 下面子中的主语是什么类别,输出成列表形式。

第N次入住了,就是方便去客户那里哈哈。还有啥说的
Answer: ["洗浴"]
res 1: 
["洗浴"]
Used 4.23s.
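
In the output above, the first answer starts with a Markdown json fence and appears to be cut off partway through a string, while the second answer is a clean JSON list. The returned text therefore cannot always be passed straight to a JSON parser. A minimal sketch of a tolerant post-processing step follows. `parse_answer` is a hypothetical helper that is not part of inference.py, and returning None for unparseable text is an assumption about how a caller might want to handle truncation.

```python
import json
from typing import Any, Optional

FENCE = "`" * 3  # the three-backtick Markdown fence marker


def parse_answer(answer: str) -> Optional[Any]:
    """Strip an optional Markdown code fence, then parse the rest as JSON.

    Returns None when the remaining text is not valid JSON,
    for example when generation was truncated.
    """
    text = answer.strip()
    if text.startswith(FENCE):
        # Drop the first line: the fence plus an optional language tag.
        text = text.split("\n", 1)[1] if "\n" in text else ""
    if text.rstrip().endswith(FENCE):
        # Drop a closing fence if the model emitted one.
        text = text.rstrip()[: -len(FENCE)]
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(parse_answer('["洗浴"]'))
print(parse_answer(FENCE + 'json\n[{"predicate": "导演", "subject_type": "'))
```

Returning None instead of raising lets the caller decide whether to retry with a larger max_new_tokens or to discard the sample.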