
LoRA Fine-Tuning the Qwen Large Model on Windows

article 2025/2/28 5:33:03

Environment requirements

Python 3.8 or later
PyTorch 1.12 or later (2.0 or later recommended)
transformers 4.32 or later
CUDA 11.4 or later recommended (for GPU users, flash-attention users, etc.)
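To check a machine against this list quickly, a small script like the following can help. It is a minimal sketch: it reads installed versions with importlib.metadata and compares only the numeric part of each version string (so "2.1.2+cu121" counts as 2.1.2); it says nothing about whether CUDA itself works.

```python
"""Compare installed package versions with the minimums listed above (a sketch)."""
from importlib import metadata
import re
import sys

# Minimum versions taken from the requirements list above.
MINIMUMS = {"torch": "1.12", "transformers": "4.32"}


def parse_version(text: str) -> tuple:
    """Turn '2.1.2+cu121' into (2, 1, 2); local and pre-release suffixes are ignored."""
    core = re.match(r"\d+(?:\.\d+)*", text)
    if core is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in core.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, comparing numeric components only."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that (4, 32) compares equal to (4, 32, 0).
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> dict:
    """Return {name: (installed_version_or_None, ok)} for Python and each package."""
    report = {"python": (sys.version.split()[0], sys.version_info >= (3, 8))}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (version, ok) in check_environment().items():
        print(f"{name:<13} {version or 'not installed':<16} {'OK' if ok else 'CHECK'}")
```

For the CUDA side, torch.version.cuda shows which CUDA version the installed PyTorch was built against, and torch.cuda.is_available() shows whether a GPU is actually usable.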

Fine-tuning steps

1. Download the resources

Qwen: https://github.com/QwenLM/Qwen
Qwen-1.8B-Chat model: https://modelscope.cn/models/Qwen/Qwen-1_8B-Chat
PyTorch wheels: https://download.pytorch.org/whl/torch_stable.html
flash-attention: https://github.com/Dao-AILab/flash-attention/releases/

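2. Set up the environment

conda create -n qwen python==3.10.1
conda activate qwen

# Install torch (the wheel downloaded from the PyTorch link above)
pip install "F:\llm\ptorch\torch-2.1.2+cu121-cp310-cp310-win_amd64.whl"

# Dependencies
cd F:\github\Qwen
pip install -r requirements.txt

# Dependencies for the web demo (graphical interface for model inference)
pip install -r requirements_web_demo.txt

# If installing directly fails, download the packages manually and install them locally
pip install "peft<0.8.0" deepspeed

# Optional, speeds up the model: download the wheel from the link above and install it locally (compiling it myself had not finished after 3 hours)
pip install F:\llm\flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl

# Model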
git clone https://www.modelscope.cn/Qwen/Qwen-1_8B-Chat.git
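A prebuilt flash-attention wheel only works if it matches the installed PyTorch (major.minor version and CUDA build) and the Python version, and all three are encoded in its file name. The following sketch checks that before installing. The regular expression is inferred from the wheel name used above and is an assumption; other releases may name their files differently.

```python
"""Check a prebuilt flash-attn wheel name against the local torch/Python (a sketch)."""
import re
import sys

# Pattern inferred from the wheel name used above, e.g.
# flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl
WHEEL_RE = re.compile(
    r"^flash_attn-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)cxx11abi(?P<cxx11abi>TRUE|FALSE)"
    r"-(?P<python>cp\d+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)


def parse_wheel(name: str) -> dict:
    match = WHEEL_RE.match(name)
    if match is None:
        raise ValueError(f"not a flash_attn wheel name in the expected format: {name}")
    return match.groupdict()


def mismatches(wheel_name: str, torch_version: str, python_tag: str) -> list:
    """Compare wheel tags with e.g. torch_version='2.1.2+cu121', python_tag='cp310'."""
    tags = parse_wheel(wheel_name)
    problems = []
    torch_base, _, local = torch_version.partition("+")
    torch_minor = ".".join(torch_base.split(".")[:2])
    if tags["torch"] != torch_minor:
        problems.append(f"wheel built for torch {tags['torch']}, installed {torch_minor}")
    # The local version suffix (e.g. 'cu121') tells which CUDA build of torch is installed.
    if local.startswith("cu") and local[2:] != tags["cuda"]:
        problems.append(f"wheel built for CUDA {tags['cuda']}, torch uses {local[2:]}")
    if tags["python"] != python_tag:
        problems.append(f"wheel built for {tags['python']}, interpreter is {python_tag}")
    return problems


if __name__ == "__main__":
    wheel = "flash_attn-2.4.1+cu121torch2.1cxx11abiFALSE-cp310-cp310-win_amd64.whl"
    try:
        import torch
        torch_version = torch.__version__
    except ImportError:
        torch_version = "2.1.2+cu121"  # the version used in this article
    tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    print(mismatches(wheel, torch_version, tag) or "tags look consistent")
```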

3. Prepare the fine-tuning data

The fine-tuning format from the official documentation:

[{"id":"identity_0","conversations":[{"from":"user","value":"你好"},{"from":"assistant","value":"我是一个语言模型，我叫通义千问。"}]}]
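Before training, it is worth checking that every record follows this format. The sketch below assumes that turns strictly alternate user/assistant, starting with the user; that is my reading of the example above, not a rule stated by the official docs, so relax it if your data contains other patterns.

```python
"""Validate fine-tuning records against the conversation format shown above (a sketch)."""
import json

ROLES = ("user", "assistant")


def validate_records(records: list) -> list:
    """Return a list of human-readable problems; an empty list means the data looks OK."""
    problems = []
    seen_ids = set()
    for index, record in enumerate(records):
        rid = record.get("id", f"#{index}")
        if rid in seen_ids:
            problems.append(f"{rid}: duplicate id")
        seen_ids.add(rid)
        turns = record.get("conversations")
        if not isinstance(turns, list) or not turns:
            problems.append(f"{rid}: missing or empty 'conversations'")
            continue
        for pos, turn in enumerate(turns):
            # Assumption: turns alternate user/assistant, starting with the user.
            expected = ROLES[pos % 2]
            if turn.get("from") != expected:
                problems.append(f"{rid}: turn {pos} should be from '{expected}'")
            if not isinstance(turn.get("value"), str) or not turn["value"].strip():
                problems.append(f"{rid}: turn {pos} has an empty 'value'")
    return problems


def validate_file(path: str) -> list:
    """Load a JSON array from `path` (UTF-8) and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_records(json.load(f))
```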

The data prepared for this run:
DISC-Law-SFT-Triplet-released-Qwen.json
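If the source data is a set of question/answer records, it can be converted into the format above with a few lines of Python. This is a sketch: the field names input and output are assumptions about the source file (check them against the actual DISC-Law-SFT release), the source is assumed to be JSON Lines, and the identity_N ids just mimic the official example.

```python
"""Convert question/answer records into Qwen's conversation format (a sketch)."""
import json


def to_qwen_format(records, input_key="input", output_key="output", id_prefix="identity"):
    """Assumption: each source record holds the question in `input_key`, the answer in `output_key`."""
    converted = []
    for index, record in enumerate(records):
        question = (record.get(input_key) or "").strip()
        answer = (record.get(output_key) or "").strip()
        if not question or not answer:
            continue  # skip incomplete samples instead of training on empty turns
        converted.append({
            "id": f"{id_prefix}_{index}",
            "conversations": [
                {"from": "user", "value": question},
                {"from": "assistant", "value": answer},
            ],
        })
    return converted


def convert_jsonl(src_path, dst_path, **kwargs):
    """Read JSON Lines from src_path, write one JSON array to dst_path, return the count."""
    with open(src_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    converted = to_qwen_format(records, **kwargs)
    with open(dst_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable in the output file.
        json.dump(converted, f, ensure_ascii=False, indent=2)
    return len(converted)
```

Ids are based on the position in the source file, so skipped samples leave gaps in the numbering, which is harmless for training.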

4. Adjust the fine-tuning parameters

Single-GPU LoRA training
The original script is Qwen/finetune/finetune_lora_single_gpu.sh. Because we run it on Windows, it is rewritten as a .bat file:

set CUDA_DEVICE_MAX_CONNECTIONS=1
set CUDA_VISIBLE_DEVICES=0

python finetune.py ^
  --model_name_or_path F:\github\Qwen-1_8B-Chat ^
  --data_path F:\llm\data\DISC-Law-SFT\DISC-Law-SFT-Triplet-released-Qwen.json ^
  --bf16 True ^
  --output_dir output_qwen_lora\law ^
  --num_train_epochs 1 ^
  --per_device_train_batch_size 8 ^
  --per_device_eval_batch_size 1 ^
  --gradient_accumulation_steps 8 ^
  --evaluation_strategy "no" ^
  --save_strategy "steps" ^
  --save_steps 1000 ^
  --save_total_limit 10 ^
  --learning_rate 3e-4 ^
  --weight_decay 0.1 ^
  --adam_beta2 0.95 ^
  --warmup_ratio 0.01 ^
  --lr_scheduler_type "cosine" ^
  --logging_steps 1 ^
  --report_to "none" ^
  --model_max_length 500 ^
  --lazy_preprocess True ^
  --gradient_checkpointing ^
  --use_lora

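Parameters:
model_name_or_path (MODEL in the .sh script): path to the base model
data_path (DATA in the .sh script): path to the custom dataset
output_dir: where the trained LoRA weights are saved
num_train_epochs: number of training epochs
model_max_length: maximum sequence length the model processes; set it according to your data
per_device_train_batch_size: training batch size per GPU
save_steps: save a checkpoint every n steps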
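The batch-related parameters interact: each optimizer update sees per_device_train_batch_size × gradient_accumulation_steps × number of GPUs samples, so the script above uses an effective batch of 8 × 8 × 1 = 64. The helper below is a rough sketch of the resulting step counts. The rounding rule for steps per epoch is an assumption that mirrors common transformers Trainer behavior and may differ slightly between versions, and the sample count in the usage note is hypothetical.

```python
"""Back-of-the-envelope arithmetic for the batch/step parameters above (a sketch)."""
import math


def training_plan(num_samples, per_device_batch=8, grad_accum=8, num_gpus=1,
                  epochs=1, warmup_ratio=0.01, save_steps=1000):
    # Samples consumed per optimizer update.
    effective_batch = per_device_batch * grad_accum * num_gpus
    # Micro-batches per epoch on each device (the last one may be partial).
    micro_batches = math.ceil(num_samples / (per_device_batch * num_gpus))
    # Assumption: update steps per epoch = micro-batches // grad_accum (at least 1).
    steps_per_epoch = max(micro_batches // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    return {
        "effective_batch": effective_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # warmup_ratio is turned into a whole number of steps, rounded up.
        "warmup_steps": math.ceil(total_steps * warmup_ratio),
        # Intermediate checkpoints written every save_steps update steps.
        "checkpoints": total_steps // save_steps,
    }
```

For a hypothetical dataset of 16,000 samples, training_plan(16_000) gives 250 update steps, 3 warmup steps and no intermediate checkpoint at save_steps 1000. Halving per_device_train_batch_size to save GPU memory while doubling gradient_accumulation_steps, as in training_plan(16_000, per_device_batch=4, grad_accum=16), keeps the effective batch at 64.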

5. Train the LoRA model

.\finetune\finetune_lora_single_gpu.bat


The model has about 1.9 billion parameters in total; the parameters we train number about 54 million, roughly 2.8%.
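The trainable share follows from the LoRA rank and the size of the adapted layers: an adapted Linear layer with in_features inputs and out_features outputs gets r × (in_features + out_features) extra parameters. The sketch below redoes that arithmetic. The model dimensions and LoRA settings are assumptions (what I believe the Qwen-1.8B config and the defaults in finetune.py are), so check them against your checkout; under these assumptions the count comes to about 53.7 million, i.e. about 2.8% of roughly 1.9 billion.

```python
"""Rough count of LoRA trainable parameters for Qwen-1.8B (a sketch under stated assumptions)."""

# Assumptions (check config.json and finetune.py): hidden_size=2048,
# intermediate_size=11008 (w1/w2 each project to 11008 // 2), 24 layers,
# LoRA rank r=64 on modules whose names match c_attn, c_proj, w1, w2.
HIDDEN = 2048
FF = 11008 // 2
LAYERS = 24
RANK = 64

# (in_features, out_features) of every adapted Linear inside one transformer block.
ADAPTED_LINEARS = {
    "attn.c_attn": (HIDDEN, 3 * HIDDEN),  # fused Q/K/V projection
    "attn.c_proj": (HIDDEN, HIDDEN),      # attention output projection
    "mlp.w1": (HIDDEN, FF),
    "mlp.w2": (HIDDEN, FF),
    "mlp.c_proj": (FF, HIDDEN),           # also matched by the name "c_proj"
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    # LoRA adds A (rank x in) and B (out x rank) beside the frozen weight.
    return rank * (in_features + out_features)


def total_lora_params(rank: int = RANK) -> int:
    per_layer = sum(lora_params(i, o, rank) for i, o in ADAPTED_LINEARS.values())
    return per_layer * LAYERS


if __name__ == "__main__":
    trainable = total_lora_params()
    print(f"trainable LoRA params: {trainable:,}")
    print(f"share of ~1.9B total: {trainable / 1.9e9:.1%}")
```

Lowering the rank shrinks the count proportionally, which is one lever if memory is tight.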
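We train for only one epoch, since the goal is just to see the effect of fine-tuning, not to get good results. The default is 5 epochs, which would take 3 h 50 min; with 1 epoch it takes only 1 h 17 min. Without flash-attention it may take another 20 to 30 minutes.
If you run out of GPU memory, lower the training batch size (per_device_train_batch_size).
The log above shows a warning: sequence length is longer than the specified maximum sequence length for this model (649 > 512)
This is because some training samples are too long. The code truncates them by default, which only affects output quality, while raising the length would roughly double the training time, so the warning can be ignored here.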
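To decide whether 500 is a reasonable model_max_length, you can measure token lengths on the training data and see how many samples would be truncated. The sketch below does the statistics. measure_lengths assumes a Hugging Face-style tokenizer (for example one loaded with AutoTokenizer.from_pretrained(..., trust_remote_code=True)) and simply concatenates the turns, so it undercounts the special tokens the training script adds around each turn.

```python
"""Choose model_max_length from measured token lengths (a sketch)."""
import math


def measure_lengths(records, tokenizer):
    """Token count per record, joining all turns; ignores chat-template special tokens."""
    return [
        len(tokenizer("".join(turn["value"] for turn in record["conversations"])).input_ids)
        for record in records
    ]


def length_for_coverage(lengths, coverage=0.95):
    """Smallest max length that keeps `coverage` of the samples untruncated."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths)
    index = max(math.ceil(coverage * len(ordered)) - 1, 0)
    return ordered[index]


def truncated_fraction(lengths, max_length):
    """Share of samples longer than max_length (these get cut during training)."""
    if not lengths:
        raise ValueError("no lengths given")
    return sum(1 for n in lengths if n > max_length) / len(lengths)
```

With lengths measured this way, length_for_coverage(lengths, 0.95) gives the smallest limit that keeps 95% of samples whole, and truncated_fraction(lengths, 500) shows how much of the data the current setting cuts.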

The 4060 Ti 16 GB card is essentially maxed out.

6. Merge the model

Merge the LoRA weights into the base model with the following code:

import os

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer


def save_tokenizer(tokenizer, directory):
    tokenizer.save_pretrained(directory)


def save_model_and_tokenizer(path_to_adapter, new_model_directory):
    """Merge a LoRA adapter into its base model and save the result with the tokenizer."""
    if not os.path.exists(path_to_adapter):
        raise FileNotFoundError(f"Adapter path does not exist: {path_to_adapter}")
    os.makedirs(new_model_directory, exist_ok=True)
    try:
        # Loads the base model recorded in the adapter config, then attaches the LoRA weights.
        model = AutoPeftModelForCausalLM.from_pretrained(
            path_to_adapter,
            device_map="auto",
            trust_remote_code=True,
        ).eval()
        # Fold the LoRA deltas into the base weights and drop the PEFT wrappers.
        merged_model = model.merge_and_unload()
        merged_model.save_pretrained(
            new_model_directory,
            max_shard_size="2048MB",
            safe_serialization=True,
        )
        tokenizer = AutoTokenizer.from_pretrained(
            path_to_adapter,
            trust_remote_code=True,
        )
        save_tokenizer(tokenizer, new_model_directory)
    except Exception as e:
        print(f"Merge failed: {e}")
        raise


if __name__ == "__main__":
    # Raw strings avoid having to escape backslashes in Windows paths.
    lora_model_path = r"F:\github\Qwen\output_qwen_lora\law"
    new_model_directory = r"F:\github\Qwen\output_qwen_merge\Qwen-1_8B-Chat_law_merge"
    save_model_and_tokenizer(lora_model_path, new_model_directory)

python .\qwen_lora_merge.py
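After merging, a quick look at the output directory can catch an incomplete export before the web demo fails to load it. The file list below is an assumption about what a Qwen-1.8B-Chat merge should contain (config, safetensors weights and tokenizer files); adjust EXPECTED_PATTERNS if your output differs.

```python
"""Check that a merged model directory looks complete before loading it (a sketch)."""
import fnmatch
import os

# Assumption: files a Qwen-1.8B-Chat merge is expected to produce; adjust as needed.
EXPECTED_PATTERNS = ["config.json", "*.safetensors", "tokenizer_config.json", "qwen.tiktoken"]


def missing_files(directory, patterns=EXPECTED_PATTERNS):
    """Return the patterns that no file in `directory` matches."""
    if not os.path.isdir(directory):
        raise FileNotFoundError(f"directory does not exist: {directory}")
    names = os.listdir(directory)
    return [p for p in patterns if not fnmatch.filter(names, p)]


if __name__ == "__main__":
    merged = r"F:\github\Qwen\output_qwen_merge\Qwen-1_8B-Chat_law_merge"
    if os.path.isdir(merged):
        print(missing_files(merged) or "all expected files present")
```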

7. Verify the fine-tuned model

For a targeted test, we ask a question taken directly from the training data:

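基于下列案件，推测可能的判决结果。\n被告人白某某在大东区小河沿公交车站乘坐被害人张某某驾驶的133路公交车，被告人白某某因未能下车而与司机张某某发生争执，并在该公交车行驶中用手拉拽档杆，被证人韩某某拉开后，被告人白某某又用手拉拽司机张某某的右胳膊，导致该车失控撞向右侧马路边停放的轿车和一个路灯杆，路灯杆折断后将福锅记炖品店的牌匾砸坏。经鉴定，公交车受损价值人民币5,189.9元，轿车受损价值人民币1,449.57元，路灯杆受损价值人民币2,927.15元，福锅记饭店牌匾受损价值人民币9,776元，本案损失价值共计人民币19,342.6元。

(English translation: Based on the following case, predict the likely verdict.\nThe defendant Bai boarded the route 133 bus driven by the victim Zhang at the Xiaoheyan bus stop in Dadong District. Because he was unable to get off, Bai got into an argument with the driver Zhang and, while the bus was moving, pulled on the gear lever. After the witness Han pulled him away, Bai pulled the driver Zhang's right arm, causing the bus to lose control and crash into a car parked on the right side of the road and a lamp post; the lamp post broke and smashed the signboard of the Fuguoji stew shop. According to the appraisal, the damage was RMB 5,189.9 for the bus, RMB 1,449.57 for the car, RMB 2,927.15 for the lamp post and RMB 9,776 for the Fuguoji restaurant signboard, a total loss of RMB 19,342.6.)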

(1) Running the original model

python web_demo.py --server-name 0.0.0.0 -c F:\github\Qwen-1_8B-Chat

Open http://localhost:8000/ in a browser to chat with the model.


(2) Running the new model

python web_demo.py --server-name 0.0.0.0 -c F:\github\Qwen\output_qwen_merge\Qwen-1_8B-Chat_law_merge

Open http://localhost:8000/ in a browser to chat with the model.
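You can see that the answer format has changed: every answer now opens with "根据《xxx》xxx的规定" ("according to the provisions of ... of the ..."), and the content looks plausible. On closer inspection, however, the cited articles are wrong. To improve this, train for more epochs, raise model_max_length to 700, and possibly add more training samples, then check the results again.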
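The format change can be quantified instead of eyeballed: collect answers from both models for the same prompts and count how many open with the citation phrase. The regular expression below is a sketch of that check. It only detects the "根据《xxx》xxx的规定" opening and cannot tell whether the cited article is correct, which still needs a manual or reference-based comparison.

```python
"""Measure how many answers open with the statute-citation phrase (a sketch)."""
import re

# Matches answers starting with 根据《<law name>》...的规定 within the first sentence.
CITATION_OPENING = re.compile(r"^\s*根据《[^》]+》[^。\n]*?的规定")


def cites_statute(answer: str) -> bool:
    return CITATION_OPENING.match(answer) is not None


def citation_rate(answers) -> float:
    """Fraction of answers that use the citation opening (0.0 for no answers)."""
    answers = list(answers)
    return sum(cites_statute(a) for a in answers) / len(answers) if answers else 0.0
```

Running citation_rate over the old and the new model's answers to the same prompts gives a simple before/after number for the format shift.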

Environment issues

Error when running the fine-tuning script

My environment differs from the official one: the installed Accelerator does not accept a dispatch_batches parameter, so I commented it out by hand in
Lib\site-packages\transformers\trainer.py:

        self.accelerator = Accelerator(
            # Commented out: the installed accelerate version has no dispatch_batches parameter.
            #dispatch_batches=self.args.dispatch_batches,
            split_batches=self.args.split_batches,
            deepspeed_plugin=self.args.deepspeed_plugin,
            gradient_accumulation_plugin=gradient_accumulation_plugin,
        )
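Editing files under site-packages works, but the change is lost whenever transformers is reinstalled. The error most likely comes from a version mismatch between transformers and accelerate, so another option is to install an accelerate version that the installed transformers supports. The sketch below uses inspect.signature to check whether the installed Accelerator accepts dispatch_batches at all, which helps confirm that the mismatch is the cause.

```python
"""Check whether a callable accepts a keyword argument before patching site-packages (a sketch)."""
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """True if `func` has a parameter called `name` or takes **kwargs."""
    params = inspect.signature(func).parameters.values()
    return any(p.name == name or p.kind is inspect.Parameter.VAR_KEYWORD for p in params)


if __name__ == "__main__":
    try:
        import accelerate
        from accelerate import Accelerator
    except ImportError:
        print("accelerate is not installed")
    else:
        ok = accepts_kwarg(Accelerator.__init__, "dispatch_batches")
        print(f"accelerate {accelerate.__version__}: dispatch_batches accepted = {ok}")
```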

Overall environment

torch                         2.1.2+cu121
flash_attn                    2.4.1
deepspeed                     0.15.5+unknown
peft                          0.7.1

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:55:00_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0