
Errors encountered during GRPO training with unsloth, and how to fix them

2025/3/17 9:30:54

Overview

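A while ago I tried GRPO training with unsloth, as a simple reproduction of the reinforcement learning method used by DeepSeek. I ran into quite a few problems along the way, so here is a brief record of the fixes.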

Problem 1: pip install fails

Error message:

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: EE certificate key too weak (_ssl.c:1007)'))': /simple/diffusers/

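I had previously configured the Tsinghua pip mirror, and the local SSL check rejects that mirror's certificate because its key is too weak.
My workaround was to add --trusted-host to the pip command, which tells pip not to verify the certificate of that host:

pip install xxx --trusted-host pypi.tuna.tsinghua.edu.cn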
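If you install many packages from the mirror, it can help to derive the --trusted-host value from the index URL rather than typing it each time. Below is a minimal Python sketch. The helper name build_pip_install_cmd is hypothetical, and it only assembles the argument list. Keep in mind that --trusted-host turns off certificate verification for that host, so this works around the weak certificate rather than fixing it.

```python
import sys
from urllib.parse import urlparse


def build_pip_install_cmd(package, index_url):
    """Assemble a pip install command that trusts the mirror's host.

    Hypothetical helper: --trusted-host makes pip skip TLS certificate
    verification for this host, which avoids CERTIFICATE_VERIFY_FAILED
    but also removes that protection for the mirror.
    """
    host = urlparse(index_url).hostname
    if not host:
        raise ValueError(f"cannot extract a host from {index_url!r}")
    return [
        sys.executable, "-m", "pip", "install", package,
        "-i", index_url,
        "--trusted-host", host,
    ]


cmd = build_pip_install_cmd("diffusers", "https://pypi.tuna.tsinghua.edu.cn/simple")
print(" ".join(cmd[1:]))
# To actually run it: subprocess.run(cmd, check=True)
```

Running pip through sys.executable -m pip means the package is installed into the same interpreter that runs the script. Inside containers, the pip on PATH can belong to a different Python.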

Problem 2: Accessing the GPU inside Docker fails

Symptom: import torch reports an error:

/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 205: mapping of buffer object failed (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)

Cause: I had forgotten to pass --ipc=host when starting the container.
Fix: add --ipc=host to docker run:

docker run -dit --gpus all --ipc=host --name my_unsloth2 docker.1ms.run/uptospace/unsloth:latest
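With --ipc=host, the container shares the host's IPC namespace, including /dev/shm. Without it (and without --shm-size), Docker gives the container a /dev/shm of only 64 MB. I have not confirmed that shared memory specifically is what triggers Error 205. Still, the /dev/shm size is a quick way to tell from inside a container whether it was started with --ipc=host. The function names and the 64 MB heuristic in this sketch are my own assumptions.

```python
import os
import shutil

DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024  # Docker's default --shm-size (64 MB)


def shm_size_bytes(path="/dev/shm"):
    """Return the total size of the shared-memory mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total


def looks_like_default_docker_shm(total_bytes):
    """Heuristic: 64 MB or less suggests neither --ipc=host nor --shm-size was used."""
    return total_bytes is not None and total_bytes <= DOCKER_DEFAULT_SHM_BYTES


size = shm_size_bytes()
if size is None:
    print("/dev/shm not found (not a Linux container?)")
elif looks_like_default_docker_shm(size):
    print(f"/dev/shm is only {size / 2**20:.0f} MB; consider docker run --ipc=host")
else:
    print(f"/dev/shm is {size / 2**20:.0f} MB")
```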

Problem 3: Error when importing PatchFastRL

Symptom:

from unsloth import FastLanguageModel, PatchFastRL
PatchFastRL("GRPO", FastLanguageModel)

Error message:
ImportError: cannot import name 'DDPOStableDiffusionPipeline' from 'trl.models' (/usr/local/lib/python3.10/dist-packages/trl/models/__init__.py)

The above exception was the direct cause of the following exception:
xxxxx

RuntimeError: Failed to import trl.trainer.alignprop_trainer because of the following error (look up to see its traceback):
cannot import name 'DDPOStableDiffusionPipeline' from 'trl.models' (/usr/local/lib/python3.10/dist-packages/trl/models/__init__.py)

Fix: install diffusers (see https://github.com/unslothai/unsloth/issues/1637):

pip install trl[diffusers]
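The failure only shows up deep inside trl's imports, so it can save time to check that the optional dependency is present before importing unsloth. Below is a minimal sketch using the standard library's importlib.util.find_spec. The helper name missing_packages and the package list are illustrative assumptions.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found on the current path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# diffusers is the optional dependency behind trl's DDPOStableDiffusionPipeline
missing = missing_packages(["trl", "diffusers"])
if missing:
    print("Missing:", ", ".join(missing), "-> try: pip install 'trl[diffusers]'")
else:
    print("trl and diffusers are both importable")
```

The quotes around 'trl[diffusers]' are not needed in bash, but shells such as zsh treat square brackets as glob characters.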

Problem 4: Error when loading GRPOTrainer

Error message:

[rank0]:   File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 502, in __reduce__
[rank0]:     raise RuntimeError("LLMEngine should not be pickled!")
[rank0]: RuntimeError: LLMEngine should not be pickled!
[rank0]:[W305 01:49:49.129079890 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present,  but this warning has only been added since PyTorch 2.4 (function operator())

For the fix, see: https://github.com/unslothai/unsloth/issues/1632

Fix: call PatchFastRL("GRPO", FastLanguageModel) before import trl.
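In a training script, the ordering from this fix might look like the sketch below. The guard assert_trl_not_imported is hypothetical. It checks sys.modules so that a stray earlier import trl (for example, one pulled in by another module you imported first) fails loudly instead of causing the pickling error later. The imports are wrapped in a function so the sketch can be read without unsloth installed. GRPOConfig and GRPOTrainer are the trl class names as I understand them, so check them against your trl version.

```python
import sys


def assert_trl_not_imported():
    """Hypothetical guard: fail fast if trl (or a trl submodule) is already loaded."""
    loaded = [m for m in list(sys.modules) if m == "trl" or m.startswith("trl.")]
    if loaded:
        raise RuntimeError(
            f"trl already imported ({loaded[0]}); call PatchFastRL before importing trl"
        )


def load_grpo_components():
    """Patch first, then import trl, following the fix from unsloth issue 1632."""
    assert_trl_not_imported()
    from unsloth import FastLanguageModel, PatchFastRL

    PatchFastRL("GRPO", FastLanguageModel)  # run before the script's own trl import

    from trl import GRPOConfig, GRPOTrainer  # only now import trl

    return FastLanguageModel, GRPOConfig, GRPOTrainer
```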

Problem 5: Exception caused by the wrong learning-rate type

Error message:

Exception: ['<=' not supported between instances of 'float' and 'str']

Analysis: the learning rate was passed in as a str. Changing it to a float fixed the problem.
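The message is exactly what Python raises when a float is compared with a str. PyTorch optimizers validate the learning rate with a check along the lines of 0.0 <= lr, which is the likely source. One common way to end up with a string is loading hyperparameters from YAML. PyYAML reads an unquoted 1e-5 as the string "1e-5" because its float pattern requires a decimal point. Below is a minimal sketch that coerces the value before building the trainer config. normalize_learning_rate and the config dict are hypothetical.

```python
def normalize_learning_rate(value):
    """Coerce a learning rate that may arrive as a str (e.g. "5e-6") into a positive float."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("learning rate must be a number, not bool")
    lr = float(value)  # accepts int, float and numeric strings such as "5e-6"
    if not lr > 0.0:  # also rejects NaN
        raise ValueError(f"learning rate must be positive, got {lr!r}")
    return lr


# Reproduce the original failure: comparing a float with a str raises TypeError.
try:
    _ = 0.0 <= "5e-6"
except TypeError as err:
    print(err)  # '<=' not supported between instances of 'float' and 'str'

config = {"learning_rate": "5e-6"}  # e.g. as loaded from a config file
config["learning_rate"] = normalize_learning_rate(config["learning_rate"])
print(config["learning_rate"])  # 5e-06
```

Writing the value as 5.0e-6 in the YAML file, or quoting nothing and converting once at load time as above, keeps the type consistent everywhere it is used.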


http://www.kler.cn/a/588171.html
