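Errors and fixes when running GRPO training with unsloth
Overview
A while ago I tried GRPO training with unsloth, as a simple reproduction of the reinforcement learning method used by DeepSeek. I ran into quite a few problems along the way, so here are brief notes on how I solved them.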
Problem 1: pip install fails
Error message:
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: EE certificate key too weak (_ssl.c:1007)'))': /simple/diffusers/
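I had previously configured the Tsinghua mirror as my pip index, and its certificate was failing verification.
So my fix was to add --trusted-host to the pip command, which tells pip to trust that host anyway: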
pip install xxx --trusted-host pypi.tuna.tsinghua.edu.cn
Problem 2: GPU access fails inside docker
Symptom: import torch reports the following
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:129: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 205: mapping of buffer object failed (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
Cause: I forgot to pass --ipc=host when starting the container.
Fix: add --ipc=host to docker run:
docker run -dit --gpus all --ipc=host --name my_unsloth2 docker.1ms.run/uptospace/unsloth:latest
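After restarting the container, a quick check from Python can confirm whether torch now sees the GPU. The following is only a minimal sketch: cuda_status is a hypothetical helper name, not part of torch or unsloth, and it only uses torch.cuda.is_available() and torch.cuda.device_count().

```python
def cuda_status():
    """Return a short description of what torch can see (hypothetical helper)."""
    try:
        import torch  # imported lazily so the check also works where torch is absent
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available to torch"
    return f"CUDA OK, {torch.cuda.device_count()} device(s) visible"


print(cuda_status())
```

If this still reports that CUDA is not available inside the container, the docker run flags are the first thing to recheck.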
Problem 3: importing PatchFastRL fails
Symptom:
from unsloth import FastLanguageModel, PatchFastRL
PatchFastRL("GRPO", FastLanguageModel)
Error message:
ImportError: cannot import name 'DDPOStableDiffusionPipeline' from 'trl.models' (/usr/local/lib/python3.10/dist-packages/trl/models/__init__.py)
The above exception was the direct cause of the following exception:
xxxxx
RuntimeError: Failed to import trl.trainer.alignprop_trainer because of the following error (look up to see its traceback):
cannot import name 'DDPOStableDiffusionPipeline' from 'trl.models' (/usr/local/lib/python3.10/dist-packages/trl/models/__init__.py)
Fix: install diffusers. See https://github.com/unslothai/unsloth/issues/1637
pip install trl[diffusers]
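To confirm the install worked before rerunning the training script, it can help to check that both packages are importable. This is a minimal sketch using only the standard library. missing_packages is a hypothetical helper name, and it only handles top-level package names.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# An empty list means both packages are importable.
print(missing_packages(["trl", "diffusers"]))
```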
Problem 4: loading GRPOTrainer fails
Error message:
[rank0]: File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 502, in __reduce__
[rank0]: raise RuntimeError("LLMEngine should not be pickled!")
[rank0]: RuntimeError: LLMEngine should not be pickled!
[rank0]:[W305 01:49:49.129079890 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Reference for the fix: https://github.com/unslothai/unsloth/issues/1632
Fix: call PatchFastRL("GRPO", FastLanguageModel) before import trl
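Since this fix depends entirely on import order, a small guard at the top of the training script can catch the mistake early. This is a sketch under assumptions: assert_not_imported is a hypothetical helper, not an unsloth or trl function, and the commented lines show the intended order and need both libraries installed.

```python
import sys


def assert_not_imported(module_name):
    """Raise if module_name (or one of its submodules) is already loaded."""
    prefix = module_name + "."
    loaded = [m for m in sys.modules if m == module_name or m.startswith(prefix)]
    if loaded:
        raise RuntimeError(
            f"{module_name} was imported before the patch was applied ({loaded[0]})"
        )


# Intended order at the top of the training script:
#   assert_not_imported("trl")
#   from unsloth import FastLanguageModel, PatchFastRL
#   PatchFastRL("GRPO", FastLanguageModel)
#   from trl import GRPOConfig, GRPOTrainer
```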
Problem 5: exception caused by a learning rate of the wrong type
Error message:
Exception: ['<=' not supported between instances of 'float' and 'str']
Analysis: the learning rate was being passed in as a str. Changing it to a float fixed the error.
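A value read from a command-line argument or a config file often arrives as a string, so converting it explicitly before building the training arguments avoids this comparison error. The sketch below is illustrative only: as_float and raw_config are hypothetical names, and the value "5e-6" is just an example.

```python
def as_float(value, name="learning_rate"):
    """Convert value to float, raising a clear error if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"{name} must be a number, got {value!r}") from exc


raw_config = {"learning_rate": "5e-6"}  # hypothetical value that arrived as a string
learning_rate = as_float(raw_config["learning_rate"])
print(type(learning_rate).__name__, learning_rate)
```

The converted value can then be passed as the learning rate wherever the string was being passed before.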