Training GPT with Megatron
1 Set up the Docker environment
NVIDIA/Megatron-LM: Ongoing research training transformer models at scale (github.com)
1.1 Pull the image
docker pull nvcr.io/nvidia/pytorch:24.08-py3
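Once the image is pulled, a container has to be started from it before anything can run. The following is a minimal sketch of one way to do that, not a command taken from these notes: the helper name start_megatron_container, the mount point /workspace/host, and the choice of flags are all assumptions, and --gpus requires the NVIDIA Container Toolkit on the host.

```shell
# Hypothetical helper (not part of Megatron-LM or these notes): start an
# interactive container from the image pulled above.
# Assumption: the NVIDIA Container Toolkit is installed, so --gpus works.
start_megatron_container() {
    host_dir="$1"   # host directory that will hold the Megatron-LM clone and data

    # --gpus all   : expose every host GPU to the container
    # --ipc=host   : share host IPC so PyTorch data loaders get enough shared memory
    # --rm         : delete the container when the shell exits
    # -v           : mount the host directory at /workspace/host (assumed path)
    docker run --gpus all -it --rm \
        --ipc=host \
        -v "${host_dir}:/workspace/host" \
        nvcr.io/nvidia/pytorch:24.08-py3
}
```

For example, start_megatron_container "$PWD" mounts the current directory into the container. Drop --rm if the container should persist between sessions.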
1.2 Download Megatron
git clone https://github.com/NVIDIA/Megatron-LM.git
Switch to the tag for the core 6 release.
Run the following command, replacing <tag> with the tag name:
git checkout <tag>
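The exact tag name is not spelled out above, so it has to be looked up in the clone. A small sketch of how that might be done is shown below; the helper name list_core_tags is hypothetical, and the '*core*' pattern is only an assumption that the relevant tags contain the word "core".

```shell
# Hypothetical helper: print every tag in a repository whose name contains "core".
# $1 is the path to the cloned repository.
list_core_tags() {
    # git -C runs the command inside the given directory;
    # tag --list with a glob pattern filters the tag names.
    git -C "$1" tag --list '*core*'
}
```

Running list_core_tags Megatron-LM from the directory that contains the clone prints the candidate tags; pick the one for the core 6 release and pass it to git checkout.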
1.3 Copy the dataset
The dataset format is: