Installing the NCCL GPU Communication Library on Debian 12.9
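- Hardware information
- Installing NCCL
- Testing NCCL
Hardware information
OS: Debian 12.9 / Ubuntu 24.04
CPU: i7-10750H
Memory: 32 GB
GPU: GTX 1650 (4 GB)
Disk: SSD (1 TB)
The Tsinghua University mirror was selected as the package source during system installation.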
Installing NCCL
wget https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -c cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
cp /etc/apt/sources.list /etc/apt/sources.list.d/sources-testing.list
vi /etc/apt/sources.list.d/sources-testing.list
# Inside vi, replace every "bookworm" with "testing", then save and quit:
:%s/bookworm/testing/g
:wq
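The substitution done in vi can also be scripted. The following is a minimal sketch, assuming a POSIX `sed`; the function name `make_testing_list` is hypothetical, the two sample repository lines are illustrative only, and the demo writes to a temporary directory so nothing under /etc is touched.

```shell
#!/bin/sh
# Hypothetical helper: copy an APT sources file, replacing every
# "bookworm" with "testing" (same effect as :%s/bookworm/testing/g in vi).
make_testing_list() {
    src="$1"
    dst="$2"
    sed 's/bookworm/testing/g' "$src" > "$dst"
}

# Demo on a throwaway file; the repository lines are illustrative samples.
demo_dir="$(mktemp -d)"
printf '%s\n' \
    'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib' \
    'deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib' \
    > "$demo_dir/sources.list"
make_testing_list "$demo_dir/sources.list" "$demo_dir/sources-testing.list"
cat "$demo_dir/sources-testing.list"
```

On the real system the call would be `make_testing_list /etc/apt/sources.list /etc/apt/sources.list.d/sources-testing.list`, run as root.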
apt update
apt install -y libc6-dev libc6
apt install -y libnccl2 libnccl-dev
mv /etc/apt/sources.list.d/sources-testing.list /etc/apt/sources.list.d/sources-testing.list.bak
ldconfig -p | grep libnccl
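The `ldconfig -p | grep libnccl` check can be turned into a pass/fail test for scripts. This is a rough sketch only: the function name `nccl_registered` is hypothetical, and it assumes that `libnccl2` registers `libnccl.so.2` and `libnccl-dev` registers the `libnccl.so` link in the linker cache.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed
# only if both the runtime library (libnccl.so.2) and the development
# link (libnccl.so) appear as the first field of some line.
nccl_registered() {
    awk '
        $1 == "libnccl.so.2" { runtime = 1 }
        $1 == "libnccl.so"   { dev = 1 }
        END { exit !(runtime && dev) }
    '
}

# Run the check only where ldconfig is available on PATH.
if command -v ldconfig >/dev/null 2>&1; then
    if ldconfig -p | nccl_registered; then
        echo "NCCL runtime and development libraries are registered"
    else
        echo "NCCL libraries are missing from the linker cache"
    fi
fi
```

A non-zero exit status from `nccl_registered` suggests that one of the two packages did not install, or that `ldconfig` has not been re-run since installation.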
Testing NCCL
git clone https://gitee.com/xqxyxchy/nccl-tests.git
cd nccl-tests && make
# ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <num_gpus>
# Replace <num_gpus> with the number of GPUs
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 1
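In the command above, `-b` is the starting message size, `-e` the final size, `-f` the multiplication factor between steps, and `-g` the number of GPUs. The sketch below lists the sizes such a sweep would visit; the helper name `sweep_sizes` is hypothetical, and reading `256M` as 256 MiB (256 × 1024 × 1024 bytes) is an assumption about the suffix convention.

```shell
#!/bin/sh
# Hypothetical helper: print the message sizes (in bytes) visited by a
# sweep from min to max, multiplying by `factor` at each step.
sweep_sizes() {
    size="$1"
    max="$2"
    factor="$3"
    # A factor of 1 or less would never reach max, so refuse it.
    [ "$factor" -gt 1 ] || return 1
    while [ "$size" -le "$max" ]; do
        echo "$size"
        size=$((size * factor))
    done
}

# Equivalent of -b 8 -e 256M -f 2: 8 bytes up to 256 MiB, doubling each step.
sweep_sizes 8 $((256 * 1024 * 1024)) 2
```

Doubling from 8 bytes (2^3) to 268435456 bytes (2^28) gives 26 sizes, so the benchmark table for this command should have about that many rows.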
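# ./build/all_reduce_perf -b 8 -e 256M -f 2 -g <num_gpus> -c 1 -n 100 -m <IPs>
# Replace <num_gpus> with the number of GPUs
# Replace <IPs> with the IP addresses of the participating machines, separated by commas
# Caution: in upstream NVIDIA nccl-tests, -m is the aggregation iteration count, and multi-node runs are built with MPI=1 and launched via mpirun; verify that this fork accepts a host list for -m
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 1 -c 1 -n 100 -m 192.168.3.18,192.168.3.17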
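A comma-separated address list is easy to mistype, so it may be worth checking before a long benchmark run. The sketch below only validates the format of each entry; the function name `split_hosts` is hypothetical, and it makes no assumption about how the benchmark or its launcher consumes the list.

```shell
#!/bin/sh
# Hypothetical helper: split a comma-separated IPv4 list into one address
# per line; exit non-zero if any entry is not a dotted quad whose four
# octets are decimal numbers in the range 0-255.
split_hosts() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F. '
        NF != 4 { bad = 1 }
        {
            for (i = 1; i <= NF; i++)
                if ($i !~ /^[0-9]+$/ || $i + 0 > 255) bad = 1
            if (!bad) print
        }
        END { exit bad }
    '
}

# Addresses taken from the example command in this section.
split_hosts "192.168.3.18,192.168.3.17"
```

A format check of this kind cannot tell whether an address belongs to the intended machine, so comparing the printed list against the actual hosts is still advisable.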