[AI] Installing Docker and NVIDIA Container Toolkit on openEuler 22.03 LTS SP4
NVIDIA Container Toolkit
Open the following page:
Unsupported distribution or misconfigured repository settings | NVIDIA Container Toolkit
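To make offline installation easier, download the packages first. The repository file has to be saved under /etc/yum.repos.d/ so that yumdownloader can resolve the package:
wget -O /etc/yum.repos.d/nvidia-container-toolkit.repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo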
mkdir rpms
yumdownloader --resolve --destdir=./rpms/ nvidia-container-toolkit
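Before copying the rpms directory to the offline machine, it can help to confirm that every package was actually downloaded. The following is a minimal sketch, not part of the original procedure: the helper name check_rpms is hypothetical, and the package list is an assumption taken from the four packages this walkthrough installs (version 1.17.4-1); other releases may resolve to a different set.

```shell
# check_rpms DIR: report whether DIR holds an RPM for every expected package.
# Hypothetical helper; the package list is assumed from this walkthrough.
check_rpms() {
    dir=$1
    missing=0
    for pkg in libnvidia-container-tools libnvidia-container1 \
               nvidia-container-toolkit nvidia-container-toolkit-base; do
        # "-[0-9]*" anchors the match on the version number, so that
        # nvidia-container-toolkit does not also match ...-toolkit-base.
        set -- "$dir"/"$pkg"-[0-9]*.rpm
        if [ ! -e "$1" ]; then
            echo "missing: $pkg"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_rpms ./rpms && echo "all packages present"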
Offline installation (run inside the directory that holds the downloaded RPMs)
# yum install ./*.rpm
Last metadata expiration check: 0:12:41 ago on Fri 21 Feb 2025 05:15:45 PM CST.
Dependencies resolved.
=================================================================================================================================================================
Package Architecture Version Repository Size
=================================================================================================================================================================
Installing:
libnvidia-container-tools x86_64 1.17.4-1 @commandline 40 k
libnvidia-container1 x86_64 1.17.4-1 @commandline 1.0 M
nvidia-container-toolkit x86_64 1.17.4-1 @commandline 1.2 M
nvidia-container-toolkit-base x86_64 1.17.4-1 @commandline 5.6 M
Transaction Summary
=================================================================================================================================================================
Install 4 Packages
Total size: 7.9 M
Installed size: 26 M
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : nvidia-container-toolkit-base-1.17.4-1.x86_64 1/4
Installing : libnvidia-container1-1.17.4-1.x86_64 2/4
Running scriptlet: libnvidia-container1-1.17.4-1.x86_64 2/4
Installing : libnvidia-container-tools-1.17.4-1.x86_64 3/4
Installing : nvidia-container-toolkit-1.17.4-1.x86_64 4/4
Running scriptlet: nvidia-container-toolkit-1.17.4-1.x86_64 4/4
Verifying : libnvidia-container1-1.17.4-1.x86_64 1/4
Verifying : libnvidia-container-tools-1.17.4-1.x86_64 2/4
Verifying : nvidia-container-toolkit-1.17.4-1.x86_64 3/4
Verifying : nvidia-container-toolkit-base-1.17.4-1.x86_64 4/4
Installed:
libnvidia-container-tools-1.17.4-1.x86_64 libnvidia-container1-1.17.4-1.x86_64 nvidia-container-toolkit-1.17.4-1.x86_64
nvidia-container-toolkit-base-1.17.4-1.x86_64
Complete!
Docker
Manually download the latest version:
https://download.docker.com/linux/static/stable/x86_64/docker-28.0.0.tgz
wget https://download.docker.com/linux/static/stable/x86_64/docker-28.0.0.tgz
[root@localhost media]# tar -xvf docker-28.0.0.tgz
docker/
docker/containerd-shim-runc-v2
docker/containerd
docker/docker
docker/runc
docker/ctr
docker/dockerd
docker/docker-init
docker/docker-proxy
[root@localhost media]# mv -v docker/* /usr/local/bin/
renamed 'docker/containerd' -> '/usr/local/bin/containerd'
renamed 'docker/containerd-shim-runc-v2' -> '/usr/local/bin/containerd-shim-runc-v2'
renamed 'docker/ctr' -> '/usr/local/bin/ctr'
renamed 'docker/docker' -> '/usr/local/bin/docker'
renamed 'docker/dockerd' -> '/usr/local/bin/dockerd'
renamed 'docker/docker-init' -> '/usr/local/bin/docker-init'
renamed 'docker/docker-proxy' -> '/usr/local/bin/docker-proxy'
renamed 'docker/runc' -> '/usr/local/bin/runc'
[root@localhost media]# ll docker
total 0
[root@localhost media]# ll /usr/local/bin/
total 206856
-rwxr-xr-x. 1 1000 1000 40415384 Feb 20 06:11 containerd
-rwxr-xr-x. 1 1000 1000 13299864 Feb 20 06:11 containerd-shim-runc-v2
-rwxr-xr-x. 1 1000 1000 20394136 Feb 20 06:11 ctr
-rwxr-xr-x. 1 1000 1000 41532216 Feb 20 06:11 docker
-rwxr-xr-x. 1 1000 1000 76647872 Feb 20 06:11 dockerd
-rwxr-xr-x. 1 1000 1000 708448 Feb 20 06:11 docker-init
-rwxr-xr-x. 1 1000 1000 2377328 Feb 20 06:11 docker-proxy
-rwxr-xr-x. 1 1000 1000 16426200 Feb 20 06:11 runc
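Since the static tarball is installed by hand, a quick check that all eight binaries landed in the target directory can catch a partial move. This is a rough sketch under the assumption that the file list matches the tar output above; check_docker_bins is a hypothetical helper name.

```shell
# check_docker_bins DIR: verify that each file from the static tarball is
# present and executable in DIR. The list mirrors the tar output above.
check_docker_bins() {
    dir=$1
    rc=0
    for bin in containerd containerd-shim-runc-v2 ctr docker dockerd \
               docker-init docker-proxy runc; do
        if [ ! -x "$dir/$bin" ]; then
            echo "not executable or missing: $dir/$bin"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: check_docker_bins /usr/local/bin. The listing above also shows the files owned by UID/GID 1000, as unpacked from the tarball; changing the owner to root (chown root:root) is a common tidy-up, although the original walkthrough does not do so.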
Create /usr/lib/systemd/system/docker.service:
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/docker
EnvironmentFile=-/etc/sysconfig/docker-storage
EnvironmentFile=-/etc/sysconfig/docker-network
Environment=GOTRACEBACK=crash
ExecStart=/usr/local/bin/dockerd $OPTIONS \
$DOCKER_STORAGE_OPTIONS \
$DOCKER_NETWORK_OPTIONS \
$INSECURE_REGISTRY
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
[Install]
WantedBy=multi-user.target
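Because the unit file is written by hand, its ExecStart path has to agree with where the binaries were moved. The sketch below is one possible sanity check, assuming a unit file laid out like the one above; unit_exec_path is a hypothetical helper, not a systemd tool.

```shell
# unit_exec_path FILE: print the binary named on the ExecStart= line of a
# unit file (the first whitespace-delimited word after "ExecStart=").
unit_exec_path() {
    sed -n 's/^ExecStart=\([^[:space:]]*\).*/\1/p' "$1"
}
```

Usage: [ -x "$(unit_exec_path /usr/lib/systemd/system/docker.service)" ] && echo "ExecStart binary found". If the unit file is edited after systemd has already loaded it, systemctl daemon-reload is needed before the change takes effect.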
Configure the runtime with nvidia-ctk
[root@localhost media]# nvidia-ctk runtime configure --runtime=docker
INFO[0000] Config file does not exist; using empty config
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
[root@localhost media]# cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"args": [],
"path": "nvidia-container-runtime"
}
}
}
Start the Docker service
[root@localhost media]# systemctl enable docker --now
Created symlink /etc/systemd/system/multi-user.target.wants/docker.service → /usr/lib/systemd/system/docker.service.
[root@localhost ~]# docker info
Client:
Version: 28.0.0
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 28.0.0
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
Swarm: inactive
Runtimes: nvidia runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: bcc810d6b9066471b0b6fa75f557a15a1cbf31bb
runc version: v1.2.5-0-g59923ef
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 5.10.0-216.0.0.115.oe2203sp4.x86_64
Operating System: openEuler 22.03 (LTS-SP4)
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 30.46GiB
Name: localhost.localdomain
ID: e146eb60-c3e3-41d9-bf61-71e7cd5707f9
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
::1/128
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
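The line to look for in the output above is "Runtimes: nvidia runc io.containerd.runc.v2", which shows that the daemon picked up the runtime written to daemon.json. A small sketch for checking this from a script follows; it assumes the plain-text layout of docker info shown above, and runtimes_include_nvidia is a hypothetical helper name.

```shell
# runtimes_include_nvidia: read `docker info` text on stdin and succeed only
# if the "Runtimes:" line lists nvidia as a separate word. The
# "Default Runtime:" line is deliberately not considered.
runtimes_include_nvidia() {
    grep -E '^[[:space:]]*Runtimes:' |
        grep -Eq '(^|[[:space:]])nvidia([[:space:]]|$)'
}
```

Usage: docker info 2>/dev/null | runtimes_include_nvidia && echo "nvidia runtime registered"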
Verify nvidia-smi in Docker
Pick any image and run nvidia-smi with the --gpus=all option. Without the --gpus option, the nvidia-smi command is not injected into the container.
[root@localhost ollama]# docker run --rm -it ubuntu:22.04 nvidia-smi -l 1
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "nvidia-smi": executable file not found in $PATH: unknown
Run 'docker run --help' for more information
[root@localhost ollama]# docker run --rm -it --gpus=all ubuntu:22.04 nvidia-smi -l 1
Fri Feb 21 10:08:47 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.10 Driver Version: 570.86.10 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:0C:00.0 Off | Off |
| 30% 27C P8 18W / 450W | 8173MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 4090 Off | 00000000:25:00.0 Off | Off |
| 30% 28C P8 28W / 450W | 7821MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA GeForce RTX 4090 Off | 00000000:32:00.0 Off | Off |
| 30% 27C P8 5W / 450W | 7821MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA GeForce RTX 4090 Off | 00000000:45:00.0 Off | Off |
| 30% 27C P8 30W / 450W | 7821MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA GeForce RTX 4090 Off | 00000000:58:00.0 Off | Off |
| 30% 28C P8 18W / 450W | 7327MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA GeForce RTX 4090 Off | 00000000:84:00.0 Off | Off |
| 30% 28C P8 21W / 450W | 7327MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA GeForce RTX 4090 Off | 00000000:D4:00.0 Off | Off |
| 30% 28C P8 22W / 450W | 8009MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Fri Feb 21 10:08:49 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.86.10 Driver Version: 570.86.10 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
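For a scripted check, the table view is awkward to parse; nvidia-smi -L prints one line per device instead. The sketch below counts those lines. It assumes the usual "GPU <index>: ..." line format of nvidia-smi -L, and count_gpus is a hypothetical helper name.

```shell
# count_gpus: read `nvidia-smi -L` output on stdin and print how many lines
# describe a GPU (lines starting with "GPU <index>:").
# Note: grep -c exits non-zero when the count is 0.
count_gpus() {
    grep -Ec '^GPU [0-9]+:'
}
```

Usage: docker run --rm --gpus=all ubuntu:22.04 nvidia-smi -L | count_gpus. The printed count should match the number of GPUs listed in the table above.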
References
Installing the NVIDIA Container Toolkit — NVIDIA Container Toolkit