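NVIDIA driver upgrade - Ubuntu 18.04
Upgrade
1. Download the *.run driver installer from the official NVIDIA website.
2. Uninstall the existing driver: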
sudo /usr/bin/nvidia-uninstall
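sudo apt-get --purge remove nvidia-\* # the "-\" may not be needed
sudo apt-get purge nvidia-\* # the "-\" may not be needed; same effect as the previous line
sudo apt-get purge libnvidia-\* # the "-\" may not be needed
sudo apt-get autoremove # may be unnecessary, or may fail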
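The uninstall commands above can be collected into one script. This is only a sketch, assuming bash on Ubuntu 18.04; `DRY_RUN` is a hypothetical switch added here so that the commands are printed (DRY_RUN=1, the default) until you set DRY_RUN=0. A failing step is reported and skipped, in line with the note that some of these commands may fail.

```bash
#!/usr/bin/env bash
# Sketch of step 2: remove the old NVIDIA driver.
# Assumption: DRY_RUN is a hypothetical switch; with DRY_RUN=1 (the default)
# commands are only printed, with DRY_RUN=0 they are executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, then execute it only when DRY_RUN=0; returns the command's status.
run() {
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

uninstall_nvidia() {
  # Uninstaller left behind by a previous .run installation
  run sudo /usr/bin/nvidia-uninstall || echo "nvidia-uninstall failed or is missing, continuing" >&2
  # Quoted patterns reach apt-get unexpanded, like the escaped nvidia-\* above
  run sudo apt-get purge -y 'nvidia-*' || echo "purge of nvidia-* failed, continuing" >&2
  run sudo apt-get purge -y 'libnvidia-*' || echo "purge of libnvidia-* failed, continuing" >&2
  run sudo apt-get autoremove -y || echo "autoremove failed, continuing" >&2
}
```

Preview with `DRY_RUN=1 uninstall_nvidia`, then run it for real with `DRY_RUN=0 uninstall_nvidia`. The `apt-get --purge remove` line is left out because `apt-get purge` does the same thing.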
3. Stop the processes that use the GPU
sudo systemctl isolate multi-user.target
sudo modprobe -r nvidia-drm
3.1 Disable the GPU services that start automatically at boot
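Steps 3 and 3.1 can be sketched as two functions. Assumptions: `DRY_RUN` is the same hypothetical print-only switch (default 1), and `GPU_SERVICES` is a hypothetical, site-specific list of systemd units; `nvidia-persistenced` is only an example default. Startup entries in /etc/rc.local are not covered and still have to be commented out by hand.

```bash
#!/usr/bin/env bash
# Sketch of steps 3 and 3.1: release the GPU and stop GPU services from starting at boot.
# Assumptions: DRY_RUN is a hypothetical switch (1 = print only, the default; 0 = execute);
# GPU_SERVICES is a hypothetical, site-specific list of systemd units.
DRY_RUN="${DRY_RUN:-1}"
GPU_SERVICES="${GPU_SERVICES:-nvidia-persistenced}"

# Print a command, then execute it only when DRY_RUN=0; returns the command's status.
run() {
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

stop_gpu_users() {
  # Leave the graphical target so the display manager and X server release the GPU
  run sudo systemctl isolate multi-user.target
  # Unload dependents before the core module: nvidia_drm needs nvidia_modeset,
  # and nvidia_modeset and nvidia_uvm both need nvidia.
  # modprobe -r may also drop now-unused dependencies, so a later entry can
  # already be gone; failures are reported and skipped.
  for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    run sudo modprobe -r "$mod" || echo "could not unload $mod, continuing" >&2
  done
}

disable_gpu_services() {
  for svc in $GPU_SERVICES; do
    # "disable --now" removes the unit from boot and also stops it immediately
    run sudo systemctl disable --now "$svc" || echo "could not disable $svc" >&2
  done
}
```

Typical use: `DRY_RUN=0 GPU_SERVICES="nvidia-persistenced" disable_gpu_services`, then `DRY_RUN=0 stop_gpu_users`.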
4. Reboot the server
reboot
5. Install the driver
sh *.run
5.1 After the installation succeeds, re-enable the GPU services that start at boot
6. Reboot the server
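Steps 5 to 6 can be sketched as one function that installs a single, explicitly named .run file and only re-enables services and reboots if the installer succeeds. Assumptions: `DRY_RUN` is the hypothetical print-only switch (default 1) and `GPU_SERVICES` is a hypothetical, site-specific list of systemd units.

```bash
#!/usr/bin/env bash
# Sketch of steps 5 to 6: install one named .run file, re-enable GPU services, reboot.
# Assumptions: DRY_RUN is a hypothetical switch (1 = print only, the default; 0 = execute);
# GPU_SERVICES is a hypothetical, site-specific list of systemd units.
DRY_RUN="${DRY_RUN:-1}"
GPU_SERVICES="${GPU_SERVICES:-nvidia-persistenced}"

# Print a command, then execute it only when DRY_RUN=0; returns the command's status.
run() {
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

install_driver() {
  local run_file="$1"
  # Insist on one explicit *.run file: with "sh *.run" and several installers in the
  # directory, the shell runs the first one and passes the others to it as arguments.
  case "$run_file" in
    *.run) ;;
    *) echo "not a .run file: $run_file" >&2; return 1 ;;
  esac
  [ -f "$run_file" ] || { echo "file not found: $run_file" >&2; return 1; }

  run sudo sh "$run_file" || { echo "installation failed, not rebooting" >&2; return 1; }
  for svc in $GPU_SERVICES; do
    run sudo systemctl enable "$svc" || echo "could not enable $svc" >&2
  done
  run sudo reboot
}
```

For example: `DRY_RUN=0 install_driver ./Tesla-V100-NVIDIA-Linux-x86_64-535.183.06.run`.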
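Alternatively, install over the existing driver directly:
./Tesla-V100-NVIDIA-Linux-x86_64-535.183.06.run --no-x-check
If the installer reports that a program is still using the driver, disable the startup entries in /etc/rc.local, reboot the server, and run the installation again.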
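To see which processes are holding the GPU before retrying the install, the function below lists PIDs with an open file under /dev/nvidia (for example /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm). It is a sketch that walks /proc, so it is Linux only, and it needs sudo to see other users' processes. `sudo lsof /dev/nvidia*` or `sudo fuser -v /dev/nvidia*`, where installed, report the same thing. The optional prefix argument is a hypothetical hook for testing.

```bash
#!/usr/bin/env bash
# List the PIDs of processes holding an open file whose path starts with PREFIX
# (default /dev/nvidia). Linux only: it walks /proc/<pid>/fd.
# fd directories that cannot be read (other users' processes without sudo) are skipped.
gpu_users() {
  local prefix="${1:-/dev/nvidia}" fd target pid
  for fd in /proc/[0-9]*/fd/*; do
    # The process may have exited, or the link may be unreadable: skip it
    target="$(readlink "$fd" 2>/dev/null)" || continue
    case "$target" in
      "$prefix"*)
        pid="${fd#/proc/}"      # e.g. 1234/fd/5
        echo "${pid%%/*}"       # keep only the PID
        ;;
    esac
  done | sort -un
}
```

Each printed PID can be identified with `ps -o pid,user,cmd -p <pid>` and stopped before running the installer again.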
During installation the installer asks: "Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if your kernel changes later." You can answer No; in that case the driver has to be reinstalled after a kernel upgrade.
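Troubleshooting:
0. Uninstall the previous driver first (step 2 of the upgrade).
1. The installer reports a kernel mismatch. Install the headers for the running kernel:
sudo apt install linux-headers-$(uname -r) # linux-headers-generic installs headers for the newest generic kernel, which may not be the running one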
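A quick way to check for the kernel mismatch before running the installer is to look for the headers of the running kernel. This sketch assumes the Ubuntu layout, where the linux-headers-<release> package provides /lib/modules/<release>/build as a symlink into /usr/src; the second argument is a hypothetical alternate root used only for testing.

```bash
#!/usr/bin/env bash
# Succeed if kernel headers for RELEASE (default: the running kernel) are installed.
# [ -d ] follows the /lib/modules/<release>/build symlink, so a dangling link
# (headers removed) counts as missing. ROOT is a hypothetical test-only prefix.
headers_installed() {
  local release="${1:-$(uname -r)}" root="${2:-}"
  [ -d "$root/lib/modules/$release/build" ]
}

# Install headers for the running kernel only if they are missing.
ensure_headers() {
  local release
  release="$(uname -r)"
  if headers_installed "$release"; then
    echo "headers for $release already installed"
  else
    sudo apt install -y "linux-headers-$release"
  fi
}
```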
2. Installing the GPU driver with sh RTX8000-NVIDIA-Linux-x86_64-535.129.03.run fails with:
ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your
kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel
supports module unloading, and you still receive this message, then an error may have occurred that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to reboot your computer.
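Before rebooting as the message suggests, /proc/modules shows which NVIDIA modules are loaded and how many references each one has (the kernel lists nvidia-uvm as nvidia_uvm). This is a sketch; the optional file argument is a hypothetical hook for testing with a saved copy of /proc/modules. Note that the count for the core nvidia module also includes references from nvidia_uvm, nvidia_modeset and nvidia_drm, not only from processes.

```bash
#!/usr/bin/env bash
# /proc/modules has one line per module: name size use_count dependents state address.

# Print "name use_count" for every loaded module whose name starts with "nvidia".
loaded_nvidia_modules() {
  awk '$1 ~ /^nvidia/ { print $1, $3 }' "${1:-/proc/modules}"
}

# Succeed (exit 0) if any NVIDIA module has a non-zero use count, i.e. it is held
# by a process (X server, CUDA program, persistence daemon) or by another module.
nvidia_in_use() {
  awk '$1 ~ /^nvidia/ && $3 > 0 { busy = 1 } END { exit !busy }' "${1:-/proc/modules}"
}
```

If `nvidia_in_use` succeeds, stop the programs listed in the error message and try the module unload from step 3 again; if every count is 0 and the installer still refuses, reboot.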
3. Run:
bash RTX8000-NVIDIA-Linux-x86_64-535.129.03.run
4. Reboot:
reboot
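After the reboot, it is worth confirming that the running driver is the one just installed. This sketch assumes the version is the last "-"-separated field of the .run file name (as in RTX8000-NVIDIA-Linux-x86_64-535.129.03.run) and reads the running version with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; the optional second argument is a hypothetical test hook that replaces the nvidia-smi query.

```bash
#!/usr/bin/env bash
# Extract "535.129.03" from ".../RTX8000-NVIDIA-Linux-x86_64-535.129.03.run".
# Prints nothing if the name does not end in -<version>.run.
run_file_version() {
  basename "$1" | sed -n 's/.*-\([0-9][0-9.]*\)\.run$/\1/p'
}

# Compare the installer's version with the driver version of the first GPU.
check_driver_version() {
  local expected actual
  expected="$(run_file_version "$1")"
  actual="${2:-$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)}"
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: driver $actual is running"
  else
    echo "MISMATCH: expected ${expected:-unknown}, got ${actual:-nothing}" >&2
    return 1
  fi
}
```

For example: `check_driver_version RTX8000-NVIDIA-Linux-x86_64-535.129.03.run`. An empty "got" value usually means nvidia-smi could not talk to the driver at all.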