[Linux] Installing the NVIDIA GPU driver and CUDA 12.8 on openEuler
Contents
- Driver installation steps
- Installing CUDA 12.8
- Installing nvidia-container-toolkit (GPU access for Docker)
- Errors and solutions
- 1. ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your ...
- 2. nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system.
- 3. ERROR: The Nouveau kernel driver is currently in use by your system. This
- 4. docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
- 5. Unsupported model IR version: 9, max supported IR version: 8
- Other resources
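Driver installation steps
- Run the installer:
./NVIDIA-Linux-x86_64-570.133.20.run --no-opengl-files --no-x-check --no-nouveau-check
- Select the first NVIDIA driver option and wait for the build to finish.
- Ignore the X11 error and keep pressing Enter.
- The installation completes successfully.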
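Before moving on to CUDA, it can help to confirm that the kernel module actually loaded. The sketch below is not part of the original steps: `nvidia_module_loaded` is a hypothetical helper name, and it only inspects `lsmod`-style text.

```shell
#!/bin/sh
# Sketch: check that the NVIDIA kernel module is loaded after installation.
# nvidia_module_loaded is a hypothetical helper, not part of any NVIDIA tool.

# Succeeds if the lsmod-style text in $1 lists a module named exactly "nvidia".
nvidia_module_loaded() {
    printf '%s\n' "$1" | awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Only query the live system when lsmod is available.
if command -v lsmod >/dev/null 2>&1; then
    if nvidia_module_loaded "$(lsmod)"; then
        echo "nvidia kernel module is loaded"
    else
        echo "nvidia kernel module is NOT loaded" >&2
    fi
fi
```

If the module is missing, the installer log (by default /var/log/nvidia-installer.log) is usually the first place to look.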
Installing CUDA 12.8
./cuda_12.8.0_570.86.10_linux.run
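- The installer sits silently for over a minute before anything appears, so go grab a glass of water. Type accept at the license prompt. Since the driver is already installed, the bundled Driver component can be deselected in the component menu.
- Once the installation succeeds, add the CUDA directories to PATH and LD_LIBRARY_PATH:
vim ~/.bashrc
Add these lines: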
export PATH=/usr/local/cuda-12.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH
source ~/.bashrc
- Run nvcc --version to confirm the toolkit is on the PATH.
- Run nvidia-smi; it should list the GPUs (two Tesla T4 cards in this case).
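Editing ~/.bashrc by hand makes it easy to add the same export twice. A minimal sketch of an idempotent version, assuming the default install prefix /usr/local/cuda-12.8 used above; `append_once` and `add_cuda_paths` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: add the CUDA 12.8 paths to a shell rc file only if they are missing.
# append_once and add_cuda_paths are hypothetical helpers for illustration.

CUDA_PREFIX=/usr/local/cuda-12.8

# Append line $2 to file $1 unless an identical line is already present.
append_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Write both export lines into the rc file given as $1.
add_cuda_paths() {
    append_once "$1" "export PATH=$CUDA_PREFIX/bin:\$PATH"
    append_once "$1" "export LD_LIBRARY_PATH=$CUDA_PREFIX/lib64:\$LD_LIBRARY_PATH"
}
```

Typical use would be `add_cuda_paths ~/.bashrc` followed by `source ~/.bashrc`, then the nvcc and nvidia-smi checks from the steps above.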
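Installing nvidia-container-toolkit (GPU access for Docker)
Docker cannot see the GPUs until the NVIDIA Container Toolkit is installed. The fix:
Add the package repository: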
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
Optional: enable the experimental packages. Skip this step if you do not need them.
sudo dnf config-manager --set-enabled nvidia-container-toolkit-experimental
Install nvidia-container-toolkit:
sudo dnf install -y nvidia-container-toolkit
Configure the Docker runtime and restart Docker:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
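The configure step registers an nvidia runtime in Docker's daemon configuration. A quick sketch for checking the result, assuming the default config path /etc/docker/daemon.json; `has_nvidia_runtime` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check whether Docker's daemon config mentions an "nvidia" runtime.
# has_nvidia_runtime is a hypothetical helper; the path below is the usual default.

# Succeeds if the file at $1 exists and contains the string "nvidia" in quotes.
has_nvidia_runtime() {
    [ -f "$1" ] && grep -q '"nvidia"' "$1"
}

if has_nvidia_runtime /etc/docker/daemon.json; then
    echo "nvidia runtime is configured"
else
    echo "nvidia runtime not found in /etc/docker/daemon.json" >&2
fi
```

An end-to-end check is to run nvidia-smi inside a container, for example `sudo docker run --rm --gpus all ubuntu nvidia-smi` (the ubuntu image is only an example); it should print the same GPU table as on the host.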
Errors and solutions
1. ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your …
dnf install kernel-devel-$(uname -r) kernel-headers
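The installer needs a build tree that matches the running kernel exactly, so the error can persist if the installed kernel-devel package belongs to a different kernel version. A sketch of the check, assuming the usual RHEL-style layout in which /lib/modules/<release>/build points at the kernel-devel tree; `kernel_build_dir_ok` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that a kernel build tree exists for the running kernel.
# kernel_build_dir_ok is a hypothetical helper; the layout is an assumption.

# $1 = modules root (normally /lib/modules), $2 = kernel release (uname -r).
# Succeeds only if <root>/<release>/build resolves to an existing directory,
# so a dangling symlink counts as missing.
kernel_build_dir_ok() {
    [ -d "$1/$2/build" ]
}

release=$(uname -r)
if kernel_build_dir_ok /lib/modules "$release"; then
    echo "kernel build tree found for $release"
else
    echo "no build tree for $release; install kernel-devel-$release" >&2
fi
```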
2. nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system.
Skip the OpenGL files and the X check when running the installer:
./NVIDIA-Linux-x86_64-570.133.20.run --no-opengl-files --no-x-check --no-nouveau-check
3. ERROR: The Nouveau kernel driver is currently in use by your system. This driver is incompatible with the NVIDIA driver, and must be disabled before proceeding. Please consult the NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable the Nouveau kernel driver.
1. Open YaST, go to Software Management, search for nouveau, and uninstall every related package that is already installed. (YaST is the openSUSE tool; on openEuler, remove the packages with dnf instead.)
2. Open /etc/modprobe.d/50-blacklist.conf and add this line:
blacklist nouveau
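The blacklist step can also be scripted. The sketch below writes the entry only once. The extra `options nouveau modeset=0` line and the initramfs rebuild with dracut are common additions that the original steps do not mention, and `blacklist_nouveau` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: blacklist nouveau in a modprobe config file without duplicating lines.
# blacklist_nouveau is a hypothetical helper; modeset=0 is an added assumption.

# Add the blacklist entries to the modprobe config file given as $1.
blacklist_nouveau() {
    touch "$1"
    for line in 'blacklist nouveau' 'options nouveau modeset=0'; do
        grep -qxF -- "$line" "$1" || printf '%s\n' "$line" >> "$1"
    done
}

# Typical use, as root (left as comments so the sketch has no side effects):
#   blacklist_nouveau /etc/modprobe.d/50-blacklist.conf
#   dracut --force          # rebuild the initramfs so nouveau is not loaded at boot
#   reboot
#   lsmod | grep nouveau    # should print nothing after the reboot
```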
4. docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Docker cannot see the GPUs because the NVIDIA Container Toolkit is missing. Follow the nvidia-container-toolkit installation section above: add the repository, install the package, run nvidia-ctk runtime configure --runtime=docker, and restart Docker.
5. Unsupported model IR version: 9, max supported IR version: 8
Upgrade Triton to a newer release that supports the newer ONNX model format (IR version 9).
Other resources
- NVIDIA driver download page: https://www.nvidia.cn/drivers/lookup/
- CUDA 12.8 download (links to earlier versions are at the bottom of the page): https://developer.nvidia.com/cuda-12-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=RHEL&target_version=8&target_type=runfile_local