A warm-up gift: watch GPU usage live by running nvidia-smi automatically every 2 seconds:
$ watch -n 2 nvidia-smi
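If you would rather log samples than watch the screen, a minimal sketch follows. It assumes the `--query-gpu`, `--format=csv`, and `-l` (loop interval) options of `nvidia-smi`; the function name `gpu_watch` is made up for illustration.

```shell
# Hypothetical helper: print one CSV sample of GPU load and memory every N seconds.
gpu_watch() {
    interval="${1:-2}"   # default matches the 2-second interval used above
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; the driver is probably not installed yet" >&2
        return 1
    fi
    nvidia-smi \
        --query-gpu=timestamp,name,utilization.gpu,memory.used,memory.total \
        --format=csv -l "$interval"
}
```

For example, `gpu_watch 2 | tee gpu.log` keeps a record while you run a workload.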
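1. Find a computer whose GPU is on the supported list of the NVIDIA kernel-mode driver (KMD), and install Ubuntu 22.04
The supported-GPU list is in the GitHub repository of the KMD source code.
CUDA SDK 12.3 supports Ubuntu up to 22.04, so:
Download ubuntu 22.04...iso
Write it to a USB drive with Rufus
Reboot the computer and press F2, F8, F10, and F12 together (the setup key differs between vendors)
Enter setup, change the boot order, and put the USB drive first
Walk through the installer step by step,
reboot
Switch apt to a mirror in mainland China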
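Switching the mirror can be sketched as below. The helper name `switch_apt_mirror` is hypothetical, and the mirror URL in the usage comment is only one public mirror picked as an example; the original does not say which one was used.

```shell
# Hypothetical helper: point an apt sources file at a different mirror.
# $1 = sources file, $2 = mirror base URL (both are choices of this sketch).
switch_apt_mirror() {
    file="$1"
    mirror="$2"
    cp "$file" "$file.bak" || return 1    # keep a backup of the original
    # Rewrites archive.ubuntu.com and country prefixes such as cn.archive.ubuntu.com
    sed -i -E "s|https?://([a-z]+\.)?archive\.ubuntu\.com|$mirror|g" "$file"
}

# Suggested use: edit a copy, then install it with sudo and refresh the index.
#   cp /etc/apt/sources.list /tmp/sources.list
#   switch_apt_mirror /tmp/sources.list https://mirrors.tuna.tsinghua.edu.cn
#   sudo cp /tmp/sources.list /etc/apt/sources.list && sudo apt update
```

Working on a copy first means a typo in the URL never leaves the system without a usable sources file.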
Install the packages needed to compile the Linux kernel:
- sudo apt update
- sudo apt upgrade
- sudo apt install build-essential
- sudo apt-get update && sudo apt-get install libncurses-dev build-essential flex bison libssl-dev binutils libelf-dev openssh-server vim bc dwarves zstd
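Before starting a long kernel build, it can help to confirm the tools are actually on `PATH`. A small sketch; `missing_tools` is a hypothetical helper, and the tool list is my reading of which commands the packages above provide (for example, `pahole` comes from `dwarves`).

```shell
# Hypothetical helper: print every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Empty output means every tool was found.
missing_tools gcc make flex bison bc pahole zstd
```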
2. Rebuild and install the Linux kernel
- sudo apt install linux-source-6.5.0
- ls
- mkdir ex_kernel_linux_debug
- cd ex_kernel_linux_debug/
- ls
- cp /usr/src/linux-source-6.5.0.tar.bz2 ./
- tar -xvjf linux-source-6.5.0.tar.bz2
- cd linux-source-6.5.0/
- cp /boot/config-6.5.0-44-generic ./.config
- make oldconfig
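One thing to watch for when reusing the distribution config: Ubuntu's stock config points `CONFIG_SYSTEM_TRUSTED_KEYS` and `CONFIG_SYSTEM_REVOCATION_KEYS` at certificate files under `debian/`, and if those files are missing from your tree the build may stop with a "No rule to make target" error. This step is not part of the original procedure; the sketch below is an assumed workaround, and `clear_cert_keys` is a hypothetical name.

```shell
# Hypothetical helper: blank the two key-file options in a kernel .config.
clear_cert_keys() {
    config="${1:-.config}"
    sed -i -E 's|^(CONFIG_SYSTEM_(TRUSTED|REVOCATION)_KEYS)=.*|\1=""|' "$config"
}

# Run it only if the build fails on the certificate files:
#   clear_cert_keys .config && make olddefconfig
```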
Build the kernel:
$ make -j$(nproc)
Install the kernel:
- sudo make modules_install
- sudo make install
- sudo reboot
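3. Install CUDA SDK 12.3, but hold back the last two steps
Follow NVIDIA's official procedure and run only the step 1 installation for now.
3.1 Before installing, the nouveau driver must be blacklisted; see the official guide:
- https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#removing-cuda-toolkit-and-driver
The detailed steps, copied from the guide: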
- 8.3.6. Ubuntu
- Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:
- blacklist nouveau
- options nouveau modeset=0
- Regenerate the kernel initramfs:
- sudo update-initramfs -u
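The blacklist steps above can be scripted roughly like this. `write_nouveau_blacklist` is a hypothetical helper; the file contents and target path are the ones quoted from the guide.

```shell
# Hypothetical helper: write the two blacklist lines to the given path.
write_nouveau_blacklist() {
    target="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Suggested use: write to a temp file, install it with sudo, rebuild the initramfs.
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo install -m 644 /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   lsmod | grep nouveau    # after a reboot, no output means nouveau is not loaded
```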
3.2 Install CUDA SDK, step 1
Download page used:
- https://developer.nvidia.com/cuda-12-3-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_local
- wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
- sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
- wget https://developer.download.nvidia.com/compute/cuda/12.3.0/local_installers/cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb
- sudo dpkg -i cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb
- sudo cp /var/cuda-repo-ubuntu2204-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
- sudo apt-get update
- sudo apt-get -y install cuda-toolkit-12-3
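After the toolkit is installed, `nvcc` is usually not on `PATH` yet. A small sketch, assuming the default install prefix `/usr/local/cuda-12.3` (that path is not stated in the original, so adjust it if yours differs):

```shell
# Assumed default prefix of the cuda-toolkit-12-3 package.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.3}"
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Confirm the compiler is reachable; the driver is not needed for this check.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

Putting the two `export` lines in `~/.bashrc` makes the setting permanent.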
4. Replace step 2 with the open-source code
4.1 Download and build NVIDIA's open-source GPU KMD
Download from:
https://github.com/NVIDIA/open-gpu-kernel-modules
- git clone https://github.com/NVIDIA/open-gpu-kernel-modules.git
- cd open-gpu-kernel-modules
- git checkout 545.23.06
- git branch
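The tag checked out here, 545.23.06, is the same driver version that appears in the file name of the CUDA local installer from step 3.2. A small sketch that pulls the version out of the file name, so the two cannot drift apart; `driver_version_from_deb` is a hypothetical helper, and the parsing assumes the `..._<cuda>-<driver>-<rev>_<arch>.deb` naming seen in that step.

```shell
# Hypothetical helper: extract the driver version from a CUDA local-installer file name.
driver_version_from_deb() {
    name="${1##*/}"     # drop any directory part
    ver="${name#*_}"    # drop the package name up to the first "_"
    ver="${ver#*-}"     # drop the CUDA version up to the first "-"
    ver="${ver%%-*}"    # drop the revision and architecture suffix
    printf '%s\n' "$ver"
}

# Suggested use inside the cloned repository:
#   git checkout "$(driver_version_from_deb cuda-repo-ubuntu2204-12-3-local_12.3.0-545.23.06-1_amd64.deb)"
```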
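The download sometimes fails, so back up the source once you have it. Alternatively, fork the repository to your own GitHub account and clone from there.
4.2 Build and install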
- make clean
- make -j12
- sudo make modules_install
- sudo make install
- sudo reboot
5. Run step 3 to finish installing CUDA
- sudo apt-get install -y cuda-drivers-545
Test:
$ nvidia-smi
$ ./vectorAdd
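6. How can we verify that this KMD was installed from source?
Add a few printk calls or similar code to the open-source code and see:
Rebuild and reinstall:
$ make modules -j$(nproc)
$ sudo make modules_install -j$(nproc)
Reboot the computer:
$ sudo reboot
Then run:
$ sudo dmesg
If the added printk output appears in the log, the loaded .ko files are the ones built from the open-source code.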
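Besides printk, there may be a quicker check that needs no source change. The open kernel modules are expected to identify themselves as "Open Kernel Module" in `/proc/driver/nvidia/version` and to report a `Dual MIT/GPL` license through `modinfo`; treat both strings as assumptions to confirm on your own system. The helper name `is_open_kmd` is hypothetical.

```shell
# Hypothetical helper: succeed if the driver version file mentions the open flavor.
is_open_kmd() {
    version_file="${1:-/proc/driver/nvidia/version}"
    [ -r "$version_file" ] && grep -qi 'open kernel module' "$version_file"
}

if is_open_kmd; then
    echo "open-source KMD is loaded"
else
    echo "open-source KMD not detected"
fi

# Print the license field of the installed nvidia module, if there is one.
modinfo -F license nvidia 2>/dev/null || true
```

This only tells you which flavor is loaded; the printk approach above is still the way to prove that your own build is the one running.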
Run the app:
Some links for future reference:
Index of /XFree86/FreeBSD-x86_64/520.56.06
https://images.nvidia.com/content/pdf/nvswitch-technical-overview.pdf
https://www.amax.com/unleashing-next-level-gpu-performance-with-nvidia-nvlink/
https://www.nvidia.com/en-us/data-center/nvlink/
https://hc34.hotchips.org/assets/program/conference/day2/Network%20and%20Switches/NVSwitch%20HotChips%202022%20r5.pdf