VIII. Server [Ubuntu]: GPU Tesla P100 Deployment
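I. Initial setup
1. Disable nouveau
lsmod | grep nouveau
If this prints nothing, nouveau is not loaded and no further setup is needed.
1.1 [Ubuntu] Otherwise, configure as follows:
1.1.1 Run sudo vim /etc/modprobe.d/blacklist.conf and add the line blacklist nouveau at the end of the file
1.1.2 Run sudo update-initramfs -u and reboot
1.1.3 After the reboot, run lsmod | grep nouveau again; no output means nouveau has been disabled
1.2 [CentOS] Reference: https://sixiangdefairy.blog.csdn.net/article/details/108118951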
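
The Ubuntu steps above can be scripted so that running them twice does not add the line twice. The following is a minimal sketch, not a tested installer: `ensure_blacklist_line` is a hypothetical helper name, and the file path is passed in as an argument.

```shell
# Hypothetical helper: append "blacklist nouveau" to a modprobe config file
# only if that exact line is not already present (safe to run repeatedly).
ensure_blacklist_line() {
    conf_file="$1"
    line="blacklist nouveau"
    # Create the file if it is missing so grep does not fail on it.
    [ -f "$conf_file" ] || : > "$conf_file"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$conf_file"; then
        # If the file does not end with a newline, add one first so the
        # new entry starts on its own line.
        if [ -n "$(tail -c1 "$conf_file")" ]; then
            printf '\n' >> "$conf_file"
        fi
        printf '%s\n' "$line" >> "$conf_file"
    fi
}
```

On a real machine this would be called as root with /etc/modprobe.d/blacklist.conf as the argument, followed by sudo update-initramfs -u and a reboot as in steps 1.1.2 and 1.1.3.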

II. NVIDIA driver
1. Driver download page: https://www.nvidia.cn/Download/index.aspx?lang=cn
2. Actual download link:
3. Installation
Reference: https://www.nvidia.cn/Download/driverResults.aspx/169718/cn
Run the following as root (or prefix each command with sudo):
i) `dpkg -i nvidia-driver-local-repo-ubuntu1604-460.32.03_1.0-1_amd64.deb` for Ubuntu
ii) `apt-get update`
iii) `apt-get install cuda-drivers`
iv) `reboot`
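
After the reboot, the result can be checked from a script instead of by eye. This is a sketch under assumptions: `check_driver` and the `NVIDIA_SMI` override variable are hypothetical names introduced here. The `--query-gpu` and `--format` options are standard `nvidia-smi` options.

```shell
# Hypothetical helper: report GPU name and driver version after the reboot.
# NVIDIA_SMI can be overridden (e.g. for testing); it defaults to nvidia-smi.
check_driver() {
    smi="${NVIDIA_SMI:-nvidia-smi}"
    if ! command -v "$smi" >/dev/null 2>&1; then
        echo "nvidia-smi not found: driver is not installed or not on PATH" >&2
        return 1
    fi
    # Prints one CSV line per GPU in the form "<name>, <driver version>".
    "$smi" --query-gpu=name,driver_version --format=csv,noheader
}
```

Calling check_driver on the server should print one line per installed GPU. A non-zero exit status means the driver tools are not available.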

3.1 For reference, a screenshot of a successful installation of driver version 440 on Ubuntu 18.04 with a Tesla P100:

III. CUDA [installation not required]
1. Download page: https://developer.nvidia.com/cuda-toolkit-archive
2. Actual download and install commands:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-ubuntu1604.pin
sudo mv cuda-ubuntu1604.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.2.1/local_installers/cuda-repo-ubuntu1604-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604-11-2-local_11.2.1-460.32.03-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1604-11-2-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
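3. Installation: covered by the dpkg and apt-get commands in step 2.
IV. cuDNN [installation not required]
1. Download page: https://developer.nvidia.com/rdp/cudnn-archive
2. Actual download link:
3. Installation
Rename the downloaded file so that it ends in .tgz, then extract it with tar -zxvf file.tgz.
Extraction produces a cuda folder. Open a terminal in the current directory and run: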
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
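4. Test:
Check the cuDNN version (from cuDNN 8 onward the version macros are in cudnn_version.h rather than cudnn.h):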
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
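
The grep above prints the raw macro lines. A small wrapper can turn them into a single version string and cope with both header layouts. This is a sketch: `cudnn_version` is a hypothetical helper, and the default include directory is the one used in the copy commands above.

```shell
# Hypothetical helper: print the cuDNN version as MAJOR.MINOR.PATCH.
# cuDNN 8 and later keep the macros in cudnn_version.h; older releases
# keep them in cudnn.h, so the newer header is tried first.
cudnn_version() {
    inc_dir="${1:-/usr/local/cuda/include}"
    for header in "$inc_dir/cudnn_version.h" "$inc_dir/cudnn.h"; do
        [ -f "$header" ] || continue
        major=$(awk '$1=="#define" && $2=="CUDNN_MAJOR" {print $3; exit}' "$header")
        minor=$(awk '$1=="#define" && $2=="CUDNN_MINOR" {print $3; exit}' "$header")
        patch=$(awk '$1=="#define" && $2=="CUDNN_PATCHLEVEL" {print $3; exit}' "$header")
        if [ -n "$major" ]; then
            echo "$major.$minor.$patch"
            return 0
        fi
    done
    echo "no cuDNN version macros found under $inc_dir" >&2
    return 1
}
```

Run without arguments, it inspects /usr/local/cuda/include. A different directory can be passed as the first argument.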

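V. Docker installation
1. Switch to the Aliyun mirror
Put the following into the apt sources file (usually /etc/apt/sources.list). These entries are for Ubuntu 16.04 (xenial); on another release, substitute its codename: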
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-security main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-updates main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-proposed main restricted universe multiverse
deb http://archive.ubuntu.com/ubuntu/ xenial-backports main restricted universe multiverse
apt-get update
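
Because the list above is hard-coded for xenial, the Aliyun entries can instead be generated for any release codename. This is a sketch: `gen_aliyun_sources` is a hypothetical helper that reproduces only the mirrors.aliyun.com lines, with the same suites and components as above.

```shell
# Hypothetical helper: print Aliyun mirror entries for one Ubuntu codename
# (xenial = 16.04, bionic = 18.04), using the same suites as the list above.
gen_aliyun_sources() {
    codename="$1"
    mirror="http://mirrors.aliyun.com/ubuntu/"
    components="main restricted universe multiverse"
    for kind in deb deb-src; do
        # The empty suffix yields the base suite, e.g. "xenial".
        for suffix in "" -security -updates -proposed -backports; do
            echo "$kind $mirror $codename$suffix $components"
        done
    done
}
```

A typical use would be gen_aliyun_sources "$(lsb_release -cs)", reviewing the output, backing up the existing sources file, and only then replacing it and running apt-get update.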
2. Install Docker
Reference: https://blog.csdn.net/qq_27731689/article/details/92969266
# Installation on Ubuntu is straightforward: Docker provides an official convenience script.
sudo apt install curl
curl -fsSL get.docker.com -o get-docker.sh
sudo sh get-docker.sh --mirror Aliyun
3. Start Docker
Reference: https://blog.csdn.net/qq_27731689/article/details/92969266
sudo systemctl enable docker
sudo systemctl start docker
4. Troubleshooting (skip this step if you do not hit the problem):
Problem:
root@ubuntu:/pro_setup/software/nvidia# sudo curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu16.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo: sudounable to resolve host ubuntu:
unable to resolve host ubuntu
Solution:
vi /etc/hosts
Add the following line (ubuntu is this machine's hostname; use your own):
127.0.1.1 ubuntu
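
The same fix can be applied without opening an editor. This is a sketch under assumptions: `ensure_host_entry` is a hypothetical helper that takes the hosts file path and the hostname, and adds the 127.0.1.1 mapping only when the hostname is not already listed.

```shell
# Hypothetical helper: make sure a hosts file maps 127.0.1.1 to the given
# hostname, appending the entry only when the name is not listed yet.
ensure_host_entry() {
    hosts_file="$1"
    name="$2"
    # Look for the hostname as a whole field on any non-comment line;
    # field 1 is the IP address, so names start at field 2.
    if awk -v n="$name" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"; then
        return 0
    fi
    printf '127.0.1.1 %s\n' "$name" >> "$hosts_file"
}
```

On the server this would be run as root, for example ensure_host_entry /etc/hosts "$(hostname)".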
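5. nvidia-docker offline installation (confirmed working)
1) Offline package download: http://mirror.cs.uchicago.edu/nvidia-docker/nvidia-container-runtime/stable/ubuntu16.04/amd64/
2) Reference for the installation steps: https://blog.51cto.com/dldxzjr/2541070
3) Installation:
Prepare the following packages: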
libnvidia-container1_1.0.1-1_amd64.deb
libnvidia-container-tools_1.0.1-1_amd64.deb
nvidia-container-runtime_3.1.4-1_amd64.deb
nvidia-container-toolkit_1.0.5-1_amd64.deb
Install them from the directory that contains the packages:
sudo apt install ./lib* ./nvidia*
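
Since the globs ./lib* and ./nvidia* match whatever happens to be in the directory, it can help to confirm that exactly the expected files are present first. This is a sketch: `check_offline_debs` is a hypothetical helper, and the file names are the four listed above.

```shell
# Hypothetical helper: confirm the four .deb files listed above are present
# in a directory before running apt, and name any that are missing.
check_offline_debs() {
    pkg_dir="${1:-.}"
    missing=0
    for deb in \
        libnvidia-container1_1.0.1-1_amd64.deb \
        libnvidia-container-tools_1.0.1-1_amd64.deb \
        nvidia-container-runtime_3.1.4-1_amd64.deb \
        nvidia-container-toolkit_1.0.5-1_amd64.deb
    do
        if [ ! -f "$pkg_dir/$deb" ]; then
            echo "missing: $deb" >&2
            missing=1
        fi
    done
    # Exit status 0 when all packages are present, 1 otherwise.
    return "$missing"
}
```

It could be combined with the install step as check_offline_debs . && sudo apt install ./lib* ./nvidia*.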
Update daemon.json:
sudo tee /etc/docker/daemon.json <<EOF
{
"default-runtime":"nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
EOF
Restart Docker:
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo pkill -SIGHUP dockerd
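
Note that writing daemon.json with tee replaces the whole file, so any settings already in it are lost. A slightly more careful variant keeps a backup copy first. This is a sketch: `write_daemon_json` is a hypothetical helper, and the JSON content is the same as in the tee command above.

```shell
# Hypothetical helper: write the daemon.json shown above to a target path,
# keeping a .bak copy of any existing file so the change can be reverted.
write_daemon_json() {
    target="${1:-/etc/docker/daemon.json}"
    if [ -f "$target" ]; then
        cp "$target" "$target.bak"
    fi
    cat > "$target" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

On the server it would be run as root with no argument, followed by the restart commands above. If Docker then fails to start, restoring daemon.json.bak undoes the change.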
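Test:
Either command below should print the nvidia-smi table from inside a container. The first selects the nvidia runtime explicitly; the second uses the --gpus option available from Docker 19.03. Available image versions (tags) can be looked up at https://hub.docker.com/; if the untagged nvidia/cuda image cannot be pulled, append an explicit tag from there.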
docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi
docker run --gpus all --rm nvidia/cuda nvidia-smi
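
Before pulling an image, it is also possible to confirm that the daemon picked up the new default runtime. This is a sketch: `default_runtime_is_nvidia` and the `DOCKER` override variable are hypothetical names. It relies on `docker info --format` with the `DefaultRuntime` field.

```shell
# Hypothetical helper: succeed only if the Docker daemon reports "nvidia"
# as its default runtime. DOCKER can be overridden (e.g. for testing).
default_runtime_is_nvidia() {
    docker_bin="${DOCKER:-docker}"
    # Ask the daemon for its default runtime; fail if docker cannot be run.
    runtime=$("$docker_bin" info --format '{{.DefaultRuntime}}' 2>/dev/null) || return 1
    [ "$runtime" = "nvidia" ]
}
```

If this check fails after the restart, the daemon.json edit most likely did not take effect, and the two docker run tests above would not see the GPU through the default runtime.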