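Deploying an LLM (Qwen3-32B) with Kubernetes (K8s) on Huawei Ascend 910B: step-by-step commands, methods, and download links

This is a step-by-step guide to deploying a large language model (LLM) with Kubernetes (K8s) on Huawei Ascend 910B. It includes complete commands, configuration files, official download links, and key notes.

✅ Use cases: production environments, multi-node clusters, highly available inference services
⚠️ Prerequisites: a running Kubernetes cluster whose target nodes have Ascend 910B hardware installed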

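I. Workflow Overview

  1. Install the Ascend driver and firmware
  2. Install the CANN toolkit (Runtime + Toolkit)
  3. Configure Docker to support NPUs
  4. Deploy the Ascend Device Plugin to K8s
  5. Build or pull an Ascend-capable LLM image (e.g. vLLM-Ascend)
  6. Write the K8s Deployment/Service YAML and deploy the LLM
  7. Verify and test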

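II. Detailed Steps (with Commands)

Step 1: Install the Ascend Driver and Firmware

📌 Operating system requirement: openEuler 22.03 / Ubuntu 22.04 (ARM64)

1.1 Download the driver and firmware (official)

This guide uses CANN 7.0+ as the example (CANN 7.0 or 8.0 is recommended for the 910B). The file names below are examples; choose driver, firmware, and CANN versions that are listed as compatible with each other and with the vLLM-Ascend release you plan to run.

bash

# Example: download the driver (ARM64)
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend-hdk-910b-npu-driver_24.1.rc3_linux-aarch64.run

# Example: download the firmware
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Ascend-hdk-910b-npu-firmware_7.5.0.1.129.run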

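1.2 Install the driver and firmware

bash

# Make the installers executable
chmod +x Ascend-hdk-910b-npu-driver_*.run
chmod +x Ascend-hdk-910b-npu-firmware_*.run

# Install the firmware (first)
# Note: firmware-then-driver is the order for upgrading an existing installation.
# On a host with no driver installed yet, Huawei's guide installs the driver first.
sudo ./Ascend-hdk-910b-npu-firmware_*.run --full

# Install the driver
sudo ./Ascend-hdk-910b-npu-driver_*.run --full

# Reboot for the changes to take effect
sudo reboot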

1.3 Verify the installation

bash

npu-smi info
# Should list every 910B card on the node (8 on a fully populated 8-card server)
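
The check above can also be scripted so that it runs unattended on each node. The following is a minimal sketch, not an official tool: it assumes the driver exposes one device node per card named `/dev/davinci0`, `/dev/davinci1`, and so on, and the helper names `npu_indices` and `check_npu_count` are hypothetical.

```python
import re
from pathlib import Path

# Per-card device nodes are assumed to be named davinci0, davinci1, ...
# Control nodes such as davinci_manager are deliberately not counted.
_NPU_NODE = re.compile(r"^davinci(\d+)$")


def npu_indices(dev_entries):
    """Return the sorted NPU indices found among /dev entry names."""
    found = []
    for name in dev_entries:
        match = _NPU_NODE.match(name)
        if match:
            found.append(int(match.group(1)))
    return sorted(found)


def check_npu_count(expected, dev_dir="/dev"):
    """Compare the number of NPU device nodes under dev_dir with `expected`."""
    names = [entry.name for entry in Path(dev_dir).iterdir()]
    indices = npu_indices(names)
    return len(indices) == expected, indices
```

Calling `check_npu_count(8)` on an 8-card node should return `(True, [0, 1, ..., 7])`; a shorter list points to a card the driver did not bring up.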

Step 2: Install the CANN Toolkit

CANN = Compute Architecture for Neural Networks (comparable to CUDA)

2.1 Download CANN (official)

bash

wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/7.0.1.1/Ascend-cann-toolkit_7.0.1.1_linux-aarch64.run
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/7.0.1.1/Ascend-cann-kernels-910b_7.0.1.1_linux.run

2.2 Install CANN

bash

chmod +x Ascend-cann-*.run

# Install the toolkit (for development)
sudo ./Ascend-cann-toolkit_7.0.1.1_linux-aarch64.run --install

# Install the kernels (runtime operators)
sudo ./Ascend-cann-kernels-910b_7.0.1.1_linux.run --install

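2.3 Configure environment variables (/etc/profile)

bash

# Append through sudo tee: a plain `>> /etc/profile` fails unless the shell itself runs as root
echo 'export ASCEND_HOME=/usr/local/Ascend/ascend-toolkit/latest' | sudo tee -a /etc/profile
echo 'export PATH=$ASCEND_HOME/bin:$ASCEND_HOME/compiler/ccec_compiler/bin:$PATH' | sudo tee -a /etc/profile
echo 'export LD_LIBRARY_PATH=$ASCEND_HOME/lib64:$LD_LIBRARY_PATH' | sudo tee -a /etc/profile
source /etc/profile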

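Step 3: Configure Docker to Support NPUs

This step requires ascend-docker-runtime.

3.1 Install ascend-docker-runtime

bash

# Clone the official repository
git clone https://gitee.com/ascend/docker-runtime.git
cd docker-runtime

# Build and install (if the checkout has no top-level Makefile, follow the build steps in its README)
make
sudo make install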

3.2 Edit /etc/docker/daemon.json

json

{
  "runtimes": {
    "ascend": {
      "path": "/usr/bin/ascend-docker-runtime"
    }
  },
  "default-runtime": "ascend"
}
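
A malformed daemon.json stops Docker from starting, so it can be worth checking the file before the restart. The sketch below is one possible pre-flight check, assuming only the two keys shown above matter; the function name `check_docker_daemon_config` is hypothetical.

```python
import json


def check_docker_daemon_config(text, runtime="ascend"):
    """Return a list of problems found in a daemon.json document (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    runtimes = config.get("runtimes", {})
    if runtime not in runtimes:
        problems.append(f"runtime '{runtime}' is not registered")
    elif not runtimes[runtime].get("path"):
        problems.append(f"runtime '{runtime}' has no 'path'")
    if config.get("default-runtime") != runtime:
        problems.append(f"default-runtime is not '{runtime}'")
    return problems
```

Passing the contents of `/etc/docker/daemon.json` to the function and getting an empty list back means the runtime is registered and set as the default.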

3.3 Restart Docker

bash

sudo systemctl daemon-reload
sudo systemctl restart docker

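3.4 Test NPU access inside a container

bash

# The stock ubuntu image does not ship npu-smi; this assumes the Ascend runtime exposes it from the host.
# If the command is not found, bind-mount /usr/local/bin/npu-smi and /usr/local/Ascend/driver from the host.
docker run -it --rm --device=/dev/davinci0 --privileged \
  ubuntu:22.04 npu-smi info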

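Step 4: Deploy the Ascend Device Plugin to K8s

The device plugin lets K8s discover and schedule NPU resources. This guide uses the resource name ascend.huawei.com/npu. Confirm the name your plugin version actually registers (step 4.4), because other releases may register a different one, such as huawei.com/Ascend910, and the Deployment must request exactly that name.

4.1 Get the Device Plugin YAML

GitHub/Gitee repository:
👉 https://gitee.com/oath4ai/ascend-device-plugin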

4.2 Apply the DaemonSet

yaml

# ascend-device-plugin.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ascend-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: ascend-device-plugin
  template:
    metadata:
      labels:
        name: ascend-device-plugin
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - image: swr.cn-south-1.myhuaweicloud.com/ascend/ascend-device-plugin:1.0.0
        name: ascend-device-plugin
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
        - name: dev
          mountPath: /dev
        - name: sys
          mountPath: /sys
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
      - name: sys
        hostPath:
          path: /sys

Apply it:

bash

kubectl apply -f ascend-device-plugin.yaml

4.3 Verify that the plugin is running

bash

kubectl get pods -n kube-system | grep ascend-device-plugin
# One Pod should be running on each NPU node
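
On a larger cluster, scanning the grep output by eye gets tedious. As a rough sketch, the same check can be done on the JSON form of the Pod list; the function name `pods_not_running` is hypothetical, and the input is assumed to be the object printed by `kubectl get pods -n kube-system -l name=ascend-device-plugin -o json`.

```python
def pods_not_running(pod_list):
    """Return (pod name, node name, phase) for every Pod whose phase is not Running."""
    problems = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase != "Running":
            name = pod.get("metadata", {}).get("name", "<unnamed>")
            node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
            problems.append((name, node, phase))
    return problems
```

Feed it `json.load(sys.stdin)` with the kubectl output piped in; an empty result means every plugin Pod reports the Running phase.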

4.4 Check the node's resources

bash

kubectl describe node <npu-node-name> | grep "ascend.huawei.com/npu"
# Expected output on an 8-card node: ascend.huawei.com/npu: 8
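
The same information is available in structured form under `status.allocatable` of the Node object, which is easier to assert on in automation. A minimal sketch, assuming the resource name used in this guide and a node dictionary loaded from `kubectl get node <npu-node-name> -o json`; the function name `allocatable_npus` is hypothetical.

```python
def allocatable_npus(node, resource_name="ascend.huawei.com/npu"):
    """Return how many NPUs the node advertises as allocatable (0 if none)."""
    allocatable = node.get("status", {}).get("allocatable", {})
    # Kubernetes reports extended resource quantities as strings, e.g. "8".
    return int(allocatable.get(resource_name, "0"))
```

A result of 0 on a node that has cards usually means the device plugin has not registered yet, or registers under a different resource name.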

Step 5: Prepare the LLM Image (vLLM-Ascend as the Example)

5.1 Pull the official image (recommended)

bash

docker pull swr.cn-south-1.myhuaweicloud.com/ascend/vllm-ascend:0.9.0

Or build it from GitHub:

bash

git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend:latest -f Dockerfile .

💡 The image must include torch_npu and the vllm-ascend plugin.

Step 6: Write the K8s Deployment YAML

yaml

# llm-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen3-32b-ascend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qwen3-32b
  template:
    metadata:
      labels:
        app: qwen3-32b
    spec:
      containers:
      - name: vllm
        image: swr.cn-south-1.myhuaweicloud.com/ascend/vllm-ascend:0.9.0
        ports:
        - containerPort: 8000
        resources:
          limits:
            # Request 1 NPU card; keep this equal to --tensor-parallel-size.
            # BF16 weights of a 32B model take about 64 GB, so one card only fits a quantized checkpoint.
            ascend.huawei.com/npu: 1
        command: ["python", "-m", "vllm.entrypoints.openai.api_server"]
        args:
          - "--model=/models/Qwen3-32B"
          - "--served-model-name=Qwen3-32B"  # name that clients pass in the "model" field
          - "--tensor-parallel-size=1"
          - "--host=0.0.0.0"
          - "--port=8000"
        volumeMounts:
        - name: model-storage
          mountPath: /models
      volumes:
      - name: model-storage
        hostPath:
          path: /mnt/models  # the model must be downloaded to the host beforehand
---
apiVersion: v1
kind: Service
metadata:
  name: qwen3-32b-service
spec:
  type: LoadBalancer
  ports:
  - port: 8000
    targetPort: 8000
  selector:
    app: qwen3-32b

Deploy:

bash

kubectl apply -f llm-deployment.yaml
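
The manifest above requests one NPU and sets `--tensor-parallel-size=1`. Whether one card is enough depends on the weight precision, which a back-of-the-envelope estimate makes concrete. The sketch below is only an estimate under stated assumptions: about 32 billion parameters (read off the model name), 64 GiB of HBM per card (check `npu-smi info` for your variant), 90% of HBM usable by the server (mirroring vLLM's default memory utilization), and a hypothetical 30% of that reserved for KV cache and activations.

```python
import math


def min_cards_for_weights(num_params, bytes_per_param, hbm_bytes_per_card,
                          usable_fraction=0.9, reserve_fraction=0.3):
    """Smallest power-of-two card count whose weight budget holds the model."""
    weight_bytes = num_params * bytes_per_param
    # Memory the server may use per card, minus headroom for KV cache and activations.
    budget_per_card = hbm_bytes_per_card * usable_fraction * (1.0 - reserve_fraction)
    cards = max(1, math.ceil(weight_bytes / budget_per_card))
    # Round up to a power of two so the count divides typical attention head counts.
    return 1 << (cards - 1).bit_length()


PARAMS = 32_000_000_000       # assumption: ~32B parameters, from the model name
HBM_PER_CARD = 64 * 1024**3   # assumption: 64 GiB HBM per card

bf16_cards = min_cards_for_weights(PARAMS, 2, HBM_PER_CARD)  # 2 bytes per parameter
int8_cards = min_cards_for_weights(PARAMS, 1, HBM_PER_CARD)  # 1 byte per parameter
print(bf16_cards, int8_cards)  # prints: 2 1
```

Under these assumptions, BF16 weights call for at least two cards, so the NPU limit and `--tensor-parallel-size` would both be raised to 2, while an 8-bit quantized checkpoint fits the single-card configuration. Longer contexts or higher concurrency need more KV cache and push the count up further.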

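Step 7: Verify the Service

bash

# Check Pod status
kubectl get pods

# Follow the logs
kubectl logs -f <pod-name>

# Test the API (assuming the Service IP is 10.96.123.45; look it up with `kubectl get svc qwen3-32b-service`)
# "model" must match the name the server registered (by default, the value passed to --model)
curl http://10.96.123.45:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-32B", "prompt": "Hello, how are you?", "max_tokens": 50}'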
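
For application code, the same request can be issued from Python with only the standard library. This is a minimal client sketch that mirrors the curl call above; the function names are hypothetical, and the IP address and model name are the placeholders from this guide.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=50):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url, model, prompt, max_tokens=50, timeout=60):
    """Send the request and return the text of the first completion choice."""
    request = build_completion_request(base_url, model, prompt, max_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

For example, `complete("http://10.96.123.45:8000", "Qwen3-32B", "Hello, how are you?")` returns the generated text once the Pod is ready. An HTTP 404 typically means the `model` value does not match the name the server registered.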

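III. Key Notes

  • Model format: vLLM-Ascend loads standard Hugging Face format weights, so no Huawei-specific conversion is required. "A3B" in Qwen3 model names marks a mixture-of-experts variant with about 3B activated parameters (e.g. Qwen3-30B-A3B); it is not a Huawei quantization format.
  • Networking: for multi-card runs, make sure card-to-card communication works (HCCS links between cards inside a node, the NPUs' RoCE network between nodes)
  • Permissions: the container needs privileged: true or correctly configured udev rules
  • CANN version matching: the driver, firmware, and CANN versions must be compatible (see Huawei's compatibility matrix)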

IV. Official Resources

Item               Link
Ascend community   https://www.hiascend.com/
CANN download      https://www.hiascend.com/software/cann
vLLM-Ascend        https://github.com/vllm-project/vllm-ascend
Device Plugin      https://gitee.com/oath4ai/ascend-device-plugin


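Following the steps above should give you a high-performance LLM inference service running on K8s + Ascend 910B. Test on a single machine with Docker first, then move to K8s.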
