Getting Started Quickly with Qwen3-VL-30B-A3B on Ascend
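On October 4, 2025, the Qwen3 series released and open-sourced its new generation of multimodal models: Qwen3-VL-30B-A3B-Thinking and Qwen3-VL-30B-A3B-Instruct. Qwen3-VL is a family of multimodal vision-language models. Building on its predecessors, it markedly improves visual understanding while retaining strong text-only capability.
Ascend has consistently kept pace with the Qwen series. As soon as Qwen3-VL-30B-A3B-Thinking and Qwen3-VL-30B-A3B-Instruct were open-sourced, both worked out of the box in LLaMA Factory and vLLM, giving the models day-0 (0Day) support on Ascend.
The model weights are also available on the Modelers community (魔乐社区); developers are welcome to download and try them.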
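Introduction to the Qwen3-VL-30B-A3B-Thinking/Instruct models
Qwen3-VL is the most powerful vision-language model in the Qwen series to date. The newly open-sourced Qwen3-VL-30B-A3B-Thinking and Qwen3-VL-30B-A3B-Instruct models are smaller yet still perform strongly, and they carry the full Qwen3-VL capability set.
With only 3 billion parameters activated, the models match or even surpass GPT-5-Mini and Claude4-Sonnet across STEM, visual question answering (VQA), optical character recognition (OCR), video understanding, and agent tasks.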
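Getting Started Quickly with Qwen3-VL-30B-A3B on Ascend
This tutorial walks you through training and inference deployment for the Qwen3-VL-30B-A3B-Instruct model, with detailed steps and best practices so that you can get started quickly.
Training with LLaMA Factory
Environment setup
Development environment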
| Software | Version |
| --- | --- |
| Python | 3.10.18 |
| PyTorch | 2.5.1 |
| Transformers | main |
| LLaMA Factory | main |
| CANN | 8.2.RC |
Install the Ascend CANN Toolkit and kernels
Follow the installation guide:
https://www.hiascend.com/document/detail/zh/canncommercial/82RC1/softwareinst/instg/instg_0000.html?Mode=PmIns&InstallType=local&OS=Ubuntu&Software=cannToolKit
Install LLaMA Factory
Run the following commands to install LLaMA Factory and its dependencies. This step installs torch and torch_npu automatically.
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]" --no-build-isolation
cd ..
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
pip install qwen_vl_utils
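Model fine-tuning
Dataset preparation
Fine-tuning supports the two mainstream formats, alpaca and sharegpt, stored as JSON files. This guide uses the llava-en-zh-2k dataset as a demonstration.
Once the training data is in the right format, write a data/dataset_info.json file that describes it. For example, the following configuration registers a dataset named llava_2k_en located at llava-en-zh-2k/en/train-00000-of-00001.parquet, declares its format as sharegpt, and maps the dataset's field names to the standard names. The llava_2k_zh dataset is registered the same way.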
{
  "llava_2k_en": {
    "file_name": "llava-en-zh-2k/en/train-00000-of-00001.parquet",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  },
  "llava_2k_zh": {
    "file_name": "llava-en-zh-2k/zh/train-00000-of-00001.parquet",
    "formatting": "sharegpt",
    "columns": {
      "messages": "messages",
      "images": "images"
    },
    "tags": {
      "role_tag": "role",
      "content_tag": "content",
      "user_tag": "user",
      "assistant_tag": "assistant"
    }
  }
}
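Because the two entries differ only in the language directory, the file can also be generated rather than written by hand. The following is a minimal sketch, not part of LLaMA Factory: the helper names `sharegpt_entry` and `build_dataset_info` are hypothetical, and the paths simply mirror the ones used in this guide.

```python
import json


def sharegpt_entry(file_name: str) -> dict:
    """Return one dataset_info.json entry in sharegpt format (hypothetical helper)."""
    return {
        "file_name": file_name,
        "formatting": "sharegpt",
        # Map the dataset's column names to the standard names.
        "columns": {"messages": "messages", "images": "images"},
        "tags": {
            "role_tag": "role",
            "content_tag": "content",
            "user_tag": "user",
            "assistant_tag": "assistant",
        },
    }


def build_dataset_info(languages=("en", "zh")) -> dict:
    """Build the llava_2k_<lang> entries used in this guide."""
    return {
        f"llava_2k_{lang}": sharegpt_entry(
            f"llava-en-zh-2k/{lang}/train-00000-of-00001.parquet"
        )
        for lang in languages
    }


dataset_info = build_dataset_info()
# Print the JSON; redirect it to data/dataset_info.json if it looks right.
print(json.dumps(dataset_info, indent=2))
```

Generating the file keeps the column and tag mappings identical across both languages, which avoids copy-and-paste drift when more splits are added.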
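Training configuration
LLaMA Factory offers a low-code, configuration-driven way to launch training: you only need to write a train_sample.yaml file that defines the training parameters. The following train_sample.yaml uses LoRA fine-tuning of the Qwen3-VL-30B-A3B-Instruct model as the example: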
### model
model_name_or_path: Qwen/Qwen3-VL-30B-A3B-Instruct
image_max_pixels: 262144
video_max_pixels: 16384
trust_remote_code: true
### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
flash_attn: disabled
disable_gradient_checkpointing: false
### dataset
dataset: llava_2k_en, llava_2k_zh
template: qwen3_vl_nothink
cutoff_len: 1024
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4
enable_thinking: false
### output
output_dir: saves/qwen3_vl-30b/lora/sft
logging_steps: 1
save_steps: 50000
plot_loss: true
overwrite_output_dir: true
save_only_model: false
report_to: none
### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
max_steps: 500
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null
seed: 1234
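Key parameters:
The configuration sets the base model path, selects LoRA as the fine-tuning method, uses the qwen3_vl_nothink template, references the datasets defined in the dataset preparation section, sets the per-device batch size to 8, and saves the model to saves/qwen3_vl-30b/lora/sft.
For the complete parameter reference, see the parameter descriptions in the official framework documentation.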
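To see what these values imply together, the sketch below works out the global batch size, the number of samples processed, and the number of warmup steps. It assumes the 16 processes configured in fsdp_config.yaml for this guide, and it assumes warmup steps are the rounded-up product of max_steps and warmup_ratio; `training_budget` is a hypothetical helper, not a LLaMA Factory function.

```python
import math


def training_budget(per_device_batch: int, grad_accum: int,
                    num_processes: int, max_steps: int,
                    warmup_ratio: float) -> dict:
    """Derive batch and step totals from the training configuration."""
    # Each optimizer step consumes one batch per device per accumulation step.
    global_batch = per_device_batch * grad_accum * num_processes
    return {
        "global_batch_size": global_batch,
        "samples_seen": global_batch * max_steps,
        # Assumption: warmup steps are rounded up from max_steps * warmup_ratio.
        "warmup_steps": math.ceil(max_steps * warmup_ratio),
    }


# Values from train_sample.yaml, with num_processes taken from fsdp_config.yaml.
budget = training_budget(per_device_batch=8, grad_accum=1,
                         num_processes=16, max_steps=500, warmup_ratio=0.1)
print(budget)
```

With fewer NPUs, raising gradient_accumulation_steps in proportion keeps the global batch size unchanged.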
Launching training
(1) Adjust the num_processes parameter in examples/accelerate/fsdp_config.yaml to match your NPU devices. The complete fsdp_config.yaml is as follows:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_forward_prefetch: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
machine_rank: 0
main_training_function: main
mixed_precision: bf16 # or fp16
num_machines: 1 # the number of nodes
num_processes: 16 # the number of NPUs in all nodes
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
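If you switch between machines with different NPU counts, the num_processes edit can be scripted. The following sketch uses only the standard library and rewrites the value with a regular expression, so the trailing comment is preserved; `set_num_processes` is a hypothetical helper, not part of Accelerate or LLaMA Factory.

```python
import re


def set_num_processes(yaml_text: str, num_npus: int) -> str:
    """Return yaml_text with the top-level num_processes value replaced."""
    if num_npus < 1:
        raise ValueError("num_npus must be at least 1")
    # Match the key at the start of a line and keep anything after the number.
    pattern = re.compile(r"^(num_processes:\s*)\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(
        lambda match: f"{match.group(1)}{num_npus}", yaml_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one num_processes key, found {count}")
    return new_text


sample = (
    "num_machines: 1 # the number of nodes\n"
    "num_processes: 16 # the number of NPUs in all nodes\n"
    "rdzv_backend: static\n"
)
print(set_num_processes(sample, 8))
```

To apply it to the real file, read examples/accelerate/fsdp_config.yaml, pass its text through the function, and write the result back.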
(2) Pass the path of the YAML file from the training configuration section and run the following command to train the Qwen3-VL-30B-A3B-Instruct model with LLaMA Factory and FSDP.
accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train.py train_sample.yaml
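Training prints logs that include the loss and other metrics. When training finishes, the fine-tuned LoRA weights are available in the output_dir set in the configuration.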
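Model merging
Write an export_sample.yaml configuration file that defines the parameters for merging the model weights. For the complete parameter reference, see the model export section of the official framework documentation.
### model
model_name_or_path: Qwen/Qwen3-VL-30B-A3B-Instruct
adapter_name_or_path: saves/qwen3_vl-30b/lora/sft
template: qwen3_vl
finetuning_type: lora
### export
export_dir: saves/qwen3_vl-30b/export
export_device: cpu
export_legacy_format: false
Run the following command to start merging the model weights.
llamafactory-cli export export_sample.yaml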
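Inference with the fine-tuned model
Replace the model path and image path in the following script with your actual paths to run inference with the fine-tuned model.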
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the merged model and its processor from the export directory.
# Note (assumption): if this class rejects the MoE checkpoint, try
# Qwen3VLMoeForConditionalGeneration from transformers instead.
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "saves/qwen3_vl-30b/export", torch_dtype="auto", device_map="npu:0"
)
processor = AutoProcessor.from_pretrained("saves/qwen3_vl-30b/export")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "test_picture.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
inputs = inputs.to(model.device)

# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
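The trimming step in the script is easy to miss: for decoder-only models, generate returns the prompt tokens followed by the new tokens, so the prompt prefix must be sliced off before decoding. The toy example below reproduces that slicing with plain Python lists; the token IDs are made up for illustration and `trim_prompt_tokens` is a hypothetical helper.

```python
def trim_prompt_tokens(input_ids, generated_ids):
    """Remove each prompt prefix from the corresponding generated sequence."""
    return [
        out_ids[len(in_ids):]
        for in_ids, out_ids in zip(input_ids, generated_ids)
    ]


# Made-up token IDs: a 4-token prompt followed by 3 newly generated tokens.
prompt_ids = [[101, 7, 8, 9]]
full_ids = [[101, 7, 8, 9, 42, 43, 2]]

print(trim_prompt_tokens(prompt_ids, full_ids))  # [[42, 43, 2]]
```

Without this step, the decoded output would begin with the chat template and the user prompt.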
The inference result before fine-tuning is as follows:
['This is a dynamic, action-oriented photograph capturing two football players in the midst of a game.\n\n- **The Players**: The image features two male athletes wearing matching red and white uniforms with a prominent logo on their chests.\n - The player in the foreground is captured mid-celebration or reaction, with his mouth open as if shouting or cheering, and his right arm raised high.\n - The second player stands slightly behind him, looking towards the first player with an expression that could be interpreted as surprise, excitement, or encouragement.\n\n- **Setting and Atmosphere**: The scene takes place on a football field, indicated by the green']
The inference result after fine-tuning is as follows:
['The image features two football players wearing red and white uniforms, standing on a field with excitement and enthusiasm. One player is pointing towards the crowd, while his teammate stands next to him, possibly cheering or celebrating with the first player.\n\nThere is a large audience present in the scene, as people can be seen in various positions around the players, such as sitting or standing. The crowd adds to the energetic atmosphere of the football match.']
Inference with vLLM
Here we use the vLLM Ascend image to run the Qwen3-VL-30B-A3B-Instruct model on Ascend.
Offline inference on NPU
Start the container
# Note: this command maps only /dev/davinci0. The offline script in this section
# sets tensor_parallel_size=2, so add one --device /dev/davinciN entry per NPU you plan to use.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0rc0
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it $IMAGE bash
Set environment variables
# Load the model from ModelScope to speed up the download
export VLLM_USE_MODELSCOPE=True
# Set `max_split_size_mb` to reduce memory fragmentation
export PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256
Run the script
pip install qwen_vl_utils --extra-index-url https://download.pytorch.org/whl/cpu/
import gc

import torch
from transformers import AutoProcessor
from vllm import LLM, SamplingParams
from vllm.distributed.parallel_state import (
    destroy_distributed_environment,
    destroy_model_parallel,
)
from qwen_vl_utils import process_vision_info

MODEL_PATH = "Qwen/Qwen3-VL-30B-A3B-Instruct"


def clean_up():
    destroy_model_parallel()
    destroy_distributed_environment()
    gc.collect()
    torch.npu.empty_cache()


llm = LLM(
    model=MODEL_PATH,
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
    enable_expert_parallel=True,
    max_model_len=16384,
    limit_mm_per_prompt={"image": 10},
)

sampling_params = SamplingParams(max_tokens=512)

image_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png",
                "min_pixels": 224 * 224,
                "max_pixels": 1280 * 28 * 28,
            },
            {"type": "text", "text": "Please provide a detailed description of this image"},
        ],
    },
]

messages = image_messages

processor = AutoProcessor.from_pretrained(MODEL_PATH)
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

image_inputs, _, _ = process_vision_info(messages, return_video_kwargs=True)

mm_data = {}
if image_inputs is not None:
    mm_data["image"] = image_inputs

llm_inputs = {
    "prompt": prompt,
    "multi_modal_data": mm_data,
}

outputs = llm.generate([llm_inputs], sampling_params=sampling_params)
generated_text = outputs[0].outputs[0].text

print(generated_text)

del llm
clean_up()
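The min_pixels and max_pixels fields in the script bound the pixel count of the image passed to the model: 224 * 224 = 50,176 and 1280 * 28 * 28 = 1,003,520. The sketch below illustrates the idea of scaling an image into such a budget while keeping its aspect ratio. It is a simplified illustration under that assumption, not the resizing logic that qwen_vl_utils actually implements, and `fit_to_pixel_budget` is a hypothetical helper.

```python
import math

MIN_PIXELS = 224 * 224        # 50,176, as in the script above
MAX_PIXELS = 1280 * 28 * 28   # 1,003,520, as in the script above


def fit_to_pixel_budget(width: int, height: int,
                        min_pixels: int = MIN_PIXELS,
                        max_pixels: int = MAX_PIXELS) -> tuple:
    """Scale (width, height) so that width * height lies within the budget."""
    pixels = width * height
    if pixels > max_pixels:
        # Shrink both sides by the same factor; round down to stay under the cap.
        scale = math.sqrt(max_pixels / pixels)
        return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
    if pixels < min_pixels:
        # Enlarge both sides by the same factor; round up to reach the floor.
        scale = math.sqrt(min_pixels / pixels)
        return math.ceil(width * scale), math.ceil(height * scale)
    return width, height


for size in [(640, 480), (4000, 3000), (100, 100)]:
    print(size, "->", fit_to_pixel_budget(*size))
```

Lowering max_pixels reduces the image's memory and compute cost at the expense of fine detail, which matters for OCR-style prompts.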
If the script runs successfully, you will see output like the following:
This image displays a logo, likely for a company or organization, featuring a combination of a graphic icon and text.
- **The Icon**: On the left side, there is a geometric logo composed of several interlocking triangular shapes, forming a three-dimensional, hexagonal-like structure. The design has a light blue or cyan outline against a white background.
- **The Text**: To the right of the icon, the name is presented in two separate lines.
  - The top line reads "TONGYI" in a blue, sans-serif font.
  - The bottom line reads "Qwen" in a black, sans-serif font. The letter "Q" is capitalized.
- **Layout**: The overall design is simple and modern, with the blue and black elements contrasting with the white background.
Online serving on multiple NPUs
Run a Docker container to start the vLLM service on multiple NPUs:
# Update the vllm-ascend image
# Note: this command maps only /dev/davinci0 while --tensor-parallel-size is 2;
# add one --device /dev/davinciN entry per NPU you plan to use.
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0rc0
docker run --rm \
    --name vllm-ascend \
    --device /dev/davinci0 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -e VLLM_USE_MODELSCOPE=True \
    -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
    -it $IMAGE \
    vllm serve Qwen/Qwen3-VL-30B-A3B-Instruct \
    --served-model-name qwen3vl \
    --dtype bfloat16 \
    --max_model_len 16384 \
    --max-num-batched-tokens 16384 \
    --tensor-parallel-size 2 \
    --enable_expert_parallel
If the service starts successfully, you will see messages like the following:
INFO: Started server process [44610]
INFO: Waiting for application startup.
INFO: Application startup complete.
Once the server is up, you can query the model with an input prompt:
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "qwen3vl",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png"}},
            {"type": "text", "text": "What is the text in the illustrate?"}
        ]}
    ]
    }'
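The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl payload; the function names `build_chat_request` and `ask` are hypothetical, and the base URL and model name assume the service was started as described in this guide.

```python
import json
import urllib.request


def build_chat_request(image_url: str, question: str,
                       model: str = "qwen3vl",
                       base_url: str = "http://localhost:8000"):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            },
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(request, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_chat_request(
    "https://modelscope.oss-cn-beijing.aliyuncs.com/resource/qwen.png",
    "What is the text in the illustrate?",
)
# With the server running, the answer text can be read like this:
# print(ask(request)["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.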
If the service responds normally, the vLLM server prints logs like the following:
INFO: 127.0.0.1:45880 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 09-23 19:44:05 [loggers.py:123] Engine 000: Avg prompt throughput: 6.5 tokens/s, Avg generation throughput: 0.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
INFO 09-23 19:44:15 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
The client receives a response like the following:
{"id":"chatcmpl-7d35682041384faeb147660c93bd13f8","object":"chat.completion","created":1758627832,"model":"qwen3vl","choices":[{"index":0,"message":{"role":"assistant","content":"TONGYI Qwen","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":65,"total_tokens":72,"completion_tokens":7,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
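Most of the fields in the JSON response are null here; the answer and the token counts are usually what matter. The sketch below pulls those out of an abbreviated copy of the response shown in this guide; `summarize_completion` is a hypothetical helper.

```python
import json


def summarize_completion(response_json: str) -> dict:
    """Extract the answer text, finish reason, and token usage from a response."""
    response = json.loads(response_json)
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }


# Abbreviated copy of the response shown above (null fields omitted).
sample_response = (
    '{"model":"qwen3vl","choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"TONGYI Qwen"},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":65,"total_tokens":72,"completion_tokens":7}}'
)

summary = summarize_completion(sample_response)
print(summary)
```

In the sample, the 65 prompt tokens and 7 completion tokens add up to the reported total of 72.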