
Testing Qwen2.5-VL-72B-Instruct model compatibility on 8x 910B4-32G

Test conclusions:

1. Qwen2.5-VL-32B-Instruct
   Model size: 64 GB; memory usage: 188 GB.
   Supported on 8x 910B4-32G (256 GB total): yes, under both frameworks.
   - vLLM: quay.io/ascend/vllm-ascend:v0.7.3 (https://vllm-ascend.readthedocs.io/en/stable/tutorials/single_npu_multimodal.html)
   - MindIE: swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-800I-A2-py311-openeuler24.03-lts (https://modelers.cn/models/Models_Ecosystem/Qwen2.5-VL-32B-Instruct)
   Remarks: startup logs the warning "You are using a model of type qwen2_5_vl to instantiate a model of type . This is not supported for all configurations of models and can yield errors."; MindIE reports "Daemon start success!". The 32B model is clearly supported, but its image-recognition quality does not meet the requirements, so testing had to continue with the 72B model.

2. Qwen2.5-VL-72B-Instruct
   Model size: 137 GB; memory usage under vLLM: 208 GB.
   MindIE (swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-800I-A2-py311-openeuler24.03-lts): not supported; per-card allocation reports insufficient memory. 4 or 8 cards of 910B4-64G should work in theory. Error:
   RuntimeError: NPU out of memory. Tried to allocate 602.00 MiB (NPU 5; 29.50 GiB total capacity; 28.10 GiB already allocated; 28.10 GiB current active; 441.27 MiB free; 28.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
   vLLM (quay.io/ascend/vllm-ascend:v0.7.3): supported; with quay.io/ascend/vllm-ascend:v0.8.5rc1 the model gives garbled answers. It cannot produce JSON-formatted output:
   WARNING 05-28 01:37:36 __init__.py:48] xgrammar is only supported on x86 CPUs. Falling back to use outlines instead.
   WARNING 05-28 01:37:36 __init__.py:84] xgrammar module cannot be imported successfully. Falling back to use outlines instead.
   WARNING 05-28 01:37:36 __init__.py:91] outlines does not support json_object. Falling back to use xgrammar instead.
   INFO:     219.141.177.34:2272 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
   ERROR:    Exception in ASGI application

3. Qwen2.5-VL-72B-Instruct-AWQ
   Model size: 41 GB; memory usage: /.
   Not supported by either framework.
   - MindIE: raise AssertionError(f"weight {tensor_name} does not exist") -> Exception: weight model.layers.0.self_attn.q_proj.weight does not exist
   - vLLM: ERROR 05-28 05:49:02 engine.py:400] KeyError: 'visual.blocks.0.attn.qkv.weight'
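The memory-usage figures above (188 GB for the 32B model, 208 GB for the 72B model under vLLM) are the total HBM in use across all eight cards once the weights are loaded. A minimal sketch for watching this with npu-smi, the same binary that both containers below mount in; how often to refresh is an arbitrary choice.

# Watch per-card HBM usage on the host while the weights load (refresh every 5 s).
watch -n 5 npu-smi info

# Because /usr/local/bin/npu-smi is mounted, the same check also works
# inside a running container, e.g.:
docker exec vllm-ascend npu-smi info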

Test commands:

Inference framework image: quay.io/ascend/vllm-ascend:v0.7.3

Launch commands:

Start the container:
docker run  \
--name vllm-ascend \
--device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-itd quay.io/ascend/vllm-ascend:v0.7.3
Enter the container:
docker exec -it vllm-ascend bash
Start the vLLM service (for the 72B test, substitute Qwen/Qwen2.5-VL-72B-Instruct as the model; a readiness-check sketch follows the container-stop commands below):
vllm serve Qwen/Qwen2.5-VL-32B-Instruct --tokenizer_mode="auto" --dtype=bfloat16 --max_num_seqs=256 --tensor_parallel_size=8 --gpu-memory-utilization=0.98 --max-model-len=32768 &

Stop the vLLM service:
ps -ef|grep python3|grep -v grep|awk '{print $2}'|xargs kill -9
Stop and remove the container:
docker stop vllm-ascend
docker rm vllm-ascend
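Loading the 32B/72B weights across 8 cards takes a while, so after starting the service and before running the test command below it is worth waiting until the OpenAI-compatible endpoint actually responds. A minimal sketch that polls the /v1/models endpoint exposed by vllm serve; the retry count and sleep interval are arbitrary choices.

# Wait until vllm serve has finished loading and the API answers (run on the host).
for i in $(seq 1 60); do
    if curl -sf http://127.0.0.1:8000/v1/models > /dev/null; then
        echo "vLLM server is ready"
        break
    fi
    echo "waiting for vLLM server... attempt $i"
    sleep 20
done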
Test command:

curl http://110.165.26.90:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "messages": [
    {"role": "system", "content": "你是专业图片识别助手"},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://i-blog.csdnimg.cn/img_convert/4f479548fee019db69efeb29cacc1a94.jpeg"}},
        {"type": "text", "text": "请描述图片内容?"}
    ]}
    ]
    }'
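The "cannot produce JSON-formatted output" result in the conclusions was hit on requests that ask for structured output. A hedged sketch of that kind of request, assuming the OpenAI-compatible response_format field; on this aarch64 image it is the path that triggers the xgrammar/outlines fallback warnings and the 500 error quoted above.

# Same endpoint, but additionally asking for a JSON-formatted answer.
# On quay.io/ascend/vllm-ascend:v0.7.3 (aarch64) this kind of request produced the
# xgrammar/outlines fallback warnings and the 500 Internal Server Error noted above.
curl http://110.165.26.90:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "model": "Qwen/Qwen2.5-VL-72B-Instruct",
    "messages": [
    {"role": "system", "content": "你是专业图片识别助手,请用JSON回答"},
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "https://i-blog.csdnimg.cn/img_convert/4f479548fee019db69efeb29cacc1a94.jpeg"}},
        {"type": "text", "text": "请以JSON对象输出图片内容描述"}
    ]}
    ],
    "response_format": {"type": "json_object"}
    }'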
Inference framework image: swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-800I-A2-py311-openeuler24.03-lts

Launch commands:

Start the container:
docker run --name m1 --privileged=true -it -d --net=host --shm-size=200g \
  --device=/dev/davinci0 --device=/dev/davinci1 --device=/dev/davinci2 --device=/dev/davinci3 \
  --device=/dev/davinci4 --device=/dev/davinci5 --device=/dev/davinci6 --device=/dev/davinci7 \
  --device=/dev/davinci_manager --device=/dev/hisi_hdc --device=/dev/devmm_svm \
  --entrypoint=bash -w /models \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/sbin:/usr/local/sbin \
  -v /root/.cache/modelscope/hub/models:/models \
  -v /tmp:/tmp \
  -v /etc/hccn.conf:/etc/hccn.conf \
  -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
  swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie:1.0.0-800I-A2-py311-openeuler24.03-lts
Enter the container:
docker exec -it m1 bash
Start the MindIE service:
cd /usr/local/Ascend/mindie/latest/mindie-service
nohup ./bin/mindieservice_daemon > ./q.log 2>&1 &
tail -f ./q.log
Stop the service:
ps -ef|grep mindieservice_daemon|grep -v grep|awk '{print $2}'|xargs kill -9
Stop and remove the container:
docker stop m1
docker rm m1
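Before sending the test command below, it helps to confirm the daemon actually came up. A minimal sketch, assuming the service port 28858 used in the test command, the q.log path from the launch step, and that ss is available in the openEuler image.

# Check the daemon log; "Daemon start success!" indicates a successful start,
# and the NPU out-of-memory RuntimeError from the conclusions table also lands here.
docker exec m1 tail -n 50 /usr/local/Ascend/mindie/latest/mindie-service/q.log

# Confirm something is listening on the service port used by the test command below
# (assumes ss is present in the image).
docker exec m1 bash -c 'ss -lnt | grep 28858'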
Test command:

curl 110.165.26.90:28858/v1/chat/completions -d ' {
"model": "qwen2_vl",
"messages": [{
"role": "user",
"content": [
{"type": "image_url", "image_url": "https://i-blog.csdnimg.cn/img_convert/4f479548fee019db69efeb29cacc1a94.jpeg"},
{"type": "text", "text": "请描述图片内容"}
]
}],
"max_tokens": 512,
"do_sample": true,
"repetition_penalty": 1.00,
"temperature": 0.01,
"top_p": 0.001,
"top_k": 1
}'
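To judge the recognition quality it is convenient to print only the model's answer. A minimal sketch, assuming the endpoint returns an OpenAI-style choices[0].message.content body, that jq is installed on the client, and that request.json is a hypothetical file holding the same JSON body as the test command above.

# Keep only the assistant's answer text from the response.
# request.json is a hypothetical file containing the same JSON body as above;
# assumes the response follows the OpenAI chat-completions schema and jq is installed.
curl -s 110.165.26.90:28858/v1/chat/completions -d @request.json \
    | jq -r '.choices[0].message.content'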

System environment:

 

Reference links:

https://vllm-ascend.readthedocs.io/en/latest/quick_start.html
https://modelers.cn/models/Models_Ecosystem/Qwen2.5-VL-32B-Instruct

