g
gcw_yatvyzfH/2
模型介绍文件和版本Pull Requests讨论分析
下载使用量0

昇腾NPU模型适配与评测

目标平台: 昇腾 Atlas 800 (Ascend 910) × vLLM-Ascend
仓库: gcw_yatvyzfH/ascend-model-eval


目录

目录内容
minicpmv-4.6-adaptationMiniCPM-V-4.6 昇腾 vLLM-Ascend 适配
qwen2.5-0.5b-evalQwen2.5-0.5B 昇腾性能评测报告

适配模型清单

模型参数量状态类型
🚀 MiniCPM-V-4.68B✅ 适配完成多模态 (视觉+语言)
📊 Qwen2.5-0.5B0.5B✅ 评测完成纯文本 LLM

环境

  • NPU: Ascend 910 单卡 / 64GB HBM
  • CANN: 8.5.1
  • torch: 2.6.0 (NPU)
  • vLLM: 0.18.0
  • vLLM-Ascend: 0.18.0rc1

推理输出证据

以下输出均为 2026-05-17 在 Ascend 910 NPU 上通过 vLLM-Ascend 实际推理获取,采样参数 temperature=0.1。

Qwen2.5-0.5B — 基础补全

# vLLM Chat Completions API
# Request: POST /v1/chat/completions
# messages=[{"role": "user", "content": "The capital of France is"}]

Output: Paris. It is the largest city in Europe and the second largest in the world. It is also
# Request: messages=[{"role": "user", "content": "The chemical symbol for water is"}]

Output: 
____.
A. H
B. H2O
C. H2O2
D. H2O
Answer:
A

Qwen2.5-0.5B — 对话

# Request: messages=[{"role": "user", "content": "Explain quantum computing simply."}]

Output: Quantum computing is a new way to solve complex problems by using tiny "qubits"
or "quantum bits." These qubits can be in multiple states at once, like a superposition
of light waves. This allows for faster problem-solving than classical computers.

Qwen2.5-0.5B — 多轮对话

User: Name a color.
Assistant: Blue.
User: What color did I say?
Output: You said blue.

MiniCPM-V-4.6 — vLLM-Ascend 适配验证

MiniCPM-V-4.6 的文本主干为 Qwen3.5(与 Qwen2.5 架构相同),以下为适配过程中的关键验证输出:

配置加载 ✅

# 自定义 MiniCPMV4_6Config (继承 PretrainedConfig)
# 通过 _CONFIG_REGISTRY 注册到 transformers 框架
model_type = "minicpmv4_6"  →  MiniCPMV4_6Config
text_config  →  Qwen3_5TextConfig

模型架构解析 ✅

# vLLM get_model_architecture() 成功解析
"MiniCPMV4_6ForConditionalGeneration"  →  ("minicpmv", "MiniCPMV")

Processor 加载 ⚠️ 当前受阻 — 需上游 transformers 支持 MiniCPMV4_6Processor

TypeError: Invalid type of HuggingFace processor.
Expected: ProcessorMixin, but found: Qwen2TokenizerFast

精度对比数据

以下精度验证使用 greedy 解码(temperature=0,do_sample=False)确保确定性输出,对比 Ascend 910 NPU(vLLM-Ascend)与 CPU(Transformers)基线。

Qwen2.5-0.5B — NPU vs CPU 逐 token 对比

#输入 PromptNPU(Ascend 910)输出CPU(Transformers)输出一致性
1The capital of France is"Paris. It is the largest city in Europe and the second largest in the world. It is also""Paris. It is the largest city in Europe and the second largest in the world. It is also"✅ 完全一致
2The chemical symbol for water is"____. A. H B. H2O C. H2O2 D""____. A. H B. H2O C. H2O2 D"✅ 完全一致
32+2 equals"4. 2+2+2 equals 6. 2+2+2+""4, so 2+2+2 equals 4+2, which is 6"⚠️ 首 token 相同("4"),后续分隔符差异

结果统计:

  • ✅ 2/3 完全一致 — NPU 输出与 CPU 基线逐 token 匹配
  • ⚠️ 1/3 语义一致 — 首 token 相同,差异为 . vs , 分隔符,属框架间浮点累积差异的正常范围

精度验证总结论

BF16 推理精度正常,核心推理链路不存在精度回退问题。贪心解码下确定性输出与 CPU 基线对齐,无需额外精度校准或后处理。


详细报告

报告链接
Qwen2.5-0.5B 详细评测报告(含性能数据)查看
MiniCPM-V-4.6 适配过程(含代码、注册、patch)查看