
SciCore-Mol (OpenBMB/SciCore-Mol)

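Model Overview

SciCore-Mol is a large language model developed by OpenBMB on the Qwen3 architecture. It is designed for scientific computing and chemistry-related tasks, with strong capabilities in molecular understanding and scientific reasoning.

  • Architecture: Qwen3ForCausalLM
  • Model type: dense large language model (standard full attention)
  • Parameters: approx. 8.2 billion
  • Context length: 40,960 tokens
  • Attention mechanism: GQA (grouped-query attention)
  • Hidden size: 4096
  • Layers: 36
  • Attention heads: 32 (8 KV heads)
  • Data type: bfloat16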
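
As a rough cross-check of these figures, the sketch below estimates the bfloat16 weight footprint and the KV cache size per token. It assumes a head dimension of 128 (hidden size 4096 divided by 32 heads), which this document does not state, and it uses the approximate parameter count, so the results are estimates only. The 38.94 GiB budget is taken from the real-weight test log later in this report.

```python
# Back-of-the-envelope memory estimate from the architecture figures above.
# HEAD_DIM is an assumption (hidden_size / num_heads); the source does not state it.
NUM_LAYERS = 36
NUM_HEADS = 32
NUM_KV_HEADS = 8
HIDDEN_SIZE = 4096
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # assumed value: 128
BYTES_PER_ELEM = 2                   # bfloat16
APPROX_PARAMS = 8.2e9                # "approx. 8.2 billion"


def kv_bytes_per_token() -> int:
    """Bytes of KV cache per token: one key and one value vector per KV head, per layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM


def kv_capacity_tokens(budget_gib: float) -> int:
    """Number of tokens that fit in a KV cache budget given in GiB."""
    return int(budget_gib * 2**30 // kv_bytes_per_token())


def weight_gib() -> float:
    """Approximate size of the bfloat16 weights in GiB."""
    return APPROX_PARAMS * BYTES_PER_ELEM / 2**30


print("KV cache bytes per token:", kv_bytes_per_token())
print("Estimated KV capacity for 38.94 GiB:", kv_capacity_tokens(38.94))
print(f"Estimated weight size: {weight_gib():.2f} GiB")
```

Under these assumptions, the estimates land within about 1% of the logged values (283,520 KV cache tokens and 15.2820 GB of weights). The small gap in token count is plausibly due to block-granular allocation, though the log does not confirm this.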

Ascend NPU Adaptation Report

Analysis Summary

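| Item | Status |
| --- | --- |
| Architecture | Qwen3ForCausalLM |
| vLLM registry | Supported |
| Attention type | Full attention (GQA) |
| Quantization | None |
| MoE | No |
| MLA | No |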

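Operator Compatibility

All operators used by Qwen3 are natively supported on Ascend NPU:

  • PyTorch native operators: fully supported
  • No Triton kernels that require validation
  • No CUDA-specific operators
  • No custom kernels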

Validation Results

Dummy Weight Test

Status: Passed

INFO  [model.py:533] Resolved architecture: Qwen3ForCausalLM
INFO  [model_runner_v1.py:2589] Loading model weights took 15.2820 GB
INFO  [worker.py:357] Available KV cache memory: 38.94 GiB
INFO  [llm.py:391] Supported tasks: ['generate']
Output: '寓መ寓መ寓መ寓መ'
Token IDs: [101516, 148580, 101516, 148580, 101516, 148580, 101516, 148580]
SUCCESS: Dummy weight test passed

Real-Weight Test on Ascend NPU

Status: Passed

The model loaded successfully with real weights from ModelScope:

INFO  [model.py:533] Resolved architecture: Qwen3ForCausalLM
INFO  [model_runner_v1.py:2562] Starting to load model /tmp/scicore-mol...
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:46<00:00, 11.56s/it]
INFO  [default_loader.py:384] Loading weights took 46.29 seconds
INFO  [model_runner_v1.py:2589] Loading model weights took 15.2820 GB
INFO  [worker.py:357] Available KV cache memory: 38.94 GiB
INFO  [kv_cache_utils.py:1316] GPU KV cache size: 283,520 tokens
INFO  [core.py:281] init engine (profile, create kv cache, warmup model) took 5.32 seconds
INFO  [llm.py:391] Supported tasks: ['generate']

Prompt: "Hello, how are you?"
NPU Output: " I'm sorry for the late reply."
NPU Token IDs: [358, 2776, 14589, 369, 279, 3309, 9851, 13]

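NPU vs. CPU Accuracy Comparison

Test environment:

  • NPU: Ascend NPU (CANN 8.5.1)
  • CPU: ARM64 CPU (same machine, via VLLM_TARGET_DEVICE=cpu)
  • Data type: bfloat16
  • Temperature: 0 (greedy decoding)
  • Max tokens: 16

Note: No GPU was available in the test environment, so the CPU run serves as the accuracy baseline.

Multi-prompt comparison results: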

| # | Prompt | NPU output | CPU output | Match |
| --- | --- | --- | --- | --- |
| 1 | Hello, how are you? | " I'm sorry for the late reply. I'm currently in the middle of a" | " I'm sorry for the late reply. I'm currently in the middle of a" | 16/16 |
| 2 | What is the chemical formula for water? | " The chemical formula for water is H₂O, which consists of two hydrogen atoms" | " The chemical formula for water is H₂O, which consists of two hydrogen atoms" | 16/16 |
| 3 | Explain quantum mechanics in simple terms. | " Quantum mechanics is a fundamental theory in physics that describes the behavior of the smallest particles" | " Quantum mechanics is a fundamental theory in physics that describes the behavior of the smallest particles" | 16/16 |
| 4 | The capital of France is | " Paris. The capital of Italy is Rome. The capital of Spain is Madrid." | " Paris. The capital of Italy is Rome. The capital of Spain is Madrid." | 16/16 |
| 5 | 1 + 1 = | " 2\n\n# 1. Two Sum\n\nGiven an array of integers `" | " 2\n\n# 1. Two Sum\n\nGiven an array of integers `" | 16/16 |

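Token-by-token comparison (prompt 1):

| Position | NPU token | CPU token | Match |
| --- | --- | --- | --- |
| 1 | 358 | 358 | Yes |
| 2 | 2776 | 2776 | Yes |
| 3 | 14589 | 14589 | Yes |
| 4 | 369 | 369 | Yes |
| 5 | 279 | 279 | Yes |
| 6 | 3309 | 3309 | Yes |
| 7 | 9851 | 9851 | Yes |
| 8 | 13 | 13 | Yes |
| 9 | 358 | 358 | Yes |
| 10 | 2776 | 2776 | Yes |
| 11 | 5023 | 5023 | Yes |
| 12 | 304 | 304 | Yes |
| 13 | 279 | 279 | Yes |
| 14 | 6149 | 6149 | Yes |
| 15 | 315 | 315 | Yes |
| 16 | 264 | 264 | Yes |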

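Accuracy summary:

| Metric | Value |
| --- | --- |
| Prompts tested | 5 |
| Tokens compared | 80 (5 x 16) |
| Matching tokens | 80 |
| Mismatched tokens | 0 |
| Match rate | 100% |
| First divergence position | None |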
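
The metrics above can be reproduced from two token ID sequences with a position-by-position comparison. The following is a minimal sketch, not the script used for this report; the function name `compare_tokens` is illustrative, and the token IDs are those listed for prompt 1.

```python
from typing import Optional, Sequence, Tuple


def compare_tokens(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, Optional[int]]:
    """Return (matched, total, first_divergence); the divergence position is 1-based."""
    total = max(len(a), len(b))
    matched = 0
    first_divergence = None
    for pos in range(total):
        # A position present in only one sequence counts as a mismatch.
        same = pos < len(a) and pos < len(b) and a[pos] == b[pos]
        if same:
            matched += 1
        elif first_divergence is None:
            first_divergence = pos + 1
    return matched, total, first_divergence


# Token IDs for prompt 1, copied from the token-by-token comparison.
NPU_PROMPT1 = [358, 2776, 14589, 369, 279, 3309, 9851, 13,
               358, 2776, 5023, 304, 279, 6149, 315, 264]
CPU_PROMPT1 = list(NPU_PROMPT1)  # the report lists identical IDs at every position

matched, total, first_div = compare_tokens(NPU_PROMPT1, CPU_PROMPT1)
print(f"{matched}/{total} matched, first divergence: {first_div}")
# prints: 16/16 matched, first divergence: None
```

Summing `matched` and `total` over all five prompts gives the 80/80 figure in the summary.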

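Feature Status Matrix

| Feature | Status |
| --- | --- |
| Text Generation | Supported and verified |
| ACLGraph | Supported |
| Chunked Prefill | Supported |
| Prefix Caching | Supported |
| Tensor Parallelism | Supported (not tested) |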

Runbook

Server Startup

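# --served-model-name exposes the model as "scicore-mol", the name used in the requests below
vllm serve /path/to/scicore-mol \
  --served-model-name scicore-mol \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --max-num-seqs 16 \
  --trust-remote-code \
  --port 8000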
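
Loading the weights takes some time, so the server does not answer requests immediately after launch. The sketch below polls the `/v1/models` endpoint until it responds. The helper names, the 600-second timeout, and the 5-second interval are illustrative choices and are not part of vLLM.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def models_endpoint_ready(base_url: str = "http://127.0.0.1:8000") -> bool:
    """Return True once GET /v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused or timed out: the server is not up yet.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A typical call would be `wait_until_ready(models_endpoint_ready)`, which returns `False` if the server has not come up within the timeout.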

Verification

# Readiness check
curl -sf http://127.0.0.1:8000/v1/models

# Text inference
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"scicore-mol","messages":[{"role":"user","content":"say hi"}],"temperature":0,"max_tokens":16}'
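
The same two checks can be scripted with the Python standard library. This is a sketch that assumes the server returns OpenAI-style response bodies (`data[0].id` for the model list, `choices[0].message.content` for the reply). The helper names are illustrative. The script reads the model name from `/v1/models` rather than hard-coding it.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8000"  # same host and port as the curl commands


def first_model_id(models_response: dict) -> str:
    """Pick the first served model ID from a /v1/models response body."""
    return models_response["data"][0]["id"]


def build_chat_payload(model: str, prompt: str, max_tokens: int = 16) -> dict:
    """Build the same request body as the curl example (greedy decoding)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": max_tokens,
    }


def request_json(url: str, payload: Optional[dict] = None) -> dict:
    """GET when payload is None, otherwise POST the payload as JSON."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def say_hi() -> str:
    """Run the readiness check, then one chat completion; needs a running server."""
    model = first_model_id(request_json(f"{BASE_URL}/v1/models"))
    reply = request_json(f"{BASE_URL}/v1/chat/completions", build_chat_payload(model, "say hi"))
    return reply["choices"][0]["message"]["content"]
```

Calling `say_hi()` against a running server returns the generated text. Nothing is sent until that function is called.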

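Code Changes

No changes are required. The Qwen3ForCausalLM architecture is natively supported in vLLM and runs on Ascend NPU without modification.

Conclusion

SciCore-Mol is fully compatible with vLLM-Ascend. No adaptation code is needed: the model works out of the box with the existing Qwen3 implementation in vLLM. Accuracy validation shows 100% token-level agreement between NPU and CPU outputs across 5 varied prompts (80 tokens in total).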