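SciCore-Mol is a large language model developed by OpenBMB on the Qwen3 architecture. It is designed for scientific computing and chemistry tasks, with strong capabilities in molecular understanding and scientific reasoning.

| Item | Status |
|---|---|
| Architecture | Qwen3ForCausalLM |
| vLLM registry | Supported |
| Attention type | Full attention (GQA) |
| Quantization | None |
| MoE | No |
| MLA | No |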
All operators used by Qwen3 are natively supported on Ascend NPU.

Status: Pass
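The log that follows comes from a dummy-weight run, which builds the model and executes its operators without reading the checkpoint. The script behind it is not included in this report (the `llm.py` lines suggest vLLM's offline `LLM` API). As an assumption, a comparable smoke test can be started through the server with vLLM's `--load-format dummy` option:

```bash
# Sketch only (assumed invocation, not the script behind the log below):
# --load-format dummy fills the weights with random values instead of reading
# the checkpoint shards; config.json in the model directory is still required.
vllm serve /path/to/scicore-mol \
  --load-format dummy \
  --dtype bfloat16 \
  --max-model-len 4096
```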
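With dummy (randomly initialized) weights the generated text is meaningless by design; this run only checks that the architecture resolves and that the model loads and generates end to end on the NPU.

```text
INFO [model.py:533] Resolved architecture: Qwen3ForCausalLM
INFO [model_runner_v1.py:2589] Loading model weights took 15.2820 GB
INFO [worker.py:357] Available KV cache memory: 38.94 GiB
INFO [llm.py:391] Supported tasks: ['generate']
Output: '寓መ寓መ寓መ寓መ'
Token IDs: [101516, 148580, 101516, 148580, 101516, 148580, 101516, 148580]
SUCCESS: Dummy weight test passed
```

Status: Pass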
The model was loaded successfully with real weights from ModelScope:

```text
INFO [model.py:533] Resolved architecture: Qwen3ForCausalLM
INFO [model_runner_v1.py:2562] Starting to load model /tmp/scicore-mol...
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:46<00:00, 11.56s/it]
INFO [default_loader.py:384] Loading weights took 46.29 seconds
INFO [model_runner_v1.py:2589] Loading model weights took 15.2820 GB
INFO [worker.py:357] Available KV cache memory: 38.94 GiB
INFO [kv_cache_utils.py:1316] GPU KV cache size: 283,520 tokens
INFO [core.py:281] init engine (profile, create kv cache, warmup model) took 5.32 seconds
INFO [llm.py:391] Supported tasks: ['generate']
```
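The KV cache figures in this log can be cross-checked with a short calculation. This is a consistency check under assumptions the report does not state: a Qwen3-8B-like configuration with $L = 36$ layers, $H_{\text{kv}} = 8$ KV heads and head dimension $d_{\text{head}} = 128$, a bfloat16 cache ($s = 2$ bytes per element), and a 128-token KV block. The leading factor 2 counts keys and values.

$$
b_{\text{token}} = 2 \times L \times H_{\text{kv}} \times d_{\text{head}} \times s = 2 \times 36 \times 8 \times 128 \times 2\ \text{B} = 147{,}456\ \text{B}
$$

$$
N_{\text{blocks}} = \left\lfloor \frac{38.94 \times 2^{30}\ \text{B}}{128 \times 147{,}456\ \text{B}} \right\rfloor = \lfloor 2215.25\ldots \rfloor = 2215,
\qquad
N_{\text{tokens}} = 2215 \times 128 = 283{,}520
$$

This reproduces the `GPU KV cache size: 283,520 tokens` line (the `GPU` label is vLLM's generic wording; the cache here lives on the NPU). Because the log rounds the available memory to two decimals, treat this as a plausibility check rather than an exact derivation.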
Sample generation with real weights:

```text
Prompt: "Hello, how are you?"
NPU Output: " I'm sorry for the late reply."
NPU Token IDs: [358, 2776, 14589, 369, 279, 3309, 9851, 13]
```

Test environment:
… (`VLLM_TARGET_DEVICE=cpu`)

Note: no GPU is available in the test environment, so CPU output serves as the accuracy baseline for validation.
Multi-prompt comparison results (16 generated tokens per prompt):

| # | Prompt | NPU output | CPU output | Tokens matched |
|---|---|---|---|---|
| 1 | Hello, how are you? | " I'm sorry for the late reply. I'm currently in the middle of a" | " I'm sorry for the late reply. I'm currently in the middle of a" | 16/16 |
| 2 | What is the chemical formula for water? | " The chemical formula for water is H₂O, which consists of two hydrogen atoms" | " The chemical formula for water is H₂O, which consists of two hydrogen atoms" | 16/16 |
| 3 | Explain quantum mechanics in simple terms. | " Quantum mechanics is a fundamental theory in physics that describes the behavior of the smallest particles" | " Quantum mechanics is a fundamental theory in physics that describes the behavior of the smallest particles" | 16/16 |
| 4 | The capital of France is | " Paris. The capital of Italy is Rome. The capital of Spain is Madrid." | " Paris. The capital of Italy is Rome. The capital of Spain is Madrid." | 16/16 |
| 5 | 1 + 1 = | " 2\n\n# 1. Two Sum\n\nGiven an array of integers `" | " 2\n\n# 1. Two Sum\n\nGiven an array of integers `" | 16/16 |
Token-by-token comparison (prompt 1):

| Position | NPU token ID | CPU token ID | Match |
|---|---|---|---|
| 1 | 358 | 358 | Yes |
| 2 | 2776 | 2776 | Yes |
| 3 | 14589 | 14589 | Yes |
| 4 | 369 | 369 | Yes |
| 5 | 279 | 279 | Yes |
| 6 | 3309 | 3309 | Yes |
| 7 | 9851 | 9851 | Yes |
| 8 | 13 | 13 | Yes |
| 9 | 358 | 358 | Yes |
| 10 | 2776 | 2776 | Yes |
| 11 | 5023 | 5023 | Yes |
| 12 | 304 | 304 | Yes |
| 13 | 279 | 279 | Yes |
| 14 | 6149 | 6149 | Yes |
| 15 | 315 | 315 | Yes |
| 16 | 264 | 264 | Yes |
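The position-by-position check in the table above can be scripted. The following is a minimal sketch, not the tool used for this report: `compare_tokens` is a hypothetical helper that takes two whitespace-separated token-ID lists and prints the match count and the first (1-based) position where they diverge. A length mismatch counts as a divergence.

```bash
# Hypothetical helper (not from this report): compare NPU and CPU token IDs
# position by position. Prints "<matched>/<total> first_divergence=<pos|none>".
compare_tokens() {
  awk -v npu="$1" -v cpu="$2" 'BEGIN {
    n = split(npu, a, " ")             # NPU token IDs
    m = split(cpu, b, " ")             # CPU (reference) token IDs
    total = (n > m) ? n : m            # a missing position counts as a mismatch
    matched = 0
    first = "none"
    for (i = 1; i <= total; i++) {
      if (i <= n && i <= m && a[i] == b[i]) {
        matched++
      } else if (first == "none") {
        first = i                      # remember the first divergent position
      }
    }
    printf "%d/%d first_divergence=%s\n", matched, total, first
  }'
}
```

For prompt 1, passing the 16 IDs from the table as both arguments prints `16/16 first_divergence=none`.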
Accuracy summary:

| Metric | Value |
|---|---|
| Prompts tested | 5 |
| Tokens compared | 80 (5 × 16) |
| Matching tokens | 80 |
| Mismatched tokens | 0 |
| Match rate | 100% |
| First divergence position | None |
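The summary metrics follow directly from the per-prompt `16/16` counts. As a sketch (the helper name `summarize_matches` and its `matched/total` argument format are assumptions for illustration), the aggregation uses only POSIX shell integer arithmetic:

```bash
# Hypothetical helper: aggregate per-prompt "matched/total" pairs into the
# accuracy-summary metrics. The match rate is truncated to two decimals.
summarize_matches() (
  matched=0
  total=0
  for pair in "$@"; do
    matched=$((matched + ${pair%/*}))   # part before the slash
    total=$((total + ${pair#*/}))       # part after the slash
  done
  [ "$total" -gt 0 ] || { echo "no token counts given" >&2; exit 1; }
  bp=$((matched * 10000 / total))       # match rate in basis points
  printf 'total=%d matched=%d mismatched=%d rate=%d.%02d%%\n' \
    "$total" "$matched" "$((total - matched))" "$((bp / 100))" "$((bp % 100))"
)
```

With the five rows above, `summarize_matches 16/16 16/16 16/16 16/16 16/16` prints `total=80 matched=80 mismatched=0 rate=100.00%`.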
Feature support:

| Feature | Status |
|---|---|
| Text Generation | Supported and verified |
| ACLGraph | Supported |
| Chunked Prefill | Supported |
| Prefix Caching | Supported |
| Tensor Parallelism | Supported (not tested) |
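The features in this table map onto standard vLLM serving options. As a sketch (assumption: defaults depend on the installed vLLM and vLLM-Ascend versions, and recent V1 releases may already enable prefix caching and chunked prefill), they can be set explicitly as shown below. `--tensor-parallel-size 2` assumes two NPUs and, as the table notes, is untested here. ACLGraph capture is, to our understanding, on by default in vLLM-Ascend and can be switched off with `--enforce-eager` when debugging.

```bash
# Sketch only: explicit feature flags for the server (version-dependent defaults).
# --tensor-parallel-size 2 assumes two NPUs; tensor parallelism was not tested here.
vllm serve /path/to/scicore-mol \
  --dtype bfloat16 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --tensor-parallel-size 2
```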
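Launch an OpenAI-compatible server (`--served-model-name` makes the served name match the `"model"` field used in the request; without it, vLLM registers the model under its path):

```bash
vllm serve /path/to/scicore-mol \
  --served-model-name scicore-mol \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --max-num-seqs 16 \
  --trust-remote-code \
  --port 8000
```

Check readiness, then send a test request:

```bash
# Readiness check
curl -sf http://127.0.0.1:8000/v1/models

# Text inference
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"scicore-mol","messages":[{"role":"user","content":"say hi"}],"temperature":0,"max_tokens":16}'
```

No changes are required: the Qwen3ForCausalLM architecture is natively supported in vLLM and runs on Ascend NPU without modification.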
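For scripted runs, the readiness check can be wrapped in a polling loop so the inference request is sent only after weight loading and engine initialization finish (roughly 50 seconds in the real-weight log). `wait_for_server` is a hypothetical helper, not part of vLLM; its defaults assume the port used in the serving example.

```bash
# Hypothetical helper: poll the /v1/models endpoint until the server answers
# or the timeout (in seconds) expires. Returns 0 when ready, 1 on timeout.
wait_for_server() (
  url=${1:-http://127.0.0.1:8000/v1/models}
  timeout=${2:-600}
  interval=5
  waited=0
  until curl -sf --max-time 5 "$url" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "server at $url not ready after ${timeout}s" >&2
      exit 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
)
```

Typical use is `wait_for_server && curl -s http://127.0.0.1:8000/v1/chat/completions ...`, so a server that never comes up fails the script instead of producing a confusing connection error.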
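SciCore-Mol is fully compatible with vLLM-Ascend. No adaptation code is needed: the model runs out of the box on vLLM's existing Qwen3 implementation. Accuracy validation shows 100% token-level agreement between NPU and CPU outputs across 5 diverse prompts (80 tokens in total).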