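SciCore-Mol (OpenBMB/SciCore-Mol)

Model overview

SciCore-Mol is a large language model developed by OpenBMB on the Qwen3 architecture. It is designed for scientific computing and chemistry-related tasks, with an emphasis on molecular understanding and scientific reasoning.

Architecture: Qwen3ForCausalLM
Model type: dense large language model (standard full attention)
Parameters: approx. 8.2 billion
Context length: 40960 tokens
Attention mechanism: GQA (grouped-query attention)
Hidden size: 4096
Layers: 36
Attention heads: 32 (8 KV heads)
Data type: bfloat16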
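
As a quick sanity check on the figures above, the bf16 weight footprint can be estimated from the parameter count alone. This is a minimal sketch: the 8.2 billion value is the rounded number from the list, so the result is only approximate, and the helper name is illustrative.

```python
# Rough bf16 weight footprint from the parameter count listed above.
PARAMS = 8.2e9          # "approx. 8.2 billion" from the model overview (rounded)
BYTES_PER_PARAM = 2     # bfloat16 stores each value in 2 bytes


def weight_footprint_gib(params: float, bytes_per_param: int) -> float:
    """Return the raw weight size in GiB (2**30 bytes)."""
    return params * bytes_per_param / 2**30


footprint = weight_footprint_gib(PARAMS, BYTES_PER_PARAM)
print(f"estimated weights: {footprint:.2f} GiB")  # estimated weights: 15.27 GiB
```

The estimate is in the same range as the "Loading model weights took 15.2820 GB" log line in the validation results, assuming that log figure is reported in binary units.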

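Ascend NPU adaptation report

Analysis summary

Item	Status
Architecture	Qwen3ForCausalLM
vLLM registry	Supported
Attention type	Full attention (GQA)
Quantization	None
MoE	No
MLA	No

Operator compatibility

All operators used by Qwen3 are natively supported on Ascend NPU:

Native PyTorch operators: fully supported
No Triton kernels that require validation
No CUDA-specific operators
No custom kernels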

Validation results

Dummy weight test

Status: passed

INFO  [model.py:533] Resolved architecture: Qwen3ForCausalLM
INFO  [model_runner_v1.py:2589] Loading model weights took 15.2820 GB
INFO  [worker.py:357] Available KV cache memory: 38.94 GiB
INFO  [llm.py:391] Supported tasks: ['generate']
Output: '寓መ寓መ寓መ寓መ'
Token IDs: [101516, 148580, 101516, 148580, 101516, 148580, 101516, 148580]
SUCCESS: Dummy weight test passed

Real weight test on Ascend NPU

Status: passed

The model loaded successfully with the real weights from ModelScope:

INFO  [model.py:533] Resolved architecture: Qwen3ForCausalLM
INFO  [model_runner_v1.py:2562] Starting to load model /tmp/scicore-mol...
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:46<00:00, 11.56s/it]
INFO  [default_loader.py:384] Loading weights took 46.29 seconds
INFO  [model_runner_v1.py:2589] Loading model weights took 15.2820 GB
INFO  [worker.py:357] Available KV cache memory: 38.94 GiB
INFO  [kv_cache_utils.py:1316] GPU KV cache size: 283,520 tokens
INFO  [core.py:281] init engine (profile, create kv cache, warmup model) took 5.32 seconds
INFO  [llm.py:391] Supported tasks: ['generate']

Prompt: "Hello, how are you?"
NPU Output: " I'm sorry for the late reply."
NPU Token IDs: [358, 2776, 14589, 369, 279, 3309, 9851, 13]
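
The KV cache capacity in the log can be cross-checked against the architecture figures from the model overview. The sketch below assumes a head dimension of 128 (hidden size 4096 divided by 32 heads) and a KV cache block size of 128 tokens; neither value is stated in this report, so treat both as assumptions.

```python
NUM_LAYERS = 36        # from the model overview
NUM_KV_HEADS = 8       # from the model overview
HEAD_DIM = 4096 // 32  # assumption: hidden size / attention heads = 128
DTYPE_BYTES = 2        # bfloat16
BLOCK_SIZE = 128       # assumption: KV cache block size in tokens
KV_CACHE_GIB = 38.94   # "Available KV cache memory" from the log


def kv_bytes_per_token() -> int:
    # Key and value tensors (factor 2) for every layer and every KV head.
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * DTYPE_BYTES


def kv_cache_tokens(cache_gib: float, block_size: int) -> int:
    raw_tokens = int(cache_gib * 2**30) // kv_bytes_per_token()
    # Round down to whole blocks, assuming block-granular allocation.
    return (raw_tokens // block_size) * block_size


print(kv_bytes_per_token())                       # 147456
print(kv_cache_tokens(KV_CACHE_GIB, BLOCK_SIZE))  # 283520
```

Under these assumptions the result matches the 283,520 tokens reported in the log.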

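NPU vs. CPU accuracy comparison

Test environment:

NPU: Ascend NPU (CANN 8.5.1)
CPU: ARM64 CPU (same machine, via VLLM_TARGET_DEVICE=cpu)
Data type: bfloat16
Temperature: 0 (greedy decoding)
Max tokens: 16

Note: no GPU was available in the test environment. The CPU serves as the accuracy baseline for validation.

Multi-prompt comparison results:

#	Prompt	NPU output	CPU output	Match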
1	Hello, how are you?	" I'm sorry for the late reply. I'm currently in the middle of a"	" I'm sorry for the late reply. I'm currently in the middle of a"	16/16
2	What is the chemical formula for water?	" The chemical formula for water is H₂O, which consists of two hydrogen atoms"	" The chemical formula for water is H₂O, which consists of two hydrogen atoms"	16/16
3	Explain quantum mechanics in simple terms.	" Quantum mechanics is a fundamental theory in physics that describes the behavior of the smallest particles"	" Quantum mechanics is a fundamental theory in physics that describes the behavior of the smallest particles"	16/16
4	The capital of France is	" Paris. The capital of Italy is Rome. The capital of Spain is Madrid."	" Paris. The capital of Italy is Rome. The capital of Spain is Madrid."	16/16
5	1 + 1 =	" 2\n\n# 1. Two Sum\n\nGiven an array of integers `"	" 2\n\n# 1. Two Sum\n\nGiven an array of integers `"	16/16

Token-by-token comparison (prompt 1):

Position	NPU token	CPU token	Match
1	358	358	Yes
2	2776	2776	Yes
3	14589	14589	Yes
4	369	369	Yes
5	279	279	Yes
6	3309	3309	Yes
7	9851	9851	Yes
8	13	13	Yes
9	358	358	Yes
10	2776	2776	Yes
11	5023	5023	Yes
12	304	304	Yes
13	279	279	Yes
14	6149	6149	Yes
15	315	315	Yes
16	264	264	Yes

Accuracy summary:

Metric	Value
Prompts tested	5
Tokens compared	80 (5 x 16)
Matching tokens	80
Mismatched tokens	0
Match rate	100%
First divergence position	None
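
The per-prompt metrics in this summary (match count and first divergence position) can be computed with a few lines of Python. This is a minimal sketch, not the script used for the report; `compare_tokens` is a hypothetical helper name, and the token IDs are copied from the prompt 1 table.

```python
from typing import Optional, Sequence, Tuple


def compare_tokens(npu: Sequence[int], cpu: Sequence[int]) -> Tuple[int, Optional[int]]:
    """Return (number of matching positions, 1-based first divergence or None)."""
    matches = 0
    first_divergence = None
    # zip stops at the shorter sequence, so compare lengths separately if needed.
    for pos, (a, b) in enumerate(zip(npu, cpu), start=1):
        if a == b:
            matches += 1
        elif first_divergence is None:
            first_divergence = pos
    return matches, first_divergence


# Token IDs for prompt 1, copied from the token-by-token table.
npu_ids = [358, 2776, 14589, 369, 279, 3309, 9851, 13,
           358, 2776, 5023, 304, 279, 6149, 315, 264]
cpu_ids = list(npu_ids)  # the report shows identical CPU tokens

matches, first = compare_tokens(npu_ids, cpu_ids)
print(f"{matches}/{len(npu_ids)} matched, first divergence: {first}")
# 16/16 matched, first divergence: None
```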

Feature status matrix

Feature	Status
Text Generation	Supported and verified
ACLGraph	Supported
Chunked Prefill	Supported
Prefix Caching	Supported
Tensor Parallelism	Supported (untested)
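
Since tensor parallelism was not tested, it may help to see which parallel sizes split this model's 32 query heads and 8 KV heads evenly. The sketch below applies a common head-partitioning rule (query heads divided across ranks, KV heads divided or replicated when there are fewer KV heads than ranks). It is an illustration under that assumption, has not been checked against vLLM-Ascend, and `tp_layout` is a hypothetical helper.

```python
from typing import Optional, Tuple

NUM_HEADS = 32     # attention heads, from the model overview
NUM_KV_HEADS = 8   # KV heads, from the model overview


def tp_layout(tp_size: int) -> Optional[Tuple[int, int]]:
    """Return (query heads per rank, KV heads per rank), or None if uneven."""
    if NUM_HEADS % tp_size != 0:
        return None
    if NUM_KV_HEADS >= tp_size:
        if NUM_KV_HEADS % tp_size != 0:
            return None
        kv_per_rank = NUM_KV_HEADS // tp_size
    else:
        # Fewer KV heads than ranks: each rank holds one replicated KV head.
        if tp_size % NUM_KV_HEADS != 0:
            return None
        kv_per_rank = 1
    return NUM_HEADS // tp_size, kv_per_rank


for tp in (1, 2, 4, 8, 16):
    print(tp, tp_layout(tp))
```

Under this rule, sizes 1, 2, 4, and 8 split both head types evenly, while 16 would require replicating KV heads.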

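Operations guide

Starting the server

# --served-model-name makes the server accept "scicore-mol" as the model name in requests
vllm serve /path/to/scicore-mol \
  --served-model-name scicore-mol \
  --dtype bfloat16 \
  --max-model-len 4096 \
  --max-num-seqs 16 \
  --trust-remote-code \
  --port 8000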

Verification

# Readiness check
curl -sf http://127.0.0.1:8000/v1/models

# Text inference
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"scicore-mol","messages":[{"role":"user","content":"say hi"}],"temperature":0,"max_tokens":16}'
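
The same inference request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl call above; `build_chat_request` and `chat` are hypothetical helper names, and the response parsing assumes the usual OpenAI-compatible chat completion layout.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # same host and port as the curl commands


def build_chat_request(model: str, prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # greedy decoding, as in the curl example
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("scicore-mol", "say hi")` would return the generated text. The model name must match one of the IDs listed by the /v1/models endpoint.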

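Code changes

No changes are required. The Qwen3ForCausalLM architecture is natively supported in vLLM and runs on Ascend NPU without modification.

Conclusion

SciCore-Mol is fully compatible with vLLM-Ascend. No adaptation code is needed: the model works out of the box with the existing Qwen3 implementation in vLLM. Accuracy validation shows 100% token-level agreement between NPU and CPU outputs across 5 varied prompts (80 tokens in total).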
