Shanghai_AI_Laboratory/internlm2-1_8b-reward

1. Introduction

Shanghai_AI_Laboratory/internlm2-1_8b-reward is a 1.8B-parameter reward model built on the InternLM2 architecture. It scores the quality of conversations. The model has been adapted to and verified on the Huawei Ascend910 NPU.

Model architecture: InternLM2ForRewardModel
Parameters: 1.8B
Precision: float16
Inference framework: PyTorch + torch_npu
Weights (ModelScope): https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b-reward

2. Verification Environment

Component	Version
NPU	Ascend910
CANN	25.5.2
PyTorch	2.9.0+cpu
torch_npu	2.9.0.post1+gitee7ba04
transformers	4.41.2+
Python	3.11.14
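
Before loading the model, it can help to compare the local installation against the table above. The following is a minimal sketch that only reads package metadata through the standard library; the helper name `installed_versions` is hypothetical, and the distribution names passed to it (in particular `torch_npu`) are assumptions that may need adjusting to match how the packages were installed.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed under this distribution name.
            found[name] = None
    return found


# Distribution names are assumptions; adjust them to your installation.
versions = installed_versions(["torch", "torch_npu", "transformers"])
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```

The sketch does not check CANN or the NPU driver, which are installed outside of Python.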

3. Usage

import torch
import torch_npu
from transformers import AutoModel, AutoTokenizer

model_path = "Shanghai_AI_Laboratory/internlm2-1_8b-reward"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True)
model = model.npu().eval()

# Compute the reward score for a single conversation
conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
]
score = model.get_score(tokenizer, conversation)
print(f"Reward score: {score}")

# Compare the quality of two conversations
conversation2 = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "I don't know."},
]
result = model.compare(tokenizer, conversation, conversation2)
print(f"Conversation 1 better than conversation 2: {result}")
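
A common use of per-conversation scores is to rank several candidate replies to the same prompt. The sketch below shows one way to do that. `rank_responses` and `toy_score` are hypothetical names introduced for illustration, and `toy_score` is a stand-in that scores by reply length so the example runs without an NPU. With the real model, the scorer would be something like `lambda conv: model.get_score(tokenizer, conv)`.

```python
from typing import Callable, Dict, List, Tuple

Conversation = List[Dict[str, str]]


def rank_responses(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[Conversation], float],
) -> List[Tuple[str, float]]:
    """Score each candidate reply to `prompt` and return them best-first."""
    scored = []
    for reply in candidates:
        conversation = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ]
        scored.append((reply, float(score_fn(conversation))))
    # A higher reward means the reply is preferred.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical stand-in scorer (reply length), NOT the reward model.
def toy_score(conversation: Conversation) -> float:
    return float(len(conversation[-1]["content"]))


ranked = rank_responses(
    "What is the capital of France?",
    ["I don't know.", "The capital of France is Paris."],
    toy_score,
)
print(ranked[0][0])
```

Passing the scorer in as a function keeps the ranking logic independent of the device and of how the model was loaded.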

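4. Inference Verification

Inference runs directly in PyTorch on the NPU. Verification results:

The model loads and runs on the Ascend910 NPU.
The get_score method works as expected and returns a reward score.
The compare method works as expected and correctly compares the quality of two conversations.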

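5. Performance Reference

Scoring test over 10 conversations on the Ascend910 NPU (model loading time excluded):

Metric	Value
Device	Ascend910
Average inference latency	34.3 ms
Total test time	0.343 s
Throughput	29.17 requests/s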
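
The average latency in the table is the total time divided by the number of requests (0.343 s / 10 = 34.3 ms). The original measurement script is not shown, so the following is only a sketch of how such figures could be collected. `benchmark` is a hypothetical helper, and the sleep-based workload is a stand-in for a call such as `model.get_score(tokenizer, conv)`.

```python
import time
from typing import Callable, Dict, Sequence


def benchmark(score_fn: Callable[[object], object], inputs: Sequence[object]) -> Dict[str, float]:
    """Time score_fn over inputs; the model should already be loaded."""
    start = time.perf_counter()
    for item in inputs:
        score_fn(item)
    total_s = time.perf_counter() - start
    return {
        "total_s": total_s,
        "avg_latency_ms": total_s / len(inputs) * 1000.0,
        "throughput_rps": len(inputs) / total_s,
    }


# Hypothetical stand-in workload: 10 requests that each sleep for 1 ms.
stats = benchmark(lambda _: time.sleep(0.001), list(range(10)))
print(stats)
```

When timing a real accelerator, running a few warm-up requests before the measured loop usually gives more stable numbers. If the scoring call returned a device tensor rather than a Python number, a device synchronization would likely be needed before stopping the timer.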

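6. Accuracy Evaluation

Accuracy is verified through pairwise conversation preference judgments:

Test case	Better conversation score	Worse conversation score	Preference judged correctly
French capital Q&A	0.9385	-2.4316	Yes
Quantum computing explanation	-0.4395	-2.8184	Yes

The preference judgment is correct for all test cases, so the model passes accuracy verification.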
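
The pass criterion can be written down explicitly: a pair counts as correct when the better conversation receives the higher score. The sketch below applies that rule to the two rows of the table above. The function name `preference_accuracy` is hypothetical, and the strict `>` comparison is an assumption about how ties would be treated.

```python
from typing import List, Tuple

# (test case, better conversation score, worse conversation score), from the table.
results: List[Tuple[str, float, float]] = [
    ("French capital Q&A", 0.9385, -2.4316),
    ("Quantum computing explanation", -0.4395, -2.8184),
]


def preference_accuracy(rows: List[Tuple[str, float, float]]) -> float:
    """Fraction of pairs where the better reply outscores the worse one."""
    correct = sum(1 for _, better, worse in rows if better > worse)
    return correct / len(rows)


for name, better, worse in results:
    print(f"{name}: margin = {better - worse:.4f}")
print(f"accuracy = {preference_accuracy(results):.2f}")
```

Two pairs are enough for a smoke test of the port, but too few to estimate accuracy on a real preference benchmark.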

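7. Notes

This is a reward model: it scores conversations and does not support text generation.
Load it with AutoModel, not AutoModelForCausalLM.
Prefer the ModelScope SDK for downloading the weights.
The weights consist of 2 safetensors files (about 3.17 GB).
Pass trust_remote_code=True so that the model's custom code is loaded.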
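
For the ModelScope download mentioned above, a minimal sketch using the SDK's `snapshot_download` function could look like the following. The wrapper name `download_weights` is hypothetical, and the example assumes the `modelscope` package is installed and that network access is available.

```python
from typing import Optional


def download_weights(model_id: str, cache_dir: Optional[str] = None) -> str:
    """Download a model snapshot from ModelScope and return the local directory."""
    # Imported lazily so this file can be loaded without modelscope installed.
    from modelscope import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)


# Example (requires network access and `pip install modelscope`):
# model_path = download_weights("Shanghai_AI_Laboratory/internlm2-1_8b-reward")
```

The returned directory can then be used as `model_path` in the loading code from section 3.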
