nlp_deberta_rex-uninlu_chinese-base - Ascend NPU Adaptation

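1. Model Overview

DeBERTa Rex-UniNLU is a universal information extraction model built on the DeBERTa-v2 architecture. It supports NLU tasks such as named entity recognition, relation extraction, and event extraction. A RexModel wrapper extends the base DeBERTa backbone with rotary position embeddings and an FFN layer.

Original model: iic/nlp_deberta_rex-uninlu_chinese-base
Framework: PyTorch
Task: Rex-UniNLU (Universal Information Extraction)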

2. Ascend NPU Adaptation Results

Metric	Value
Cosine Similarity (mean)	1.000000
MaxAbsErr	0.007798
Relative Error	0.0631%
Average latency	17.87 ms
Peak device memory	0.39 GB
Parameters	98,237,568
Inference dtype	float32
Device	Ascend 910B4

3. Environment Requirements

Component	Version
CANN	8.5.1
torch_npu	2.9.0.post1
PyTorch	2.9.0
Python	3.11
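
To confirm that an environment matches the versions listed above, something like the following could be used. It is a minimal sketch using only the standard library. The pip distribution names `torch` and `torch-npu` are assumptions about how the packages are registered, the function name `check_environment` is hypothetical, and CANN is not checked because it is not a Python package.

```python
import sys
from importlib import metadata

# Expected versions taken from the table above. The pip distribution
# names ("torch", "torch-npu") are assumptions, not confirmed by the source.
EXPECTED = {"torch": "2.9.0", "torch-npu": "2.9.0.post1"}
EXPECTED_PYTHON = (3, 11)


def check_environment(expected=EXPECTED, expected_python=EXPECTED_PYTHON):
    """Return a list of human-readable mismatches (empty if all match)."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(expected_python):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, expected_python))
        problems.append(f"Python {found} found, {wanted} expected")
    for dist, wanted in expected.items():
        try:
            found = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed ({wanted} expected)")
            continue
        # Strip local build tags such as "+cpu" before comparing.
        if found.split("+")[0] != wanted:
            problems.append(f"{dist} {found} found, {wanted} expected")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches"]:
        print(line)
```

An empty result means the Python and package versions agree with the table; each mismatch is reported on its own line.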

4. Quick Start

# Set up the environment
source setup_env.sh

# Run inference (CPU vs. NPU comparison)
python3 inference.py --device npu:0 --dtype float32

# Run with float16 precision
python3 inference.py --device npu:0 --dtype float16
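
The flags above suggest a small argument parser inside `inference.py`. The script itself is not shown in this README, so the following is only a hypothetical sketch of how `--device` and `--dtype` might be parsed. The defaults, the accepted dtype choices, and the function name `parse_args` are all assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the commands above (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="CPU vs. NPU inference comparison")
    # Device string in PyTorch style, e.g. "npu:0" or "cpu".
    parser.add_argument("--device", default="npu:0")
    # Only the two precisions mentioned in this README are accepted here.
    parser.add_argument("--dtype", default="float32", choices=["float32", "float16"])
    return parser.parse_args(argv)


args = parse_args(["--device", "npu:0", "--dtype", "float16"])
print(args.device, args.dtype)
```

In a real script the dtype string would typically be mapped to a dtype object, for example with `getattr(torch, args.dtype)`.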

5. Inference Output Evidence

NPU inference output (float32, CLS-token embedding comparison), reproduced as printed by the script:

模型: iic/nlp_deberta_rex-uninlu_chinese-base
设备: npu:0
精度: float32
------------------------------------------------------------
[CPU] 加载模型...
[CPU] 推理中...
[NPU] 加载模型到 npu:0...
[NPU] 推理中...

  CPU emb shape: torch.Size([2, 768])
  NPU emb shape: torch.Size([2, 768])
  Cosine Similarity (per sample): [0.9999996423721313, 0.9999997019767761]
  Cosine Similarity (mean): 1.000000
  MaxAbsErr: 0.007798
  Relative Error: 0.0631%

✓ 推理完成

[Perf] 加载模型测延迟...
  平均延迟: 17.87 ms
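
An average latency like the one reported in the log is usually obtained by timing repeated forward passes after a warm-up phase. The sketch below illustrates that pattern. It is not the measurement code from `inference.py`, and the warm-up count, iteration count, and the `sync` hook are assumptions. Because NPU execution is asynchronous, the device should be synchronized before reading the clock; `torch.npu.synchronize` from torch_npu would be the natural `sync` argument.

```python
import time


def average_latency_ms(run_once, warmup=5, iters=20, sync=None):
    """Average wall-clock time of run_once() in milliseconds.

    sync is an optional callable that blocks until the device is idle
    (hypothetical hook; pass e.g. torch.npu.synchronize on an NPU).
    """
    sync = sync or (lambda: None)
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        run_once()
    sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    sync()  # wait for queued device work before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / iters


# Stand-in workload so the example runs without a model or an NPU.
print(f"{average_latency_ms(lambda: sum(range(10_000))):.3f} ms")
```

With a real model, `run_once` would wrap a single forward pass under `torch.no_grad()`.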

6. CPU vs. NPU Accuracy Comparison

Metric	CPU (float32)	NPU (float32)	Error
Cosine Similarity	baseline	1.000000	< 0.001%
MaxAbsErr	-	0.007798	-
Relative Error	-	0.0631%	< 1% ✓
Output shape	[2, 768]	[2, 768]	match
Contains NaN	False	False	match
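
The accuracy metrics in this comparison can be computed from the CPU and NPU embeddings roughly as follows. This is a standard-library sketch on plain lists (tensors can be converted with `.tolist()`). The function names are hypothetical, and the relative-error definition used here (mean absolute difference divided by mean absolute reference value) is an assumption, since this README does not state which definition the script uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_embeddings(cpu, npu):
    """Compare two batches of embeddings given as lists of rows."""
    per_sample = [cosine_similarity(c, n) for c, n in zip(cpu, npu)]
    diffs = [abs(x - y) for c, n in zip(cpu, npu) for x, y in zip(c, n)]
    ref = [abs(x) for c in cpu for x in c]
    return {
        "cosine_per_sample": per_sample,
        "cosine_mean": sum(per_sample) / len(per_sample),
        "max_abs_err": max(diffs),
        # Assumed definition: mean |cpu - npu| / mean |cpu|.
        "relative_error": (sum(diffs) / len(diffs)) / (sum(ref) / len(ref)),
        "has_nan": any(math.isnan(x) for n in npu for x in n),
    }


# Tiny made-up batch (2 samples, 3 dims), not data from the actual run.
cpu = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
npu = [[1.0, 2.0, 3.001], [0.5, -1.001, 2.0]]
print(compare_embeddings(cpu, npu))
```

The per-sample cosine similarities in the log (about 0.99999964 and 0.99999970) average to roughly 0.99999967, which prints as 1.000000 at six decimal places.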

7. Model Architecture

Backbone: DeBERTa-v2 (12 layers, hidden size 768, 12 attention heads)
Wrapper: RexModel (FFN + RotaryEmbedding)
Input: Chinese text
Output: entity / relation / event extraction results (probability matrix)
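
For readers unfamiliar with rotary position embeddings, the general technique can be sketched as below. This is the standard formulation, which rotates consecutive feature pairs by a position-dependent angle, written with the standard library only. It is not taken from RexModel, whose pairing scheme, base frequency, and point of application are not described in this README. The base of 10000 is the commonly used default and is an assumption here.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Apply a rotary position embedding to one even-length vector.

    Pair k, i.e. (vec[2k], vec[2k+1]), is rotated by the angle
    position * base ** (-2k / d), where d = len(vec).
    """
    d = len(vec)
    if d % 2:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):  # i = 2k is the index of the pair's first element
        theta = position * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
# Both scores use a position offset of 2, so they agree up to rounding.
print(dot(rotary_embed(q, 3), rotary_embed(k, 1)))
print(dot(rotary_embed(q, 5), rotary_embed(k, 3)))
```

The useful property is that the dot product between two rotated vectors depends only on the difference of their positions, which is why the technique encodes relative position in attention scores.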

8. Verification Report

See screenshots/verification.txt for details.

9. Agent Skill

This adaptation was completed automatically by the Ascend NPU adaptation Agent Skill.
