Deploying sarashina2.2-tts on Ascend NPU

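1. Introduction

This document records the adaptation, deployment, and accuracy verification of sarashina2.2-tts (the Sarashina2.2 TTS model) on an Ascend NPU (Ascend 910B3).

Sarashina2.2-TTS is built on the LlamaForCausalLM architecture, has about 810M parameters, and supports Japanese text-to-speech synthesis. This project adapts the model for inference on Ascend NPU and verifies that NPU outputs deviate from the CPU reference by less than 1% (vector-level relative error).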

2. Verification Environment

Component	Version
Python	3.11.x
PyTorch	2.10.0+cpu
torch_npu	2.10.0
transformers	5.8.1
CANN	8.5.1
NPU hardware	Ascend 910B3

3. Model Information

Item	Value
Architecture	LlamaForCausalLM
Parameters	~810M
Hidden size	2048
Layers	24
Attention heads	32
Vocabulary size	108986
Weight format	safetensors
Framework	PyTorch (transformers)

4. Conda Environment Setup

conda create -n sarashina-tts python=3.11 -y
conda activate sarashina-tts
pip install torch==2.10.0 torchvision==0.25.0 --index-url https://repo.huaweicloud.com/repository/pypi/simple/
pip install torch_npu==2.10.0 --index-url https://repo.huaweicloud.com/repository/pypi/simple/
pip install transformers safetensors --index-url https://repo.huaweicloud.com/repository/pypi/simple/
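
After installation it can help to confirm that the installed packages match the versions in the verification environment table. The following is a minimal sketch using only the standard library; the helper name `check_versions` and the prefix-match rule are assumptions made for illustration and are not part of this project's scripts.

```python
from importlib import metadata

# Versions copied from the environment table in section 2.
EXPECTED = {
    "torch": "2.10.0",
    "torch_npu": "2.10.0",
    "transformers": "5.8.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected, ok)} for each package."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Prefix match so local build tags such as "+cpu" are tolerated.
        ok = have is not None and have.startswith(want)
        report[name] = (have, want, ok)
    return report


if __name__ == "__main__":
    for name, (have, want, ok) in check_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name:14s} installed={have} expected={want} {status}")
```

A package that is not installed is reported with `installed=None` rather than raising, so the script can be run before the environment is complete.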

5. Running Inference

python3 inference.py --model_path /path/to/sarashina2.2-tts
python3 inference.py ... --device cpu --text "こんにちは"
python3 benchmark.py --model_path /path/to/sarashina2.2-tts

6. Parameters

Script	Parameter	Default
inference.py	`--device`	npu:0
inference.py	`--text`	None (random input)
benchmark.py	`--npu_device`	npu:0
benchmark.py	`--num_warmup`	3
7. Accuracy Evaluation Results

Evaluation method

The model is run on random token input (batch=1, seq=128), and the output logits from CPU (FP32) and NPU (FP32) are compared.
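
The evaluation method described above might be sketched as follows. This is not the project's actual script: the function name `random_logits`, the fixed seed, and the choice to generate token ids on the CPU so both devices receive identical input are assumptions. Heavy imports are kept inside the function so the file can be imported on a machine without the NPU stack.

```python
def random_logits(model_path, device="npu:0", seq_len=128, seed=0):
    """Run one FP32 forward pass on random token ids and return logits on CPU."""
    import torch
    from transformers import AutoModelForCausalLM

    if device.startswith("npu"):
        # Importing torch_npu registers the "npu" device with PyTorch.
        import torch_npu  # noqa: F401

    model = AutoModelForCausalLM.from_pretrained(model_path)
    model = model.to(device=device, dtype=torch.float32).eval()

    # Generate ids on CPU with a fixed seed so every device sees the same input.
    gen = torch.Generator().manual_seed(seed)
    input_ids = torch.randint(
        0, model.config.vocab_size, (1, seq_len), generator=gen
    )

    with torch.no_grad():
        logits = model(input_ids=input_ids.to(device)).logits
    return logits.float().cpu()
```

Calling it once with `device="cpu"` and once with `device="npu:0"` yields two tensors of the same shape that can then be compared.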

Multi-dimensional accuracy metrics

Category	Metric	Measured	Threshold	Status
Overall	Vector-level relative error	0.000114%	< 1%	PASS ✅
	Cosine similarity	1.0000001192	> 0.99	PASS ✅
	SNR (dB)	117.51	N/A	N/A
Absolute error	Max	1.43e-04	N/A	N/A
	Mean	1.28e-05	N/A	N/A
	P95	3.15e-05	N/A	N/A
	P99	3.91e-05	N/A	N/A
Element-wise relative error	Max	139.84%	N/A	N/A
	Mean	0.0002%	N/A	N/A
	P95	0.0002%	N/A	N/A
Per-position	Max position error	2.82e-05	N/A	N/A
	Mean position error	1.28e-05	N/A	N/A
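
The document does not give the formulas behind these metrics, so the sketch below uses common definitions as assumptions: vector-level relative error as the L2 norm of the difference divided by the L2 norm of the reference, SNR as 20·log10 of the inverse ratio, and element-wise relative error as |difference| / |reference|. It works on flat sequences of floats with only the standard library. The toy data is invented to show how a single near-zero reference value can produce an element-wise maximum above 100% while the vector-level error stays tiny.

```python
import math


def compare(ref, test, eps=1e-12):
    """Compare two equal-length flat float sequences (ref = CPU, test = NPU)."""
    diff = [t - r for r, t in zip(ref, test)]
    ref_norm = math.sqrt(sum(r * r for r in ref))
    test_norm = math.sqrt(sum(t * t for t in test))
    diff_norm = math.sqrt(sum(d * d for d in diff))
    dot = sum(r * t for r, t in zip(ref, test))
    return {
        # Assumed definition: ||test - ref||_2 / ||ref||_2
        "vector_rel_error": diff_norm / (ref_norm + eps),
        "cosine_similarity": dot / (ref_norm * test_norm + eps),
        # Assumed definition: 20 * log10(||ref||_2 / ||test - ref||_2)
        "snr_db": 20 * math.log10(ref_norm / (diff_norm + eps)),
        "max_abs_error": max(abs(d) for d in diff),
        # Dominated by elements whose reference value is close to zero.
        "max_elem_rel_error": max(
            abs(d) / (abs(r) + eps) for r, d in zip(ref, diff)
        ),
    }


# Invented toy data: the third reference value is nearly zero.
ref = [10.0, -8.0, 1e-5, 5.0]
test = [10.0001, -8.0001, 2.4e-5, 5.0001]
for name, value in compare(ref, test).items():
    print(f"{name}: {value:.6g}")
```

On this toy data the element-wise maximum is roughly 140% even though the vector-level relative error is on the order of 1e-5, which is one plausible reading of the 139.84% entry in the table. The reported cosine similarity of 1.0000001192 exceeds 1 by about 2^-23, the FP32 machine epsilon, which is consistent with rounding in single precision.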

Verdict

Metric	Measured	Threshold	Status
Vector-level relative error	0.000114%	< 1%	PASS ✅

8. Performance Data

Operation	Time
CPU (FP32)	4.14 s
NPU (FP32, after 3 warm-up runs)	0.04 s
Speedup	97.30x
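
Warm-up runs matter because the first calls on an accelerator typically include one-time costs such as kernel compilation and memory allocation. A minimal timing harness might look like the sketch below; the name `time_call` and its parameters are assumptions, not the interface of `benchmark.py`. The optional `sync` callback is there because device work is usually queued asynchronously; on NPU one would likely pass `torch.npu.synchronize` from torch_npu.

```python
import time


def time_call(fn, num_warmup=3, num_runs=10, sync=None):
    """Return the mean wall-clock seconds per call of fn after warm-up."""
    for _ in range(num_warmup):
        fn()  # warm-up calls are executed but not timed
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    start = time.perf_counter()
    for _ in range(num_runs):
        fn()
    if sync is not None:
        sync()  # wait for queued device work before stopping the clock
    return (time.perf_counter() - start) / num_runs


# Example with a cheap CPU-only stand-in for a forward pass.
mean_s = time_call(lambda: sum(range(10_000)), num_warmup=3, num_runs=5)
print(f"mean time per call: {mean_s:.6f} s")
```

The speedup is then the CPU mean divided by the NPU mean, measured on the same input.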

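9. Notes

The model is loaded through the standard transformers AutoModelForCausalLM interface.
The model is a Japanese TTS model, so input text must be in Japanese.
In FP32 the NPU results are nearly lossless: the vector-level relative error is 0.000114%.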