sarashina2.2-tts Ascend NPU Deployment Guide

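Overview

This project provides a deployment recipe for the SB Intuitions sarashina2.2-tts model on Huawei Ascend NPUs. sarashina2.2-tts is a Japanese/English text-to-speech system built on a large language model, and it supports zero-shot voice cloning.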

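Model Information

Property	Value
Model name	sarashina2.2-tts
Parameters	~810M
Architecture	LlamaForCausalLM
Base model	sbintuitions/sarashina2.2-0.5b-instruct-v0.1
Supported languages	Japanese, English
Features	Zero-shot voice cloning, multi-style support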

Requirements

NPU: Atlas 910B3
Python: 3.11
PyTorch: 2.8.0+ with torch_npu
safetensors
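
Before running the scripts, it can help to confirm that the interpreter and packages listed above are present. The following is a minimal sketch, not part of inference.py: `check_environment` and `REQUIRED_MODULES` are hypothetical names, and the check only verifies that the packages can be located. It does not verify the PyTorch version or that an NPU device is actually reachable.

```python
import importlib.util
import sys

# Hypothetical helper, not part of inference.py.
# Package names are taken from the requirements list above.
REQUIRED_MODULES = ("torch", "torch_npu", "safetensors")


def check_environment(min_python=(3, 11)):
    """Report whether the Python version and required packages look usable."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in REQUIRED_MODULES:
        # find_spec only locates the package; it does not import it.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'ok' if ok else 'MISSING'}")
```

Running the file inside the container prints one line per check, which makes a missing torch_npu installation easy to spot before a longer run.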

File Structure

/data/ysws/agentsp/sarashina2.2-tts-ascend/
├── README.md          # this document
├── inference.py       # inference script
└── log.txt            # run log

Running Inference

Precision Test

docker exec test-modelagent bash -c "cd /data/ysws/agentsp/sarashina2.2-tts-ascend && python inference.py --precision_test 2>&1 | tee log.txt"

Inference Test

docker exec test-modelagent bash -c "cd /data/ysws/agentsp/sarashina2.2-tts-ascend && python inference.py 2>&1 | tee log.txt"

Arguments

Argument	Description	Default
--model_path	Path to the model	/data/ysws/agentsp/sarashina2.2-tts
--device	Device to run on	npu:0
--precision_test	Run the precision test	False
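
The source of inference.py is not reproduced in this document, so the following is only a sketch of how a parser with these three arguments and defaults could be declared using `argparse`. The function name `build_parser` and the help strings are assumptions; treating `--precision_test` as a boolean switch is inferred from its default of False and from the command line shown earlier.

```python
import argparse


def build_parser():
    """Build a parser mirroring the argument table (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        description="sarashina2.2-tts Ascend NPU inference"
    )
    parser.add_argument(
        "--model_path",
        default="/data/ysws/agentsp/sarashina2.2-tts",
        help="Path to the model",
    )
    parser.add_argument(
        "--device",
        default="npu:0",
        help="Device to run on",
    )
    # A flag with no value: absent means False, present means True.
    parser.add_argument(
        "--precision_test",
        action="store_true",
        help="Run the precision test instead of plain inference",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```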

Precision Test Results

============================================================
Precision Comparison: CPU vs NPU
============================================================
Max errors: sum=1.53e-04, mean=1.19e-07, std=1.49e-08
PASS: NPU precision within thresholds
============================================================
PRECISION TEST PASSED
============================================================

Metric	Threshold	Measured	Status
max_error_sum	< 1e-3	1.53e-04	✅ PASS
max_error_mean	< 1e-5	1.19e-07	✅ PASS
max_error_std	< 1e-5	1.49e-08	✅ PASS
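
This document does not define how the three metrics are computed. One plausible reading, and it is only an assumption, is that each tensor's sum, mean, and standard deviation are computed on both devices and the largest absolute difference across tensors is compared against the threshold. The sketch below illustrates that reading with plain Python lists standing in for tensors; `tensor_stats` and `compare` are hypothetical names, and the population standard deviation is an arbitrary choice.

```python
import math
import statistics

# Thresholds taken from the table above.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}


def tensor_stats(values):
    """Summary statistics of one flattened tensor (a list of floats)."""
    return {
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population std; an assumption
    }


def compare(cpu_tensors, npu_tensors):
    """Return (max_errors, passed) for two dicts mapping name -> list of floats.

    ASSUMPTION: the metric is the largest per-tensor absolute difference
    of each statistic; the actual script may define it differently.
    """
    max_err = {key: 0.0 for key in THRESHOLDS}
    for name, cpu_values in cpu_tensors.items():
        cpu = tensor_stats(cpu_values)
        npu = tensor_stats(npu_tensors[name])
        for key in max_err:
            max_err[key] = max(max_err[key], abs(cpu[key] - npu[key]))
    passed = all(max_err[key] < THRESHOLDS[key] for key in THRESHOLDS)
    return max_err, passed


if __name__ == "__main__":
    cpu = {"layer.weight": [0.1, 0.2, 0.3]}
    npu = {"layer.weight": [0.1, 0.2, 0.3]}
    print(compare(cpu, npu))
```

The measured values in the table (1.53e-04, 1.19e-07, 1.49e-08) each fall below their threshold, which is why the run is reported as a pass.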

Sample Output

2026-05-11 09:18:56,787 - INFO - Sarashina2.2-TTS Ascend NPU Inference
2026-05-11 09:18:56,802 - INFO - Model loaded! Total keys: 219
2026-05-11 09:18:56,802 - INFO - Total parameters: 809.91M
2026-05-11 09:18:56,802 - INFO - Running inference (embedding layer test)...
2026-05-11 09:18:58,514 - INFO - Embedding shape: torch.Size([100, 1280])
2026-05-11 09:18:58,515 - INFO - Inference time: 1712.30 ms
2026-05-11 09:18:58,516 - INFO - Embedding (first 5): [ 0.23730469 0.05541992 ...]
2026-05-11 09:18:58,517 - INFO - Inference completed successfully!

Performance Reference

Metric	Value
Inference time (NPU)	~1.7 s
Output embedding shape	torch.Size([100, 1280])
Model parameters	~810M

Model Architecture

sarashina2.2-tts is based on the LlamaForCausalLM architecture. Its main components are:

Embedding layer: vocabulary size 108986
Transformer layers: 24 LLaMA decoder layers
Hidden size: 1280
Attention: grouped-query attention (GQA) with 8 KV heads
MLP: SwiGLU activation (intermediate size = 4480)
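
As a sanity check, the listed dimensions can be turned into a rough parameter count. The sketch below assumes a standard LLaMA layer layout without bias terms. Two values are assumptions that this document does not state: 16 query heads (giving a head dimension of 80) and an output projection that is not tied to the input embedding. Under those assumptions the total comes to 809,908,480 parameters in 219 tensors, which agrees with the 809.91M and 219 keys reported in the sample output.

```python
# Values stated in this document.
VOCAB_SIZE = 108986
HIDDEN = 1280
LAYERS = 24
KV_HEADS = 8
INTERMEDIATE = 4480

# ASSUMPTIONS, not stated in this document.
NUM_HEADS = 16            # query heads, so head_dim = 1280 / 16 = 80
TIED_EMBEDDINGS = False   # lm_head stored separately from the embedding


def count_parameters():
    """Parameter count for a bias-free LLaMA-style decoder."""
    head_dim = HIDDEN // NUM_HEADS
    kv_dim = KV_HEADS * head_dim
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim
    # SwiGLU MLP: gate_proj, up_proj, down_proj.
    mlp = 3 * HIDDEN * INTERMEDIATE
    # Two RMSNorm weight vectors per layer.
    norms = 2 * HIDDEN
    per_layer = attention + mlp + norms
    embedding = VOCAB_SIZE * HIDDEN
    lm_head = 0 if TIED_EMBEDDINGS else embedding
    final_norm = HIDDEN
    return LAYERS * per_layer + embedding + lm_head + final_norm


def count_tensors():
    """Nine weight tensors per layer plus embedding, final norm, and lm_head."""
    return LAYERS * 9 + 2 + (0 if TIED_EMBEDDINGS else 1)


if __name__ == "__main__":
    print(f"parameters: {count_parameters():,}")  # 809,908,480
    print(f"tensors:    {count_tensors()}")       # 219
```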

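Notes

The precision test compares state_dict tensors between CPU and NPU (the large embedding layers are excluded).
Inference is validated with an embedding-layer test.
Full TTS inference requires an additional audio-generation sampling step.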
