cd sarashina2.2-tts-ascend/
python inference.py --precision_test
Inference test
cd sarashina2.2-tts-ascend/
python inference.py
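Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --model_path | Model path | sarashina2.2-tts |
| --device | Device to run on | npu:0 |
| --precision_test | Run the precision test | False |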
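The flags above map naturally onto `argparse`. The following is a minimal sketch of how such a command line could be declared, not the actual contents of `inference.py`; the function name `build_parser` and the help strings are assumptions made for illustration, while the flag names and defaults come from the parameter list.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags and defaults.

    Hypothetical helper: the real inference.py may declare its
    arguments differently.
    """
    parser = argparse.ArgumentParser(
        description="Sarashina2.2-TTS Ascend NPU inference"
    )
    parser.add_argument("--model_path", default="sarashina2.2-tts",
                        help="Model path")
    parser.add_argument("--device", default="npu:0",
                        help="Device to run on")
    # store_true makes the flag default to False, matching the documented default.
    parser.add_argument("--precision_test", action="store_true",
                        help="Run the precision test")
    return parser


# Equivalent to: python inference.py --device npu:1 --precision_test
args = build_parser().parse_args(["--device", "npu:1", "--precision_test"])
print(args.model_path, args.device, args.precision_test)
```

With no arguments, the parser falls back to the documented defaults (`sarashina2.2-tts`, `npu:0`, and `False`).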
Precision test results
============================================================
Precision Comparison: CPU vs NPU
============================================================
Max errors: sum=1.53e-04, mean=1.19e-07, std=1.49e-08
PASS: NPU precision within thresholds
============================================================
PRECISION TEST PASSED
============================================================
| Metric | Threshold | Measured | Status |
| --- | --- | --- | --- |
| max_error_sum | < 1e-3 | 1.53e-04 | ✅ PASS |
| max_error_mean | < 1e-5 | 1.19e-07 | ✅ PASS |
| max_error_std | < 1e-5 | 1.49e-08 | ✅ PASS |
Example output
2026-05-18 03:35:00,787 - INFO - Sarashina2.2-TTS Ascend NPU Inference
2026-05-18 03:35:00,802 - INFO - Model loaded! Total keys: 219
2026-05-18 03:35:00,802 - INFO - Total parameters: 809.91M
2026-05-18 03:35:00,802 - INFO - Running inference (embedding layer test)...
2026-05-18 03:35:02,514 - INFO - Embedding shape: torch.Size([100, 1280])
2026-05-18 03:35:02,515 - INFO - Inference time: 1712.30 ms
2026-05-18 03:35:02,516 - INFO - Embedding (first 5): [ 0.23730469 0.05541992 ...]
2026-05-18 03:35:02,517 - INFO - Inference completed successfully!
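Performance reference

| Metric | Value |
| --- | --- |
| Inference time (NPU, embedding layer test) | ~1.7 s |
| Output embedding shape | torch.Size([100, 1280]) |
| Model parameters | ~810M (809.91M) |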
Model architecture
sarashina2.2-tts is based on the LlamaForCausalLM architecture. Its main components are:
Embedding Layer: token embeddings over a 108986-entry vocabulary
Transformer Layers: 24 LLaMA decoder layers
Hidden Size: 1280
Attention: Grouped Query Attention (8 KV heads)
MLP: SwiGLU activation (intermediate_size=4480)
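As a sanity check, the architecture figures above can be turned into a parameter count and compared with the logged total of 809.91M parameters and 219 state_dict keys. The sketch below uses the standard LLaMA layer layout. It also assumes three things this document does not state: 16 attention heads (so head_dim = 80), no bias terms, and an lm_head that is not tied to the embedding. Under these assumptions the arithmetic reproduces both logged values, but they remain assumptions rather than confirmed configuration.

```python
# Values taken from the architecture list above.
VOCAB_SIZE = 108986
HIDDEN_SIZE = 1280
NUM_LAYERS = 24
NUM_KV_HEADS = 8
INTERMEDIATE_SIZE = 4480

# Assumptions (not stated in this document).
NUM_ATTENTION_HEADS = 16   # assumed; gives head_dim = 1280 / 16 = 80
TIED_LM_HEAD = False       # assumed; lm_head stored separately from the embedding


def count_parameters():
    """Return (total_parameters, total_state_dict_keys) for a bias-free LLaMA layout."""
    head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS
    kv_dim = NUM_KV_HEADS * head_dim

    attention = (
        HIDDEN_SIZE * HIDDEN_SIZE      # q_proj
        + 2 * HIDDEN_SIZE * kv_dim     # k_proj and v_proj (grouped-query: fewer KV heads)
        + HIDDEN_SIZE * HIDDEN_SIZE    # o_proj
    )
    mlp = 3 * HIDDEN_SIZE * INTERMEDIATE_SIZE   # gate_proj, up_proj, down_proj (SwiGLU)
    norms = 2 * HIDDEN_SIZE                     # two RMSNorm weights per layer
    per_layer = attention + mlp + norms

    embedding = VOCAB_SIZE * HIDDEN_SIZE
    lm_head = 0 if TIED_LM_HEAD else VOCAB_SIZE * HIDDEN_SIZE
    final_norm = HIDDEN_SIZE

    total = NUM_LAYERS * per_layer + embedding + lm_head + final_norm
    # 9 tensors per layer: q, k, v, o, gate, up, down, and two norms.
    keys = NUM_LAYERS * 9 + 2 + (0 if TIED_LM_HEAD else 1)
    return total, keys


total, keys = count_parameters()
print(f"{total / 1e6:.2f}M parameters, {keys} keys")
```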
Notes
The precision test compares the state_dict tensors on CPU against the same tensors on NPU. The large embedding layer is excluded from the comparison.
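The comparison described above can be sketched as follows. This is an illustration, not the test implemented in `inference.py`. It assumes that the three reported errors are the largest absolute CPU-vs-NPU differences in each tensor's sum, mean, and standard deviation, and that embedding tensors are skipped by matching a name substring (`"embed"` is a hypothetical filter). Plain lists of floats stand in for tensors so that the example runs without an NPU; the thresholds are the ones from the results table.

```python
import math
import statistics

# Thresholds from the precision test results table.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}


def tensor_stats(values):
    """Summary statistics for one flattened tensor (a list of floats here)."""
    return {
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
    }


def max_stat_errors(cpu_state, npu_state, skip_substrings=("embed",)):
    """Largest absolute difference per statistic across all compared tensors."""
    max_errors = {name: 0.0 for name in THRESHOLDS}
    for key, cpu_values in cpu_state.items():
        # Assumed exclusion rule: skip tensors whose name marks them as embeddings.
        if any(s in key for s in skip_substrings):
            continue
        cpu = tensor_stats(cpu_values)
        npu = tensor_stats(npu_state[key])
        for name in max_errors:
            max_errors[name] = max(max_errors[name], abs(cpu[name] - npu[name]))
    return max_errors


def within_thresholds(max_errors):
    """True when every statistic is strictly below its threshold."""
    return all(max_errors[name] < THRESHOLDS[name] for name in THRESHOLDS)


# Toy data: the embedding entry differs wildly but is excluded from the check.
cpu_state = {"layers.0.weight": [1.0, 2.0, 3.0], "embed_tokens.weight": [0.0, 100.0]}
npu_state = {"layers.0.weight": [1.0, 2.0, 3.000003], "embed_tokens.weight": [5.0, 5.0]}
errors = max_stat_errors(cpu_state, npu_state)
print(errors, "PASS" if within_thresholds(errors) else "FAIL")
```

Comparing summary statistics rather than every element keeps the check cheap for a model of this size, at the cost of being less sensitive to isolated element-wise deviations.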