TADA-3B-ML Ascend NPU Deployment Guide

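Project Overview

TADA-3B-ML is a multilingual speech generation model based on Llama 3.2 3B. It uses Text-Acoustic Dual Alignment to align speech and text 1:1. This project provides a deployment setup for the model on Huawei Ascend NPUs.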

Features

Inference acceleration on Ascend NPU
CPU vs. NPU precision comparison test (< 1% error)
1:1 Token Alignment: each text token corresponds to one speech vector
Dynamic Duration Synthesis
Multilingual support

Requirements

Hardware: Huawei Ascend 910 series NPU
CANN: 8.0.RC1 or later
PyTorch: 2.0+ with torch_npu
Docker: container named test-modelagent

Directory Structure

/data/ysws/agentsp/tada-3b-ml-ascend/
├── inference.py          # precision test script
├── log.txt               # test log
├── README.md             # this document
└── final-graphics-polished/ # evaluation charts

Deployment Steps

1. Enter the container

docker exec -it test-modelagent bash

2. Set the environment variables

source /usr/local/Ascend/ascend-toolkit/set_env.sh

3. Prepare the model files

Place the model files in /data/ysws/agentsp/tada-3b-ml/:

model-00001-of-00002.safetensors - shard 1 (4.9 GB)
model-00002-of-00002.safetensors - shard 2 (3.9 GB)
model.safetensors.index.json - shard index
config.json - model configuration
generation_config.json - generation configuration
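Before running the tests, it can help to confirm that every file listed above is in place. The following is a minimal sketch, not part of the project; the helper name missing_model_files is hypothetical, and the file names and directory are the ones given in this step.

```python
from pathlib import Path

# File names taken from the list in step 3.
REQUIRED_FILES = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "model.safetensors.index.json",
    "config.json",
    "generation_config.json",
]


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Hypothetical helper; the directory is the one named in step 3.
    missing = missing_model_files("/data/ysws/agentsp/tada-3b-ml/")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files are present.")
```

The check only looks at file presence; it does not validate shard sizes or checksums.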

4. Run the precision test

cd /data/ysws/agentsp/tada-3b-ml-ascend/
python3 inference.py --precision_test

5. Run inference

cd /data/ysws/agentsp/tada-3b-ml-ascend/
python3 inference.py

Testing and Validation

Precision Test Results

Metric	Measured	Threshold	Status
Max Error (sum)	1.34e-04	< 1e-3	PASS
Max Error (mean)	7.45e-09	< 1e-5	PASS
Max Error (std)	1.86e-09	< 1e-5	PASS
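The table can be read as a threshold check on summary statistics. The sketch below shows one way such a comparison could work. It assumes that each "Max Error" is the largest absolute CPU/NPU difference of a statistic (sum, mean, std) across all compared tensors, and that std is the population standard deviation. Neither assumption is confirmed by this document, and the actual logic in inference.py may differ. Tensors are represented as flat lists of floats to keep the example dependency-free.

```python
import math
import statistics

# Thresholds copied from the results table above.
THRESHOLDS = {"sum": 1e-3, "mean": 1e-5, "std": 1e-5}


def summary(values):
    """Summary statistics of one flattened tensor (a list of floats)."""
    return {
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        # Assumption: population standard deviation.
        "std": statistics.pstdev(values),
    }


def max_errors(cpu_tensors, npu_tensors):
    """Largest absolute CPU/NPU gap per statistic across all tensor pairs."""
    errors = {name: 0.0 for name in THRESHOLDS}
    for cpu, npu in zip(cpu_tensors, npu_tensors):
        s_cpu, s_npu = summary(cpu), summary(npu)
        for name in errors:
            errors[name] = max(errors[name], abs(s_cpu[name] - s_npu[name]))
    return errors


def passes(errors):
    """True when every statistic is strictly below its threshold."""
    return all(errors[name] < limit for name, limit in THRESHOLDS.items())


# The measured values from the table satisfy all three thresholds.
measured = {"sum": 1.34e-04, "mean": 7.45e-09, "std": 1.86e-09}
print("PASS" if passes(measured) else "FAIL")
```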

Performance Data

Operation	Time
Model loading	~42 s
CPU reference computation (20 tensors)	1.18 s
NPU inference (20 tensors)	0.59 s
Full inference (input shape (1, 32))	~0.41 s
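Two derived figures follow directly from these timings: the NPU speedup over the CPU reference, and the number of tokens processed per second in the full inference run. The short sketch below works through the arithmetic. The function names are illustrative, and the throughput figure describes a single pass over 32 tokens, which is not necessarily a generation rate.

```python
# Timings copied from the performance table above (seconds).
CPU_REFERENCE_S = 1.18   # 20 tensors on CPU
NPU_INFERENCE_S = 0.59   # 20 tensors on NPU
FULL_INFERENCE_S = 0.41  # one input of shape (1, 32), approximate
TOKENS = 32


def speedup(baseline_s, accelerated_s):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


def tokens_per_second(tokens, seconds):
    """Tokens processed per second for a single run."""
    return tokens / seconds


print(f"NPU speedup over CPU: {speedup(CPU_REFERENCE_S, NPU_INFERENCE_S):.2f}x")
print(f"Throughput: {tokens_per_second(TOKENS, FULL_INFERENCE_S):.0f} tokens/s")
```

With the table's numbers this works out to a speedup of about 2x and roughly 78 tokens per second.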

Test Log

The full test log is saved in log.txt.

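Model Architecture

TADA-3B-ML is based on the Llama 3.2 3B architecture:

Component	Parameters	Description
embed_tokens	128256 x 3072	Token embedding layer
layers (28 layers)	LlamaDecoderLayer per layer	Transformer layers
norm	RMSNorm(3072)	Final normalization
lm_head	3072 x 128256	Projection to the vocabulary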
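The matrix shapes in the table give the parameter count of the embedding and output projection directly. The sketch below computes that count and a rough memory footprint. The 2 bytes per parameter is an assumption (a 16-bit dtype such as bf16 or fp16); this document does not state the checkpoint's dtype.

```python
# Dimensions copied from the table above.
VOCAB_SIZE = 128256
HIDDEN_SIZE = 3072


def matrix_params(rows, cols):
    """Number of weights in a dense rows x cols matrix."""
    return rows * cols


def size_gib(num_params, bytes_per_param=2):
    """Approximate size in GiB; 2 bytes per weight is an assumed 16-bit dtype."""
    return num_params * bytes_per_param / 2**30


embed = matrix_params(VOCAB_SIZE, HIDDEN_SIZE)
print(f"embed_tokens: {embed:,} parameters, ~{size_gib(embed):.2f} GiB at 16 bits")
```

lm_head has the same number of weights, since it is the transposed shape of embed_tokens.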

Comparison with TADA-1B

Metric	TADA-1B	TADA-3B-ML
Parameters	2.16B	~3B
hidden_size	2048	3072
num_layers	16	28
num_heads	32	24
head_dim	64	128
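The head_dim row follows from the two rows above it: in a Llama-style attention layer, the per-head dimension is hidden_size divided by num_heads. The sketch below checks that relation against the values in the table; the CONFIGS dictionary is an illustrative structure, not a file from the project.

```python
# Values copied from the comparison table above.
CONFIGS = {
    "TADA-1B": {"hidden_size": 2048, "num_heads": 32, "head_dim": 64},
    "TADA-3B-ML": {"hidden_size": 3072, "num_heads": 24, "head_dim": 128},
}


def head_dim(hidden_size, num_heads):
    """Per-head dimension; hidden_size must divide evenly across heads."""
    if hidden_size % num_heads:
        raise ValueError("hidden_size must be divisible by num_heads")
    return hidden_size // num_heads


for name, cfg in CONFIGS.items():
    computed = head_dim(cfg["hidden_size"], cfg["num_heads"])
    # Fails loudly if the table's head_dim disagrees with the other two rows.
    assert computed == cfg["head_dim"], name
    print(f"{name}: {cfg['hidden_size']} / {cfg['num_heads']} = {computed}")
```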

Input and Output Format

Input: (B, T) token IDs, where B is the batch size and T is the sequence length
Output: (B, T, vocab_size) logits
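To make the shapes concrete, the sketch below maps an input shape to the corresponding logits shape and shows one common way to consume logits: taking the highest-scoring token ID at the last position (greedy selection). It uses nested Python lists in place of tensors so that it runs without PyTorch. The helper names are hypothetical, and greedy selection is an illustrative choice, not necessarily what inference.py does.

```python
VOCAB_SIZE = 128256  # vocabulary size of the Llama 3.2 3B base model


def logits_shape(input_shape, vocab_size=VOCAB_SIZE):
    """Map an input shape (B, T) to the logits shape (B, T, vocab_size)."""
    batch, seq_len = input_shape
    return (batch, seq_len, vocab_size)


def greedy_next_tokens(logits):
    """Pick the highest-scoring token ID at the last position of each sequence.

    `logits` is a nested list indexed as [batch][position][token_id].
    """
    return [max(range(len(seq[-1])), key=seq[-1].__getitem__) for seq in logits]


print(logits_shape((1, 32)))

# Toy example: B=1, T=2, and a vocabulary of only 3 tokens.
toy_logits = [[[0.1, 0.9, 0.0], [0.2, 0.1, 0.7]]]
print(greedy_next_tokens(toy_logits))
```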

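FAQ

Q: The precision test fails?

A: Check that the NPU driver is installed correctly, and make sure the CANN environment script (set_env.sh) has been sourced in the current shell.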
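A quick diagnostic along these lines is sketched below. It assumes that set_env.sh exports ASCEND_TOOLKIT_HOME and ASCEND_HOME_PATH, which may vary by CANN version, and it only checks whether the torch_npu package can be found, without initializing a device. The function names are hypothetical.

```python
import importlib.util
import os

# Assumption: set_env.sh exports these variables; adjust for your CANN version.
EXPECTED_ENV_VARS = ("ASCEND_TOOLKIT_HOME", "ASCEND_HOME_PATH")


def missing_env_vars(environ=os.environ):
    """Return the expected CANN variables that are unset or empty."""
    return [name for name in EXPECTED_ENV_VARS if not environ.get(name)]


def torch_npu_installed():
    """True if the torch_npu package can be located, without importing it."""
    return importlib.util.find_spec("torch_npu") is not None


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Unset variables (was set_env.sh sourced?):", ", ".join(missing))
    if not torch_npu_installed():
        print("torch_npu was not found in this Python environment.")
    if not missing and torch_npu_installed():
        print("Basic environment checks passed.")
```

If both checks pass and the test still fails, the driver installation itself is the next thing to inspect.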

Q: Inference seems slow?

A: For the 3B model, about 0.4 s per 32 tokens is within the normal range.

License

This project is released under the Llama 3.2 Community License.