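Llama-OuteTTS-1.0-1B - Ascend NPU Adaptation and Deployment

Model Overview

OuteTTS is a text-to-speech (TTS) model developed by OuteAI on the LLaMA-3.2-1B architecture, continually pre-trained and fine-tuned on roughly 60,000 hours of audio. It generates high-quality speech as tokens from two DAC (Descript Audio Codec) codebooks. It supports speech synthesis in 20+ languages, including English, Chinese, Japanese, Korean, Arabic, French, German, and Russian, as well as one-shot voice cloning from about 10 seconds of reference audio.

This repository provides the adapted inference script and deployment documentation for running Llama-OuteTTS-1.0-1B on Huawei Ascend 910 NPUs.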

Model: OuteAI/Llama-OuteTTS-1.0-1B
Architecture: LLaMA-3.2-1B (LlamaForCausalLM)
Parameters: 1.25B
Precision: BF16
Context length: 8,192 tokens
Audio codec: DAC, 24 kHz (two codebooks)
Sample rate: 24,000 Hz
Original framework: PyTorch (Hugging Face Transformers)

Hardware Requirements

Component	Model / Version	Memory requirement
Ascend NPU	Ascend 910B / 910A	≥ 8 GB HBM
CANN	8.5.1+	-
torch_npu	2.9.0+	-
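
As a rough sanity check on the 8 GB HBM floor, the weight footprint can be estimated from the parameter count and BF16 width listed in the model summary. The sketch below is back-of-envelope arithmetic, not a measurement. The KV-cache term assumes the stock LLaMA-3.2-1B attention layout (16 layers, 8 key/value heads, head dimension 64), which this README does not state.

```python
# Back-of-envelope memory estimate; every figure here is an approximation.
PARAMS = 1.25e9          # parameter count from the model summary
BYTES_PER_BF16 = 2       # BF16 stores each value in 2 bytes
CONTEXT_TOKENS = 8192    # context length from the model summary

# Assumed (not stated in this README): stock LLaMA-3.2-1B attention layout.
NUM_LAYERS = 16
NUM_KV_HEADS = 8
HEAD_DIM = 64

GIB = 1024 ** 3


def weight_bytes(params=PARAMS, bytes_per_value=BYTES_PER_BF16):
    """Bytes needed to hold the weights alone."""
    return params * bytes_per_value


def kv_cache_bytes(tokens=CONTEXT_TOKENS):
    """Bytes needed for a BF16 KV cache over `tokens` positions."""
    # Factor 2 covers keys and values.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_BF16
    return tokens * per_token


if __name__ == "__main__":
    print(f"weights:  {weight_bytes() / GIB:.2f} GiB")
    print(f"KV cache: {kv_cache_bytes() / GIB:.2f} GiB")
```

Under these assumptions, the weights plus a full-context cache come to roughly 2.6 GiB. The remainder of the 8 GB budget presumably covers activations, the DAC decoder, and runtime overhead.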

Verified environment:

Ascend 910 (Atlas 800T A2)
CANN 8.5.1
torch_npu 2.9.0.post1
Python 3.11.14

Quick Start

1. Environment Setup

# Install dependencies
pip install modelscope
pip install transformers torch_npu torchaudio
pip install descript-audio-codec

# Download the model
modelscope download --model OuteAI/Llama-OuteTTS-1.0-1B
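
Before running inference, it can help to confirm that the installed packages actually see an NPU. The following is a minimal sketch, not part of this repository: `describe_npu` is a hypothetical helper name, and it relies only on the `torch.npu` namespace that importing `torch_npu` registers.

```python
def describe_npu() -> str:
    """Return a one-line summary of Ascend NPU availability."""
    try:
        import torch
        import torch_npu  # importing registers the torch.npu backend
    except (ImportError, OSError) as exc:
        # Missing packages or missing CANN shared libraries end up here.
        return f"torch / torch_npu not importable: {exc}"

    if not torch.npu.is_available():
        return "torch_npu imported, but no NPU is visible"

    version = getattr(torch_npu, "__version__", "unknown")
    return f"{torch.npu.device_count()} NPU(s) visible, torch_npu {version}"


if __name__ == "__main__":
    print(describe_npu())
```

If the helper reports that no NPU is visible, the CANN environment variables are a likely place to look first, since `torch_npu` depends on the CANN runtime being set up in the current shell.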

2. Inference

# Basic inference
python inference.py --text "Hello, how are you doing today?" --output output.wav

# Chinese inference
python inference.py --text "你好，今天天气真好。" --output zh_output.wav

# With performance benchmarking
python inference.py --text "Hello world, this is a test." --benchmark

# CPU vs. NPU accuracy comparison
python inference.py --text "The quick brown fox." --cpu-compare

3. Python API

from inference import OuteTTSNPU, load_default_speaker

# Initialize the engine on the first NPU
engine = OuteTTSNPU(device="npu:0")

# Load the default speaker profile
speaker = load_default_speaker()

# Synthesize speech and write it to a WAV file
result = engine.synthesize(
    text="Hello, how are you doing today?",
    speaker=speaker,
    output_path="output.wav"
)

print(f"Generation time: {result['inference_time_sec']:.2f}s")
print(f"Audio duration: {result['duration_sec']:.2f}s")
print(f"Throughput: {result['tokens_per_sec']:.1f} tok/s")
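
The two timing fields in the result can be combined into a real-time factor (RTF), a common way to judge whether synthesis keeps up with playback. The sketch below assumes only the `inference_time_sec` and `duration_sec` keys printed above. `real_time_factor` is a hypothetical helper, and the example dictionary holds placeholder numbers rather than measured results.

```python
def real_time_factor(result: dict) -> float:
    """Seconds of compute per second of generated audio (lower is faster)."""
    inference = result["inference_time_sec"]
    duration = result["duration_sec"]
    if duration <= 0:
        raise ValueError("duration_sec must be positive")
    return inference / duration


# Placeholder values shaped like the synthesize() result; not a measurement.
example = {"inference_time_sec": 5.0, "duration_sec": 4.0, "tokens_per_sec": 46.0}
print(f"RTF: {real_time_factor(example):.2f}")
```

An RTF below 1.0 means audio is generated faster than it plays back.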

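Key Sampling Configuration

OuteTTS v1.0 requires specific sampling parameters to maintain speech quality:

Parameter	Value
Temperature	0.4
Repetition Penalty	1.1
Repetition Range	64 tokens (window)
Top-k	40
Top-p	0.9
Min-p	0.05

Important: the repetition penalty applies only to the most recent 64 tokens, not to the entire context window.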
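
The windowed penalty can be sketched in plain Python as below. This is an illustration, not the code in `inference.py`: `apply_windowed_repetition_penalty` is a hypothetical name, and the rule used (divide positive logits by the penalty, multiply negative ones) is the common convention, assumed here rather than confirmed by the source.

```python
def apply_windowed_repetition_penalty(logits, generated_ids, penalty=1.1, window=64):
    """Return a copy of `logits` with the penalty applied only to token ids
    that occur among the last `window` generated tokens."""
    penalized = list(logits)
    # Guard window <= 0: a slice of [-0:] would select the whole history.
    recent = set(generated_ids[-window:]) if window > 0 else set()
    for token_id in recent:
        score = penalized[token_id]
        # Assumed convention: shrink positive logits, push negative ones lower.
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


# Toy vocabulary of four tokens. Token 3 was emitted 65 steps ago, so it has
# fallen out of the 64-token window; token 0 fills the window.
logits = [2.0, -1.0, 0.5, 3.0]
history = [3] + [0] * 64
adjusted = apply_windowed_repetition_penalty(logits, history)
print(adjusted)
```

In this toy run only token 0 is penalized. Token 3 keeps its original logit because it lies outside the window, which is the behavior a whole-context penalty would not give.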

Performance Benchmarks

Inference Speed (Ascend 910)

Scenario	Tokens	Time	Throughput
Short text (EN)	256	5.56s	46.1 tok/s
Medium text (EN)	512	11.12s	46.1 tok/s
Long text (EN)	1024	24.36s	42.0 tok/s
Chinese text	145	3.15s	46.0 tok/s

NPU vs. CPU Comparison

Metric	NPU (Ascend 910)	CPU
Throughput	~43.8 tok/s (avg)	TBD
Speedup	TBD	1x
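
The ~43.8 tok/s average appears to be total tokens divided by total time across the four runs, rather than the mean of the per-run rates (which would be about 45 tok/s). This is an inference from the published numbers, not something the source states. The sketch below reproduces the arithmetic from the inference-speed table. Per-run values recomputed from the rounded times may differ from the table in the last digit.

```python
# (scenario, tokens, seconds) copied from the inference-speed table.
RUNS = [
    ("short_en", 256, 5.56),
    ("medium_en", 512, 11.12),
    ("long_en", 1024, 24.36),
    ("zh", 145, 3.15),
]


def throughput(tokens, seconds):
    """Tokens generated per second for a single run."""
    return tokens / seconds


def aggregate_throughput(runs):
    """Total tokens over total time, which weights long runs more heavily."""
    total_tokens = sum(tokens for _, tokens, _ in runs)
    total_seconds = sum(seconds for _, _, seconds in runs)
    return total_tokens / total_seconds


for name, tokens, seconds in RUNS:
    print(f"{name}: {throughput(tokens, seconds):.1f} tok/s")
print(f"aggregate: {aggregate_throughput(RUNS):.1f} tok/s")
```

Because the long-text run is both the slowest and the largest, it pulls the token-weighted figure below the simple mean.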

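Accuracy Validation

NPU output compared against the CPU baseline:

Metric	Result
Token match rate	≥ 99%
Codebook C1 accuracy	≥ 99%
Codebook C2 accuracy	≥ 99%
Overall audio accuracy	≥ 99% (error < 1%)

To run the accuracy comparison:

python inference.py --text "The quick brown fox jumps over the lazy dog." --cpu-compare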
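
One plausible reading of the token match rate is a position-wise comparison of the two generated token sequences. The source does not show how the comparison is actually defined, so the sketch below is an assumption: `token_match_rate` is a hypothetical helper, and the token ids in the example are made up.

```python
def token_match_rate(npu_tokens, cpu_tokens):
    """Fraction of positions where both sequences hold the same token id.

    A length mismatch counts as mismatches against the longer sequence.
    """
    longest = max(len(npu_tokens), len(cpu_tokens))
    if longest == 0:
        return 1.0  # two empty sequences agree trivially
    matches = sum(a == b for a, b in zip(npu_tokens, cpu_tokens))
    return matches / longest


# Made-up token ids for illustration only.
cpu_ids = [10, 11, 12, 13]
npu_ids = [10, 11, 99, 13]
print(f"match rate: {token_match_rate(npu_ids, cpu_ids):.2%}")
```

A comparison like this is only meaningful when both runs decode deterministically, for example with the same random seed. Otherwise sampling noise alone produces mismatches.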

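Files

├── inference.py            # NPU inference script (main script)
├── evaluate.py             # Accuracy and performance evaluation script
├── README.md               # Deployment documentation (this file)
├── test_output.wav         # Sample inference output
└── benchmark_results/      # Evaluation results
    ├── npu_benchmark.log   # NPU performance log
    ├── accuracy_report.txt # Accuracy comparison report
    └── screenshots/        # Self-verification screenshots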

Model Card

Hardware

NPU: Ascend 910B / Ascend 910A
Framework: PyTorch 2.9 + torch_npu + CANN 8.5.1
Minimum Memory: 8 GB HBM

Languages

en zh ja ko ar fr de it ru es nl pt lt bn ka hu lv fa pl sw ta uk be

License

LLaMA-3.2 base components: Llama 3.2 Community License
Continued pre-training / fine-tuning components: CC-BY-NC-SA-4.0
DAC audio codec: MIT License

Acknowledgements

OuteAI - model development
Hugging Face - open-source ecosystem support
IBM Research - DAC audio codec
Meta - LLaMA base model
Huawei Ascend - NPU hardware and CANN software stack
