neutts-air on Ascend NPU

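1. Introduction

Adaptation and validation results for neutts-air on Huawei Ascend NPU (Ascend 910B4).

Property	Value
Original model	`neuphonic/neutts-air`
Architecture	LlamaForCausalLM (Qwen2-based)
Parameters	~748M
Original quantization	None (base)
Language	English (en-us)
Original license	Apache 2.0

NeuTTS is an open-source on-device text-to-speech model from Neuphonic. It pairs a small language model backbone with the NeuCodec audio codec and supports instant voice cloning.

How the NPU adaptation works: the original GGUF format depends on the llama-cpp-python inference engine and cannot run directly on the NPU. This adaptation instead uses the HuggingFace-format model of the same architecture (neuphonic/neutts-air) as the backbone and runs inference on the Ascend NPU through torch_npu.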
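The adaptation path described above can be sketched roughly as follows. This is a minimal illustration and not the actual `neutts_npu` implementation: `resolve_device` and `load_backbone` are hypothetical helper names, and the sketch assumes the backbone loads through the standard `transformers` `AutoModelForCausalLM` API.

```python
import importlib.util


def resolve_device(preferred: str = "npu:0") -> str:
    """Return `preferred`, or 'cpu' if an NPU is requested but torch_npu is not installed."""
    if preferred.startswith("npu") and importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    return preferred


def load_backbone(model_path: str, device: str = "npu:0"):
    """Load the HF-format backbone and move it to `device` (hypothetical helper)."""
    # Heavy imports stay local so this module can be imported without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if device.startswith("npu"):
        import torch_npu  # noqa: F401  (importing it registers the "npu" device)

        if not torch.npu.is_available():
            raise RuntimeError("torch_npu is installed but no Ascend NPU is visible")

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path)
    return tokenizer, model.to(device).eval()
```

A call such as `load_backbone('./models/neutts-air', resolve_device())` would then place the backbone on `npu:0` when the plugin is present and on the CPU otherwise.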

2. Validation Environment

Component	Version
`torch_npu`	`2.9.0.post1+gitee7ba04`
`torch`	`2.9.0+cpu`
`transformers`	`4.56.2`
`neucodec`	latest
`CANN`	`8.5.1`
NPU	`1` × Ascend 910B4 (32 GB HBM)

3. Quick Deployment

# Install dependencies
pip install torch_npu transformers neucodec librosa soundfile numpy

# Download weights (via the HF mirror)
export HF_ENDPOINT=https://hf-mirror.com
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('neuphonic/neutts-air', local_dir='./models/neutts-air')
snapshot_download('neuphonic/neucodec', local_dir='./models/neucodec')
"

Inference example:

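from neutts_npu import NeuTTSNPU

tts = NeuTTSNPU(
    model_path='./models/neutts-air',  # HF-format backbone weights
    codec_path='./models/neucodec',    # NeuCodec weights
    device='npu',
    seed=42,                           # fixed seed for reproducibility
)
# Encode the reference audio used for voice cloning.
ref_codes = tts.encode_reference('reference.wav')
# Arguments: text to synthesize, reference codes, transcript of the reference audio.
wav = tts.infer('Hello world.', ref_codes, 'reference text.')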
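To listen to the result, the returned waveform has to be written to disk. The sketch below uses only the standard-library `wave` module in place of `soundfile`; `save_wav` is a hypothetical helper, and the 24 kHz default sample rate is an assumption that should be checked against the codec actually in use.

```python
import struct
import wave


def save_wav(path: str, samples, sample_rate: int = 24000) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM and return the frame count."""
    # Clip to [-1, 1], then scale to the int16 range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)            # mono
        f.setsampwidth(2)            # 2 bytes per sample = 16-bit
        f.setframerate(sample_rate)  # assumed sample rate, see the note above
        f.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
    return len(pcm)
```

After the inference example, `save_wav('output.wav', wav)` would write the synthesized audio, assuming `wav` is a one-dimensional sequence of floats.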

4. Smoke Validation

The backbone loads successfully onto npu:0.
The codec correctly stays on cpu.
Inference output is normal, with waveform length > 0.

5. Accuracy Evaluation

Metric	Value
Test mode	`do_sample=False` (deterministic greedy decoding)
CPU output samples	649920
NPU output samples	649920
MSE	0.00000000
Relative Error	0.000000%
Cosine Similarity	1.00000000
Conclusion	NPU and CPU outputs are identical (bit-identical)
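The three metrics in the table can be computed from the two output waveforms along the following lines. The source does not give the exact formulas, so this is a sketch under assumptions: `compare_waveforms` is a hypothetical helper, and relative error is taken here as the L2 norm of the difference divided by the L2 norm of the CPU reference. Pure Python is used to keep the example dependency-free; with `numpy` the same quantities are one-liners.

```python
import math


def compare_waveforms(ref, test):
    """Return (mse, relative_error, cosine_similarity) for two equal-length waveforms."""
    if len(ref) != len(test):
        raise ValueError("waveforms must have the same number of samples")
    if not ref:
        raise ValueError("waveforms must not be empty")

    sq_err = sum((a - b) ** 2 for a, b in zip(ref, test))
    ref_energy = sum(a * a for a in ref)
    test_energy = sum(b * b for b in test)
    dot = sum(a * b for a, b in zip(ref, test))

    mse = sq_err / len(ref)
    # Assumed definition: ||ref - test||_2 / ||ref||_2.
    rel_err = math.sqrt(sq_err / ref_energy) if ref_energy > 0 else 0.0
    denom = math.sqrt(ref_energy) * math.sqrt(test_energy)
    if denom > 0:
        cosine = dot / denom
    else:
        # Degenerate all-zero case: treat identical signals as perfectly similar.
        cosine = 1.0 if sq_err == 0 else 0.0
    return mse, rel_err, cosine
```

Passing the CPU waveform as `ref` and the NPU waveform as `test` yields the three table values. For a strict bit-identity claim, an exact elementwise comparison (for example `numpy.array_equal`) is the more direct check.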

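6. Performance Reference

Test conditions: single NPU card (910B4), greedy decoding with do_sample=False, seed=42.

Metric	Value
Input text	"Hello, this is a medium length test..."
NPU inference time	328.1s
Output audio duration	32.1s
NPU RTF	10.21x
CPU inference time	~2400s (estimated)
NPU speedup	~7.3x

RTF here is inference time divided by output audio duration, so a value above 1 means synthesis is slower than real time. The CPU time is an estimate, so the speedup figure is approximate.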
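The derived figures in the table follow from simple ratios, reproduced below from the rounded table values. The function names are illustrative only. With these rounded inputs the RTF comes out at about 10.22; the 10.21 reported in the table presumably reflects unrounded timings.

```python
def real_time_factor(inference_s: float, audio_s: float) -> float:
    """RTF = inference time / audio duration; above 1 means slower than real time."""
    return inference_s / audio_s


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


npu_rtf = real_time_factor(328.1, 32.1)  # NPU inference time / output audio duration
npu_speedup = speedup(2400.0, 328.1)     # estimated CPU time / NPU time
print(f"RTF: {npu_rtf:.2f}, speedup: {npu_speedup:.1f}x")
# prints "RTF: 10.22, speedup: 7.3x"
```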

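7. Notes

If espeak-ng is installed on the system, phonemization is enabled automatically for better speech quality.
The original GGUF format cannot run directly on the NPU; the adaptation uses the HF-format model of the same architecture.
Use the deterministic do_sample=False mode for accuracy comparisons.
The first load downloads the facebook/w2v-bert-2.0 semantic model (~2.5GB).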
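Whether phonemization will be picked up can be checked before running inference. How the adaptation itself detects espeak-ng is not stated here, so the snippet below is only a quick pre-flight check, assuming the `espeak-ng` executable needs to be on `PATH`; `espeak_available` is a hypothetical helper name.

```python
import shutil


def espeak_available() -> bool:
    """Return True if an `espeak-ng` executable is found on PATH."""
    return shutil.which("espeak-ng") is not None


print("espeak-ng:", "found" if espeak_available() else "not found")
```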
