suno/bark-small on Ascend NPU - Text-to-Speech

1. Introduction

This project adapts the suno/bark-small text-to-speech model to run on a Huawei Ascend NPU (Ascend910B4-1).

Original model: suno/bark-small
Model type: Text-to-Speech
Adaptation: weights downloaded with snapshot_download (ModelScope / HuggingFace), inference through a HuggingFace pipeline
Device: single Ascend NPU card
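
The adaptation path listed above (download the weights, then run a HuggingFace pipeline on the NPU) might look roughly like the following sketch. It is not the project's inference.py: the helper names pick_device and synthesize are hypothetical, and it assumes torch, torch_npu, transformers, and huggingface_hub are installed. ModelScope ships a similarly named snapshot_download that could be swapped in for the download step. When no NPU is visible the sketch falls back to CPU.

```python
def pick_device() -> str:
    """Return "npu:0" when torch_npu sees an Ascend device, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the "npu" backend)
        if torch.npu.is_available():
            return "npu:0"
    except Exception:
        # Assumption: any failure here simply means no usable NPU stack.
        pass
    return "cpu"


def synthesize(text: str, repo_id: str = "suno/bark-small"):
    """Download the weights and run text-to-speech.

    Returns (waveform, sample_rate). Hypothetical helper, not project code.
    """
    # Heavy imports stay local so this module imports without them installed.
    from huggingface_hub import snapshot_download
    from transformers import pipeline

    local_dir = snapshot_download(repo_id=repo_id)  # cached after the first call
    tts = pipeline("text-to-speech", model=local_dir, device=pick_device())
    result = tts(text)  # dict with "audio" and "sampling_rate" keys
    return result["audio"], result["sampling_rate"]
```

Calling synthesize("Hello, this is a test of text to speech synthesis.") would then return the waveform and its sample rate, from which the clip length can be derived as audio.shape[-1] / sample_rate.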

Install dependencies:

pip install -r requirements.txt

Run inference:

python inference.py

Inference output:

Input text: "Hello, this is a test of text to speech synthesis."
Output: 24000Hz mono audio, 4.48 seconds
Status: SUCCESS
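
The reported duration follows directly from the sample count and the sample rate: 4.48 seconds at 24000 Hz corresponds to 107520 mono samples. The sketch below illustrates that arithmetic and one way to store such a clip as a 16-bit WAV using only the standard library. The helper names duration_seconds and to_wav_bytes are hypothetical, and the input is assumed to be float samples in the range [-1, 1].

```python
import io
import struct
import wave

SAMPLE_RATE = 24000  # Hz, as reported in the inference output


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of a mono clip: samples divided by samples per second."""
    return num_samples / sample_rate


def to_wav_bytes(samples, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV data."""
    pcm = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        pcm += struct.pack("<h", int(clipped * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return buf.getvalue()


# 4.48 s of silence stands in for a real Bark waveform here.
silence = [0.0] * 107520
wav_bytes = to_wav_bytes(silence)
print(duration_seconds(len(silence)))  # 4.48
```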

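TTS (Text-to-Speech) output is stochastic, so the generated waveform differs slightly from run to run. Inference on the Ascend NPU nonetheless completes normally and produces a valid speech waveform.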
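
Because generation is stochastic, two runs cannot be compared sample by sample. A smoke check can instead assert properties that any valid waveform should have. The following is a minimal sketch of such a check, not the project's own validation: the function name looks_like_valid_audio and both thresholds are illustrative assumptions.

```python
import math


def looks_like_valid_audio(samples, sample_rate, min_seconds=0.5, min_rms=1e-4):
    """Heuristic smoke check; the thresholds are illustrative assumptions."""
    n = len(samples)
    if n < min_seconds * sample_rate:
        return False  # too short to contain speech
    if any(not math.isfinite(s) for s in samples):
        return False  # NaN or Inf usually signals a numeric failure
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return rms >= min_rms  # all-zero (silent) output counts as a failure


# Synthetic stand-ins: a 220 Hz tone passes, pure silence fails.
rate = 24000
tone = [0.3 * math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
print(looks_like_valid_audio(tone, rate))          # True
print(looks_like_valid_audio([0.0] * rate, rate))  # False
```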

Run the benchmark:

python benchmark.py

Metric	Value
Avg latency	22288 ms
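
An average latency figure like the one above is commonly obtained by timing several runs after an untimed warm-up. The sketch below shows that pattern under stated assumptions: it is not the project's benchmark.py, the name average_latency_ms and the warm-up and run counts are hypothetical, and the optional sync hook is a placeholder for whatever call blocks until device work has finished (torch.npu.synchronize is one candidate when torch_npu is in use).

```python
import time


def average_latency_ms(fn, warmup=1, runs=3, sync=None):
    """Mean wall-clock time of fn() in milliseconds over `runs` timed calls.

    `sync` is an optional callable that blocks until device work finishes.
    """
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off compile and cache costs
    if sync:
        sync()
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync:
            sync()  # make sure queued device work is included in the timing
        total += time.perf_counter() - start
    return total / runs * 1000.0


# Stand-in workload; a real run would pass a closure that calls the TTS pipeline.
latency = average_latency_ms(lambda: time.sleep(0.01), warmup=1, runs=3)
print(f"Avg latency: {latency:.0f} ms")
```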

This project includes only a single-sample smoke consistency check, not a full-dataset evaluation.
