cd /opt/atomgit/higgs-audio-npu
# Use the recommended seed (seed=1, verified to give good results)
python3 inference.py --text "In the quiet of the morning, birds begin to sing as the first rays of sunlight paint the sky in shades of orange and pink." --seed 1
# Use a short text
python3 inference.py --text "Hello, welcome to the world of audio generation." --seed 37
# Chinese text
python3 inference.py --text "清晨的阳光下,小鸟在枝头欢快地歌唱。" --seed 42
# Recommended: long descriptive text + seed=1, generates about 81.6 s of high-quality audio
cd /opt/atomgit/higgs-audio-npu
python3 inference.py \
    --text "In the quiet of the morning, birds begin to sing as the first rays of sunlight paint the sky in shades of orange and pink. The gentle breeze carries the sweet scent of blooming flowers across the meadow, while a distant brook murmurs its timeless melody." \
    --seed 1
# Short text
python3 inference.py --text "Hello, this is the Higgs Audio generation system running on Ascend NPU." --seed 37
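
To compare several seeds on the same text, the commands above can be scripted. The following is a minimal sketch that assumes only the `--text` and `--seed` flags shown above and that it is run from the repository directory; `build_command` and `run_seeds` are hypothetical helper names, not part of inference.py.

```python
import shlex
import subprocess


def build_command(text, seed, script="inference.py"):
    """Return the argv list for one run, using only the flags shown above."""
    return ["python3", script, "--text", text, "--seed", str(seed)]


def run_seeds(text, seeds, dry_run=True):
    """Run inference once per seed; with dry_run=True only print the commands."""
    commands = [build_command(text, seed) for seed in seeds]
    for cmd in commands:
        if dry_run:
            # shlex.join quotes the text so the line can be pasted into a shell.
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the script exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_seeds("Hello, welcome to the world of audio generation.", [1, 37, 42])
```

Passing the arguments as a list avoids shell quoting problems with long or non-ASCII text. Set `dry_run=False` to actually launch the runs.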
Output log (seed=1, 2048 tokens)
Device in use: npu:0
Higgs-Audio-V2 Ascend NPU inference
============================================================
Model parameters: 5.38B
Input text: In the quiet of the morning, birds begin to sing...
Generation parameters: max_new_tokens=2048, temperature=0.3, top_k=50, top_p=0.95
============================================================
Input: [1, 49] tokens
Generation complete, elapsed: 84.10s, generated 2048 tokens, speed: 24.4 tokens/s
Audio sequence: torch.Size([1, 2048, 8])
Valid audio frames: 2048
Audio decoding: 0.35s
Output saved: output_audio/output_1.wav
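
The reported speed follows directly from the log figures: 2048 tokens in 84.10 s is about 24.4 tokens/s. The sketch below reproduces that arithmetic and also derives a real-time factor from the 81.60 s duration listed for output_1.wav. The function names are illustrative and are not part of inference.py.

```python
def tokens_per_second(num_tokens, elapsed_s):
    """Generation throughput: tokens divided by wall-clock seconds."""
    return num_tokens / elapsed_s


def real_time_factor(audio_s, elapsed_s):
    """Seconds of audio produced per second of generation time."""
    return audio_s / elapsed_s


speed = tokens_per_second(2048, 84.10)  # token count and elapsed time from the log
rtf = real_time_factor(81.60, 84.10)    # 81.60 s is the duration of output_1.wav
print(f"{speed:.1f} tokens/s, real-time factor {rtf:.2f}")
# prints "24.4 tokens/s, real-time factor 0.97"
```

A factor just below 1 means that, in this run, generation took slightly longer than the audio it produced; the 0.35 s decoding step is not included in the elapsed time.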
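Output audio specifications

| Output file | Sample rate | Duration | Samples | RMS amplitude | Peak amplitude | Notes |
|---|---|---|---|---|---|---|
| output_1.wav | 24 kHz | 81.60s | 1,958,400 | 0.01206 | 0.0175 | ✅ Stable long output (recommended seed 1) |
| prompt_2.wav | 24 kHz | 10.68s | 256,320 | 0.01155 | 0.0155 | ✅ Short-text output |
| seed_found.wav | 24 kHz | 5.24s | 125,760 | 0.01206 | 0.0195 | ✅ Alternative-seed output |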
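Verification: run the following command and compare the printed duration with the value listed above (81.60s for output_1.wav):
python3 -c "import soundfile; data, sr = soundfile.read('output_audio/output_1.wav'); print(f'{len(data)/sr:.2f}s')"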
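
The duration check can be extended to the other reported figures. The sketch below is one way to recompute sample count, RMS, and peak; it assumes mono audio with float samples in [-1, 1] and the usual definitions of RMS and peak, which the source does not spell out. `audio_stats` is a hypothetical helper, shown here on synthetic data so that it runs without any audio file.

```python
import math


def audio_stats(samples, sample_rate):
    """Return duration (s), sample count, RMS, and peak for mono float samples."""
    n = len(samples)
    if n == 0:
        raise ValueError("no samples")
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    return {"duration_s": n / sample_rate, "samples": n, "rms": rms, "peak": peak}


# Synthetic check: one second of a 0.01-amplitude square wave at 24 kHz.
demo = [0.01 if i % 2 == 0 else -0.01 for i in range(24000)]
stats = audio_stats(demo, 24000)
print(f"{stats['duration_s']:.2f}s, {stats['samples']} samples")
```

For a real file, the `data` and `sr` values returned by `soundfile.read` can be passed in directly. As a consistency check on the first output file, 1,958,400 samples at 24,000 Hz gives 81.60 s.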