HeartMuLa-RL-oss-3B

Tags: NPU, Ascend, Audio Generation, PyTorch, Apache 2.0

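Model Overview

HeartMuLa-RL-oss-3B is one of the music foundation models in the HeartMuLa series. It is built on a 3B-scale autoregressive Transformer architecture and supports high-quality music and speech generation. This repository provides a version of the model adapted for Huawei Ascend NPUs.

Architecture: Dual LLaMA-style Backbone (28 layers) + Decoder (3 layers), GQA, SwiGLU
Parameters: ~3.9B
Text vocabulary: 128,256
Audio vocabulary: 8,197 × 8 codebooks
Model format: SafeTensors
Original model: HeartMuLa/HeartMuLa-RL-oss-3B-20260123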
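The SwiGLU feed-forward block named in the architecture summary can be illustrated with a minimal sketch. This is a generic LLaMA-style formulation, not code taken from modeling_heartmula.py; the class name, layer names, and toy dimensions below are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLU(nn.Module):
    """Generic LLaMA-style gated MLP: down(silu(gate(x)) * up(x))."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        # Three bias-free projections, as in LLaMA-style blocks
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The SiLU-activated gate modulates the "up" branch elementwise
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


# Toy sizes for illustration only; the real model dimensions are not stated here
mlp = SwiGLU(dim=64, hidden_dim=256)
y = mlp(torch.randn(2, 16, 64))  # (batch, sequence, dim) in, same shape out
```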

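NPU Adaptation Notes

Adaptation Environment

Component	Version
PyTorch	2.9.0
torch_npu	2.9.0.post1
NPU model	Ascend 910_9362
NPU count	2

Adaptation Details

Inference adaptation: modeling_heartmula.py implements the complete HeartMuLa model architecture
NPU inference: uses the PyTorch SDPA backend for efficient inference on the NPU
Performance: NPU inference is ~270x faster than CPU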
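As a rough illustration of how grouped-query attention (GQA) can be routed through PyTorch SDPA, the sketch below expands the key/value heads to match the query heads and calls `torch.nn.functional.scaled_dot_product_attention`. The function name `gqa_attention` and the head counts are hypothetical; the actual implementation in modeling_heartmula.py may differ.

```python
import torch
import torch.nn.functional as F


def gqa_attention(q, k, v, is_causal=True):
    """q: (B, Hq, T, D); k, v: (B, Hkv, T, D) with Hq divisible by Hkv."""
    n_rep = q.shape[1] // k.shape[1]
    # Each key/value head is shared by n_rep consecutive query heads
    k = k.repeat_interleave(n_rep, dim=1)
    v = v.repeat_interleave(n_rep, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)


# Toy shapes for illustration: 8 query heads sharing 2 key/value heads
q = torch.randn(1, 8, 16, 32)
k = torch.randn(1, 2, 16, 32)
v = torch.randn(1, 2, 16, 32)
out = gqa_attention(q, k, v)  # shape (1, 8, 16, 32)
```

The same call works on tensors moved to an NPU device, where the backend selects the kernel.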

Quick Start

Environment Setup

pip install torch==2.9.0 torch_npu==2.9.0 safetensors
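After installation, a quick check like the following can confirm whether an NPU is visible to PyTorch. It is a minimal sketch: `pick_device` is a hypothetical helper name, and it falls back to "cpu" when torch_npu is not installed or no NPU is available.

```python
import importlib.util


def pick_device() -> str:
    """Return "npu" if torch_npu is installed and an NPU is available, else "cpu"."""
    if importlib.util.find_spec("torch_npu") is None:
        return "cpu"
    import torch
    import torch_npu  # noqa: F401  (importing it registers the torch.npu namespace)

    return "npu" if torch.npu.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```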

Download the Model

pip install modelscope
modelscope download --model HeartMuLa/HeartMuLa-RL-oss-3B-20260123
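The download can also be scripted from Python with ModelScope's `snapshot_download`. The wrapper below is a sketch: `download_model` is a hypothetical helper, and the import is deferred so that the function can be defined even where modelscope is not installed.

```python
def download_model(
    model_id: str = "HeartMuLa/HeartMuLa-RL-oss-3B-20260123",
    cache_dir=None,
) -> str:
    """Download the model snapshot and return the local directory path."""
    # Imported lazily: only needed when a download is actually requested
    from modelscope import snapshot_download

    return snapshot_download(model_id, cache_dir=cache_dir)


# Example (requires network access and the modelscope package):
# model_dir = download_model()
```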

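Inference Example

import torch
import torch_npu  # noqa: F401  (registers the "npu" device; only needed when device="npu")

# Assumption: load_heartmula_model is defined in modeling_heartmula.py;
# adjust this import if the helper lives in another module of this repository.
from modeling_heartmula import HeartMuLaModel, load_heartmula_model
from configuration_heartmula import HeartMuLaConfig

device = "npu"  # or "cpu"

# Load the model
model, config = load_heartmula_model(
    "/path/to/HeartMuLa-RL-oss-3B-20260123",
    device=device
)

# Prepare random placeholder inputs
input_ids = torch.randint(1, 1000, (1, 16)).to(device)
audio_ids = torch.randint(0, 8197, (1, 8, 8)).to(device)
audio_mask = torch.ones(1, 8).to(device)
attn_mask = torch.ones(1, 16).to(device)

# Run inference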
with torch.no_grad():
    codebook0_logits, remaining_logits = model(
        input_ids, audio_ids, audio_mask, attn_mask
    )

Alternatively, use the inference script:

python inference.py --seq-len 16 --output results.json
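The Python example returns two logit tensors, one for the first codebook and one for the remaining codebooks. A minimal way to turn logits into token ids is greedy selection over the vocabulary axis, sketched below. The tensor shapes are assumptions, since this README does not document them; stand-in random tensors are used in place of real model outputs.

```python
import torch

AUDIO_VOCAB = 8197  # audio vocabulary size per codebook, from the model overview


def greedy_tokens(logits: torch.Tensor) -> torch.Tensor:
    """Pick the highest-scoring token along the last (vocabulary) axis."""
    return logits.argmax(dim=-1)


# Stand-in tensors; the real shapes returned by the model are assumed here
codebook0_logits = torch.randn(1, AUDIO_VOCAB)     # assumed (batch, vocab)
remaining_logits = torch.randn(1, 7, AUDIO_VOCAB)  # assumed (batch, 7 codebooks, vocab)

tok0 = greedy_tokens(codebook0_logits)      # shape (1,)
tok_rest = greedy_tokens(remaining_logits)  # shape (1, 7)

# One audio frame: 8 codebook tokens per batch element
frame = torch.cat([tok0.unsqueeze(-1), tok_rest], dim=-1)
```

Sampling with a temperature or top-k is a common alternative to greedy selection for generation quality; which strategy the original project uses is not stated here.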

Performance

NPU vs CPU Inference Performance

Metric	CPU	NPU (Ascend 910)	Speedup
Inference latency (seq_len=24)	8.4 s	0.03 s	~270x

Benchmark
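The speedup figure is the ratio of the two latencies. Computed from the rounded values in the table, the ratio comes out near 280x, so the reported ~270x presumably derives from unrounded measurements (for instance, an NPU latency slightly above 0.03 s); that interpretation is an assumption.

```python
cpu_latency_s = 8.4   # CPU latency from the table, rounded
npu_latency_s = 0.03  # NPU latency from the table, rounded

speedup = cpu_latency_s / npu_latency_s
print(f"speedup from rounded values: {speedup:.0f}x")
```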

python benchmark.py --device npu --seq-lens 16,32,64,128
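For readers who want to time their own calls, the sketch below shows a common latency-measurement pattern: warm-up iterations, device synchronization, and a median over several runs. It is not the content of benchmark.py; `time_call` and its parameters are hypothetical.

```python
import statistics
import time


def time_call(fn, warmup=3, iters=10, sync=None):
    """Return the median wall-clock latency of fn() in seconds.

    sync: optional callable that blocks until queued device work has finished.
    """
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # include asynchronous device execution in the measurement
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# CPU-only demonstration with a trivial workload
latency = time_call(lambda: sum(range(10_000)))
print(f"median latency: {latency * 1e3:.3f} ms")
```

On an NPU, passing `sync=torch.npu.synchronize` (available once torch_npu is imported) keeps asynchronous kernel launches from making the timings look shorter than they are.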

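Files

├── configuration_heartmula.py  # Model configuration
├── modeling_heartmula.py       # Model architecture implementation (includes NPU adaptation)
├── inference.py                # NPU inference and accuracy verification script
├── benchmark.py                # Performance benchmark script
├── verify_v2.py                # NPU vs CPU accuracy comparison script
└── README.md                   # This file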

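Accuracy Notes

The model runs inference in float32. The basic operators (RMSNorm, Linear, Softmax) produce closely matching results on NPU and CPU, with error below 1e-5. For production use, end-to-end validation with the NPU-native accuracy checking tools is recommended.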
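An operator-level check of this kind can be sketched as follows: run the same float32 RMSNorm on the CPU and on the target device, then compare the maximum absolute difference against the 1e-5 tolerance. This is not the content of verify_v2.py; the helper names are hypothetical, and the script falls back to comparing CPU against CPU when no NPU is available.

```python
import importlib.util

import torch


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm: scale x by the reciprocal root-mean-square of its last axis."""
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight


def max_abs_diff(a, b):
    """Largest elementwise absolute difference, computed on the CPU in float32."""
    return (a.detach().cpu().float() - b.detach().cpu().float()).abs().max().item()


# Use the NPU only if torch_npu is installed and a device is available
device = "cpu"
if importlib.util.find_spec("torch_npu") is not None:
    import torch_npu  # noqa: F401

    if torch.npu.is_available():
        device = "npu"

torch.manual_seed(0)
x = torch.randn(4, 64)
w = torch.ones(64)

ref = rms_norm(x, w)                        # CPU reference
out = rms_norm(x.to(device), w.to(device))  # same computation on the target device
diff = max_abs_diff(ref, out)
print(f"device={device} max abs diff={diff:.2e}")
```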

Acknowledgements

Original model: HeartMuLa/heartlib
Ascend adaptation: NPU Auto-Adaptation Pipeline
Model download: ModelScope

License

Apache 2.0