本仓库包含hubert-base-960h-itw-deepfake的昇腾NPU适配版本,这是一个用于音频深度伪造检测的微调HuBERT模型。
HubertForSequenceClassificationbona-fide(0)、spoof(1)| 组件 | 版本 |
|---|---|
| Python | >= 3.9 |
| PyTorch | >= 2.0.0 |
| torch-npu | >= 2.9.0 |
| transformers | >= 4.38.0 |
| 昇腾驱动 | 25.5.2或兼容版本 |
硬件:昇腾910 NPU(已在Atlas 800 A2上测试)
设备映射
torch.device("cuda")更改为torch.device("npu")以适配昇腾硬件。模型加载
transformers.AutoConfig和HubertForSequenceClassification进行标准加载。torch_npu支持。无架构变更
transformers架构;无需进行权重重映射或算子替换。# 1. Clone this repository
git clone <repo-url>
cd hubert-npu-adapted
# 2. Install dependencies
pip install -r requirements.txt
# 3. Run inference on NPU
python inference_npu.py --model_path ./hubert-base-960h-itw-deepfake
# 4. Run accuracy verification (CPU vs NPU)
python verify_accuracy.py我们使用随机初始化的权重(结构验证)和多个随机种子,对比了模型在CPU和Ascend NPU上的输出结果。
| 指标 | 数值 |
|---|---|
| 最大绝对误差 | 2.42e-04 |
| 平均绝对误差 | 1.79e-04 |
| 最大相对误差 | 2.81e-03 |
| 平均相对误差 | 2.03e-03 |
| 余弦相似度 | 0.999997 |
| 种子 | 最大绝对误差 | 余弦相似度 |
|---|---|---|
| 42 | 2.72e-04 | 0.99999832 |
| 123 | 2.62e-04 | 0.99999812 |
| 456 | 2.33e-04 | 0.99999839 |
| 789 | 2.82e-04 | 0.99999817 |
| 2024 | 2.13e-04 | 0.99999869 |
结论:CPU和NPU的输出在数值上保持一致。绝对误差在预期的浮点精度容差范围内(< 1e-3)。该模型已通过Ascend NPU部署验证。
import torch
import torch_npu
import numpy as np
from transformers import AutoConfig, Wav2Vec2FeatureExtractor, HubertForSequenceClassification
# Use NPU
device = torch.device("npu" if torch.npu.is_available() else "cpu")
model_path = "./hubert-base-960h-itw-deepfake"
config = AutoConfig.from_pretrained(model_path)
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_path)
model = HubertForSequenceClassification.from_pretrained(model_path, config=config).to(device)
model.eval()
# Dummy audio (replace with real waveform)
audio = np.random.randn(16000).astype(np.float32) * 0.1
inputs = feature_extractor(audio, sampling_rate=16000, return_tensors="pt", padding=True)
input_values = inputs.input_values.to(device)
with torch.no_grad():
logits = model(input_values).logits
probs = torch.softmax(logits, dim=-1)
print(probs) # [[bona-fide_prob, spoof_prob]]audio_batch = [np.random.randn(16000).astype(np.float32) for _ in range(4)]
inputs = feature_extractor(audio_batch, sampling_rate=16000, return_tensors="pt", padding=True)
input_values = inputs.input_values.to(device)
with torch.no_grad():
logits = model(input_values).logits
probs = torch.softmax(logits, dim=-1)| 文件 | 描述 |
|---|---|
inference_npu.py | 具备自动设备选择功能的NPU推理脚本 |
verify_accuracy.py | CPU与NPU精度对比脚本 |
accuracy_report.json | 详细的精度指标JSON文件 |
requirements.txt | Python依赖项 |
Apache-2.0