Xiaoxy510/facial_emotions_image_detection-ascend

facial_emotions_image_detection on Ascend 910B3

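1. Introduction

This document records the migration, inference deployment, and accuracy evaluation of facial_emotions_image_detection on the Ascend 910B3 NPU.

The model is a Vision Transformer (ViT) based facial emotion classifier with about 85.80M parameters. It recognizes 7 emotions: sad, disgust, angry, neutral, fear, surprise, and happy.

This adaptation work covers:

Verifying the correctness of image classification inference on the NPU (Ascend 910B3)
Comparing NPU and CPU output precision to confirm the error stays below 1%
Providing a ready-to-use NPU inference script, inference.py
Providing an accuracy and performance evaluation script, eval.py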

2. Validation Environment

Component	Version
`Python`	`3.9.13`
`torch`	`2.8.0+cpu`
`torch_npu`	`2.8.0.post4`
`transformers`	`4.57.6`
`Pillow`	available

NPU: Ascend 910B3 × 8 logical cards
Driver version: 25.5.2
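The table pairs torch 2.8.0+cpu with torch_npu 2.8.0.post4. A minimal sketch of a compatibility check follows, assuming that "matching" means the two packages share the same major.minor.patch release; the helper names `release_tuple` and `versions_match` are hypothetical and not part of this repository.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the leading major.minor.patch numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def versions_match(torch_version: str, torch_npu_version: str) -> bool:
    """Assumption: compatible builds share the same major.minor.patch."""
    return release_tuple(torch_version) == release_tuple(torch_npu_version)


# Versions taken from the environment table above.
print(versions_match("2.8.0+cpu", "2.8.0.post4"))  # True
```

In a live environment the two arguments would come from `torch.__version__` and `torch_npu.__version__`.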

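3. Model Adaptation and Deployment

3.1 Adaptation Notes

The model uses the ViT architecture, which the transformers library supports natively. Adapting it to the NPU requires no changes to the model structure or weights.

The adaptation flow below has been verified: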

import torch
import torch_npu  # registers the NPU backend with PyTorch
from transformers import AutoModelForImageClassification, AutoImageProcessor
from PIL import Image

# Load the original weights unchanged, then move the model to the NPU
model = AutoModelForImageClassification.from_pretrained("dima806/facial_emotions_image_detection")
model = model.npu()
model.eval()

# The processor resizes and normalizes the image to the model's expected input
processor = AutoImageProcessor.from_pretrained("dima806/facial_emotions_image_detection")
image = Image.open("face.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.npu() for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)
    # Bring the logits back to the CPU before post-processing
    probs = torch.softmax(outputs.logits.cpu(), dim=-1)
    pred = probs.argmax(dim=-1).item()
    print(model.config.id2label[pred])

3.2 Environment Setup

pip install torch torch_npu transformers Pillow -i https://repo.huaweicloud.com/repository/pypi/simple/
export HF_ENDPOINT=https://hf-mirror.com

3.3 Using the Inference Script

# NPU inference
python inference.py --image face.jpg

# Save detailed results to JSON
python inference.py --image face.jpg --output result.json

# CPU inference
python inference.py --image face.jpg --device cpu
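The three flags used above could be wired up roughly as follows. This is only a sketch of a command-line interface with the same options; the actual inference.py may be organized differently, and the JSON layout produced by the hypothetical `format_result` helper is an assumption, not the script's real output format.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the --image / --output / --device flags shown above."""
    parser = argparse.ArgumentParser(description="Facial emotion inference (sketch)")
    parser.add_argument("--image", default=None,
                        help="path to an input image; omit to use a random image")
    parser.add_argument("--output", default=None,
                        help="optional path of a JSON file for detailed results")
    parser.add_argument("--device", default=None, choices=["npu", "cpu"],
                        help="force a device; by default the NPU is auto-detected")
    return parser


def format_result(label: str, probabilities: dict) -> str:
    """Serialize a prediction as JSON (hypothetical layout)."""
    return json.dumps({"label": label, "probabilities": probabilities}, indent=2)


# Parse the same arguments as the CPU example above.
args = build_parser().parse_args(["--image", "face.jpg", "--device", "cpu"])
print(args.image, args.output, args.device)
```

Leaving `--image` optional matches the smoke test, where the script is run with no arguments.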

4. Smoke Test

python inference.py

With no arguments, the script runs inference on a randomly generated image as a quick check.

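5. Performance Reference

Test conditions: batch_size=8, 224×224 input, float32 precision, averaged over 10 consecutive runs.

Metric	CPU	NPU (Ascend 910B3)
Mean inference time (8 images)	~6800 ms	~19 ms
Mean time per image	~850 ms	~2.4 ms
Speedup	1x	~350x
Parameters	85.80M	85.80M
Model size	327.2 MB	327.2 MB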
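A timing loop along these lines would produce averages like the ones in the table. It is a generic sketch, not the code in eval.py: the warm-up runs and the optional `sync` callback are assumptions added here, on the grounds that accelerator kernels run asynchronously and the clock should only stop once the device has finished.

```python
import time


def benchmark(fn, runs=10, warmup=3, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds over `runs` calls."""
    for _ in range(warmup):       # warm-up calls are not timed
        fn()
    if sync is not None:
        sync()                    # drain pending device work before timing
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()                    # wait for the last call to finish
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in workload; a real run would call the model on a batch of 8 images.
mean_ms = benchmark(lambda: sum(range(10_000)))
print(f"{mean_ms:.3f} ms per call")
```

On the NPU, `torch.npu.synchronize` can be passed as `sync`. The derived rows of the table follow from the batch timings: 6800 ms / 8 = 850 ms and 19 ms / 8 ≈ 2.4 ms per image, and 6800 / 19 ≈ 358, consistent with the quoted ~350x.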

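6. Accuracy Evaluation

Evaluation Method

Load the model on the CPU and run inference to obtain reference outputs (logits)
Load the same weights on the NPU and run inference to obtain the NPU outputs
Compare the two sets of outputs and compute several accuracy metrics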
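The comparison step can be sketched in plain Python as below. The exact definitions used by eval.py are not given here, so the formulas are assumptions: MSE and cosine similarity over the flattened logits, the mean absolute difference between softmax probabilities, and the fraction of samples whose argmax agrees. The function name `compare_logits` is hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax of one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def compare_logits(cpu_logits, npu_logits):
    """Compare two batches of logits given as lists of equal-length rows."""
    flat_cpu = [x for row in cpu_logits for x in row]
    flat_npu = [x for row in npu_logits for x in row]

    mse = sum((a - b) ** 2 for a, b in zip(flat_cpu, flat_npu)) / len(flat_cpu)

    dot = sum(a * b for a, b in zip(flat_cpu, flat_npu))
    norm_cpu = math.sqrt(sum(a * a for a in flat_cpu))
    norm_npu = math.sqrt(sum(b * b for b in flat_npu))
    cosine = dot / (norm_cpu * norm_npu)

    prob_diffs = []
    agreements = 0
    for cpu_row, npu_row in zip(cpu_logits, npu_logits):
        cpu_prob, npu_prob = softmax(cpu_row), softmax(npu_row)
        prob_diffs.extend(abs(a - b) for a, b in zip(cpu_prob, npu_prob))
        # Count samples where both devices pick the same class.
        if cpu_prob.index(max(cpu_prob)) == npu_prob.index(max(npu_prob)):
            agreements += 1

    return {
        "mse": mse,
        "cosine_similarity": cosine,
        "prob_mean_diff": sum(prob_diffs) / len(prob_diffs),
        "prediction_agreement": agreements / len(cpu_logits),
    }


# Illustrative values only, not measured outputs.
cpu = [[2.0, 0.5, -1.0], [0.1, 1.2, 0.3]]
npu = [[2.001, 0.499, -1.002], [0.1, 1.201, 0.299]]
print(compare_logits(cpu, npu))
```

With PyTorch outputs, the inputs would be `outputs.logits.cpu().tolist()` from each device.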

Evaluation Results

The evaluation used 8 test images of 224×224:

Metric	Value	Requirement	Result
MSE	1.98e-5	-	-
Cosine Similarity	0.99995435	> 0.999	✓
Prob Mean Diff (softmax)	0.062%	< 1%	✓ PASS
Prediction Agreement	100%	> 99%	✓

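Conclusion: the mean softmax probability difference between NPU and CPU is 0.062%, within the < 1% requirement, and the NPU predictions match the CPU predictions.

See eval_log.txt for the detailed evaluation log.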

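7. Notes

Weights: NPU adaptation requires no changes to the original weights
Device selection: the scripts auto-detect the NPU by default and fall back to the CPU if it is unavailable
Image size: the model takes 224×224 RGB input; the processor resizes images automatically
Emotion classes: 7 emotions are supported: sad, disgust, angry, neutral, fear, surprise, happy
torch_npu version: make sure it matches the torch version
Single-card inference: only one NPU card is used at present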
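The device fallback and single-card behavior described in these notes might look like the sketch below. It is an illustration under assumptions rather than the scripts' actual logic: the function name `pick_device` and its `card` parameter are hypothetical, and a missing torch or torch_npu installation is treated the same as an unavailable NPU.

```python
def pick_device(requested=None, card=0):
    """Return a device string: one NPU card if usable, otherwise "cpu"."""
    if requested == "cpu":
        return "cpu"
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the torch.npu backend)
        if torch.npu.is_available():
            return f"npu:{card}"   # single-card inference on the chosen card
    except ImportError:
        pass                       # torch or torch_npu is not installed
    return "cpu"                   # fall back when no NPU can be used


device = pick_device()
print(device)
```

The returned string can be passed to `model.to(device)` and to each input tensor's `.to(device)`.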