facial_emotions_image_detection is a facial expression classification model based on Google ViT-Base-Patch16-224. It classifies an input face image into one of 7 emotions: sad, disgust, angry, neutral, fear, surprise, happy.
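
Directory structure:

```text
facial_emotions_image_detection-ascend/
├── inference.py            # inference test script
├── log.txt                 # test log
├── README.md               # this document
├── test_image.png          # test image
├── inference_result.json   # inference results
└── precision_result.json   # precision test results
```

Enter the container and load the Ascend toolkit environment:

```bash
docker exec -it test-modelagent bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
```

The model files are located in `/data/ysws/agentsp/5-16/facial_emotions_image_detection/`.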

Install the dependencies:

```bash
pip install transformers torch_npu pillow numpy -i https://pypi.huaweicloud.com/repository/pypi/simple/
```

Run the inference script for emotion classification:
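```bash
cd /data/ysws/agentsp/5-16/facial_emotions_image_detection-ascend/
python3 inference.py                   # runs all tests (default: --mode all)
python3 inference.py --mode inference  # inference test only
```

Run the CPU vs NPU precision comparison test: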
```bash
cd /data/ysws/agentsp/5-16/facial_emotions_image_detection-ascend/
python3 inference.py --mode precision_test
```

Command-line arguments:

| Parameter | Description | Default |
|---|---|---|
| `--mode` | Test mode: `all`, `inference`, or `precision_test` | `all` |
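
The `--mode` flag maps naturally onto `argparse` choices. The sketch below is a hypothetical reconstruction of how such a flag could be declared and dispatched; `build_parser` and `selected_tests` are illustrative names, not taken from `inference.py`.

```python
import argparse


def build_parser():
    # Hypothetical sketch of the --mode flag; the real inference.py may differ.
    parser = argparse.ArgumentParser(description="Facial emotion ViT test on Ascend NPU")
    parser.add_argument(
        "--mode",
        choices=["all", "inference", "precision_test"],
        default="all",
        help="Test mode: all, inference or precision_test",
    )
    return parser


def selected_tests(mode):
    # "all" enables both tests; any other mode enables only the named test.
    return {
        "inference": mode in ("all", "inference"),
        "precision_test": mode in ("all", "precision_test"),
    }
```

For example, `selected_tests(build_parser().parse_args(["--mode", "precision_test"]).mode)` enables only the precision test; `argparse` rejects any value outside the three choices.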

Test results:

| Metric | Measured | Threshold | Status |
|---|---|---|---|
| Max relative error | 0.8199% | < 1.00% | PASS |
| CPU inference time | 1.680 s | - | - |
| NPU inference time | 0.034 s | - | - |
| Speedup | 49.64x | > 1x | PASS |
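
How the precision metrics are derived is not spelled out here. The following is a minimal sketch under one assumed definition: relative error = max|cpu - npu| / max|cpu| (normalizing by the largest CPU logit avoids blow-ups for logits near zero). `compare_outputs` is a hypothetical helper, and `inference.py` may normalize differently.

```python
import numpy as np


def compare_outputs(cpu_logits, npu_logits, cpu_time, npu_time, rel_threshold=0.01):
    """Compare CPU and NPU logits and timings.

    Assumption: relative error = max|cpu - npu| / max|cpu|; the real
    inference.py may use a different (e.g. element-wise) definition.
    """
    cpu = np.asarray(cpu_logits, dtype=np.float64)
    npu = np.asarray(npu_logits, dtype=np.float64)
    max_abs_err = float(np.max(np.abs(cpu - npu)))
    max_rel_err = max_abs_err / float(np.max(np.abs(cpu)))
    return {
        "max_abs_error": max_abs_err,
        "max_rel_error": max_rel_err,
        "speedup": cpu_time / npu_time,  # > 1 means the NPU is faster
        "status": "PASS" if max_rel_err < rel_threshold else "FAIL",
    }
```

Note that 1.680 / 0.034 gives about 49.4x; the logged 49.64x was presumably computed from unrounded timings.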

Input: 224x224 RGB face image

Output: logits over the 7 emotion classes (shape `[1, 7]`), indexed as follows:

| Emotion | ID |
|---|---|
| sad | 0 |
| disgust | 1 |
| angry | 2 |
| neutral | 3 |
| fear | 4 |
| surprise | 5 |
| happy | 6 |
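
Turning the 7 logits into a label is a softmax followed by an argmax. A plain-Python sketch, assuming the model's `id2label` follows the table above (check `config.json` to confirm); `decode_logits` is a hypothetical helper.

```python
import math

# Label order taken from the ID table above (ID -> emotion); assumed to match config.json.
ID2LABEL = {0: "sad", 1: "disgust", 2: "angry", 3: "neutral",
            4: "fear", 5: "surprise", 6: "happy"}


def decode_logits(logits):
    """Turn one row of 7 logits into (label, confidence) via softmax + argmax."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)  # first index wins ties
    return ID2LABEL[best], probs[best]
```

With uniform logits every class gets probability 1/7 (about 0.143), so a confidence such as 0.2571 means the top class is only moderately preferred.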

Test log:

```text
Facial Emotions Image Detection NPU Test
Model: dima806/facial_emotions_image_detection (ViT emotion classification)
Output: /data/ysws/agentsp/5-16/facial_emotions_image_detection-ascend
============================================================
Inference Test (NPU)
============================================================
Device: npu:0
Loading model and processor...
Model loaded successfully
Input shape: torch.Size([1, 3, 224, 224])
Inference time: 5.232s
Predicted class: 4 (fear)
Confidence: 0.2571
Logits shape: torch.Size([1, 7])
============================================================
Precision Test (CPU vs NPU)
============================================================
NPU Device: npu:0
Loading model...
Input shape: torch.Size([1, 3, 224, 224])
Running on CPU...
Running on NPU...
CPU inference time: 1.680s
NPU inference time: 0.034s
Speedup: 49.64x
Max absolute error: 9.278297e-03
Max relative error: 0.8199% (threshold: 1.0%)
Status: PASS
============================================================
Precision Test Result: PASS
============================================================
============================================================
Test Complete!
============================================================
```

Python usage example for NPU inference:

```python
import torch
import torch_npu  # registers the "npu" device with PyTorch
from PIL import Image
from transformers import ViTForImageClassification, AutoImageProcessor

MODEL_DIR = "/data/ysws/agentsp/5-16/facial_emotions_image_detection"

# Load the preprocessor and model, then move the model to the first NPU
processor = AutoImageProcessor.from_pretrained(MODEL_DIR)
model = ViTForImageClassification.from_pretrained(MODEL_DIR)
model = model.to("npu:0").eval()

# Preprocess (resize to 224x224 + normalize) and move the tensors to the NPU
image = Image.open("face.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
inputs = {k: v.to("npu:0") for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits  # shape [1, 7]
probs = torch.nn.functional.softmax(logits, dim=-1)
predicted_class = logits.argmax(-1).item()
# transformers converts id2label keys to int when loading the config
predicted_label = model.config.id2label[predicted_class]
confidence = probs[0][predicted_class].item()
print(f"Predicted: {predicted_label} (confidence: {confidence:.4f})")
```

To test without a real face photo, a random 224x224 RGB image can be used as input (reusing `processor` from the example above):

```python
import numpy as np
from PIL import Image

# Random uint8 pixels in [0, 255]
array = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
image = Image.fromarray(array)
inputs = processor(images=image, return_tensors="pt")
```

Model structure:

| Component | Description |
|---|---|
| embeddings | Patch embeddings + CLS token |
| encoder | 12-layer Transformer encoder |
| layernorm | Layer normalization |
| classifier | Linear classification head (768 -> 7) |
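
The component sizes follow directly from the ViT-Base hyperparameters (hidden size 768, MLP size 3072, 12 layers, patch size 16, 224x224 input, 3 channels, 7 classes). The sketch below counts parameters under the assumption that, as in the transformers `ViTForImageClassification`, there is no pooler and every linear layer has a bias. `vit_param_count` is a hypothetical helper; on the loaded model, `sum(p.numel() for p in model.parameters())` gives the authoritative figure.

```python
def vit_param_count(hidden=768, intermediate=3072, layers=12, patch=16,
                    image=224, channels=3, num_labels=7):
    """Return (sequence length, parameter count) of a ViT classifier without pooler."""
    num_patches = (image // patch) ** 2            # 14 * 14 = 196 patches
    seq_len = num_patches + 1                      # +1 for the CLS token
    embeddings = channels * patch * patch * hidden + hidden  # patch projection (conv)
    embeddings += hidden                           # CLS token
    embeddings += seq_len * hidden                 # position embeddings
    attention = 4 * (hidden * hidden + hidden)     # Q, K, V and output projections
    mlp = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    layer_norms = 2 * 2 * hidden                   # two LayerNorms per layer (weight + bias)
    encoder = layers * (attention + mlp + layer_norms)
    final_norm = 2 * hidden                        # final layernorm
    classifier = hidden * num_labels + num_labels  # linear head 768 -> num_labels
    return seq_len, embeddings + encoder + final_norm + classifier
```

With the defaults this gives a sequence length of 197 and 85,804,039 parameters; with `num_labels=1000` it gives 86,567,656, which matches the commonly cited size of the 1000-class google/vit-base-patch16-224.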
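
Key parameters extracted from `config.json`:

```json
{
  "hidden_size": 768,
  "intermediate_size": 3072,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "patch_size": 16,
  "image_size": 224,
  "num_channels": 3,
  "problem_type": "single_label_classification"
}
```

A: Check that the NPU driver is installed correctly. The numerical difference between CPU and NPU for this ViT model is small: the measured maximum relative error is 0.8199%, below the 1% threshold.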
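
A: The first inference includes compilation overhead (the inference test in the log took 5.232 s). In the precision test the NPU run took 0.034 s versus 1.680 s on the CPU, a speedup of about 49x, which makes the NPU well suited to batch processing.

A: Common input formats are supported, including PIL images, NumPy arrays, and torch tensors. The processor automatically resizes input images to 224x224 and normalizes them.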
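
To make "resize and normalize" concrete, here is a NumPy-only approximation of what a ViT image processor does. It assumes mean and std of 0.5 per channel (the google/vit-base-patch16-224 defaults; check this model's `preprocessor_config.json`) and uses nearest-neighbour resizing for brevity, whereas the real processor resamples bilinearly. `preprocess` is a hypothetical helper.

```python
import numpy as np


def preprocess(image_hwc, size=224, mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)):
    """Approximate a ViT processor: resize, rescale to [0, 1], normalize, HWC -> NCHW.

    Assumptions: mean/std of 0.5 and nearest-neighbour resizing.
    """
    img = np.asarray(image_hwc)
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size            # nearest source row for each output row
    cols = np.arange(size) * w // size            # nearest source column for each output column
    resized = img[rows[:, None], cols[None, :]]   # shape (size, size, 3)
    x = resized.astype(np.float32) / 255.0        # rescale to [0, 1]
    x = (x - np.array(mean, dtype=np.float32)) / np.array(std, dtype=np.float32)
    return x.transpose(2, 0, 1)[None, ...]        # (1, 3, size, size), values in [-1, 1]
```

The result has the same shape as the `Input shape: torch.Size([1, 3, 224, 224])` reported in the log.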
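
A: Pass a list of images to the processor to build a batched input:

```python
from PIL import Image

# image_files: a list of image paths (placeholder)
images = [Image.open(f).convert("RGB") for f in image_files]
inputs = processor(images=images, return_tensors="pt")  # pixel_values: [N, 3, 224, 224]
```

This project is licensed under the Apache-2.0 License.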