Model: facebook/webssl-mae700m-full2b-224
Task: Visual feature extraction (Vision Transformer)
Hardware: Ascend 910
Adaptation team: Ascend-SACT
Report date: 2026-05-20
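
| Property | Value |
|---|---|
| Architecture | ViT-Huge (Vision Transformer) |
| Parameters | 700M |
| Input resolution | 224×224 |
| Hidden size | 1280 |
| Layers | 32 |
| Patch size | 14 |
| Pre-training method | Web-MAE (masked autoencoder) |
| Intended tasks | Image feature extraction, visual representation learning |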
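
The output shape reported in the logs, `[1, 257, 1280]`, follows directly from this table: a 224×224 image cut into 14×14 patches gives a 16×16 grid of 256 patch tokens, plus one class token, each of width 1280. The sketch below reproduces that arithmetic, assuming the standard ViT layout with a single prepended `[CLS]` token and no register tokens; `vit_token_shape` is a hypothetical helper, not part of this repository.

```python
def vit_token_shape(image_size: int, patch_size: int, hidden_size: int,
                    batch: int = 1, num_cls_tokens: int = 1) -> tuple[int, int, int]:
    """Return (batch, sequence_length, hidden_size) for a plain ViT encoder.

    Assumes a square image, non-overlapping square patches, and
    `num_cls_tokens` prepended class tokens (1 for a standard ViT).
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    num_patches = patches_per_side ** 2          # 16 * 16 = 256
    return (batch, num_patches + num_cls_tokens, hidden_size)


print(vit_token_shape(224, 14, 1280))  # (1, 257, 1280)
```
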

| Item | Requirement |
|---|---|
| NPU model | Ascend 910 / 910B |
| Number of NPUs | ≥ 1 |
| Host memory | ≥ 16 GB |
| Storage | ≥ 10 GB (model files about 2.5 GB) |
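
The storage and device-memory figures are in line with a simple rule of thumb: the weights alone take roughly parameters × bytes per element. With the nominal 700M parameters from the model table this gives about 2.8 GB in FP32 and 1.4 GB in FP16, the same order as the ~2.5 GB model file and the `allocated=2478.5MB` NPU figure in the run log. This is a back-of-the-envelope sketch only: it assumes the nominal parameter count (the exact count may differ) and ignores activations, runtime overhead, and allocator caching. `weight_memory_gb` is a hypothetical helper.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate size of the weights alone, in decimal gigabytes (1e9 bytes).

    Ignores activations, runtime/allocator overhead, and any optimizer state.
    """
    return num_params * BYTES_PER_ELEMENT[precision] / 1e9


NOMINAL_PARAMS = 700e6  # nominal count from the model table; the exact number may differ
for p in ("fp32", "fp16"):
    print(f"{p}: ~{weight_memory_gb(NOMINAL_PARAMS, p):.1f} GB")
```
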

| Component | Version |
|---|---|
| Operating system | EulerOS 2.10 / CentOS 7.6 / Ubuntu 22.04 |
| CANN | ≥ 8.0 |
| Python | 3.9 / 3.10 / 3.11 |
| PyTorch | 2.9.0+cpu |
| torch_npu | 2.9.0.post1 |
| transformers | 4.57.6 |
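
Because torch, torch_npu, and transformers are pinned to matching versions, it can help to check an existing environment before running anything. A minimal sketch using the standard-library `importlib.metadata`; `PINNED` mirrors the table above, `version_mismatches` is a hypothetical helper, and stripping local tags such as `+cpu` or `+gitee7ba04` before comparing is an assumption about how strict the check should be.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions pinned in the table above. Local version tags (e.g. "+cpu" on torch,
# "+gitee7ba04" on torch_npu) are dropped before comparison.
PINNED = {
    "torch": "2.9.0",
    "torch_npu": "2.9.0.post1",
    "transformers": "4.57.6",
}


def _installed(name: str) -> Optional[str]:
    """Installed distribution version, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(pinned: Dict[str, str],
                       lookup: Callable[[str], Optional[str]] = _installed
                       ) -> Dict[str, Optional[str]]:
    """Return {package: installed_version_or_None} for every package that differs."""
    bad = {}
    for name, want in pinned.items():
        have = lookup(name)
        public = have.split("+", 1)[0] if have else None  # drop local tag like "+cpu"
        if public != want:
            bad[name] = have
    return bad


if __name__ == "__main__":
    problems = version_mismatches(PINNED)
    print("All pinned versions match." if not problems else f"Mismatches: {problems}")
```
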
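
```bash
# Activate the CANN environment
source /usr/local/Ascend/ascend-toolkit/set_env.sh
# Install PyTorch (CPU build; NPU support comes from torch_npu)
pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cpu
# Install torch_npu
pip install torch_npu==2.9.0.post1
# Install transformers and other dependencies
pip install transformers==4.57.6 Pillow numpy scipy
```

```bash
npu-smi info
```

Confirm that the NPU status is `OK` and the temperature is normal.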
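
In addition to `npu-smi`, a quick Python check can confirm that torch_npu actually loads and sees a device. A minimal sketch, assuming the packages installed above; `pick_device` is a hypothetical helper, not a function from this repository's scripts. It falls back to `"cpu"` so it is safe to run on any machine.

```python
def pick_device() -> str:
    """Return "npu:0" when torch_npu loads and reports an NPU, otherwise "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the torch.npu namespace)
    except (ImportError, OSError, RuntimeError):
        # Missing packages, or (possibly) CANN shared libraries not found
        # because set_env.sh was not sourced in this shell.
        return "cpu"
    return "npu:0" if torch.npu.is_available() else "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

If this prints `cpu` on an Ascend host, re-source `set_env.sh` and re-check the torch/torch_npu versions before running the scripts.
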
Place the model weights and configuration files in the `model_files/` directory:

```text
model_files/
├── config.json
├── preprocessor_config.json
└── model.safetensors
```

```bash
# NPU FP32 inference
python inference.py \
  --model_path ./model_files \
  --device npu \
  --precision fp32 \
  --image path/to/your/image.jpg \
  --save_features \
  --output_dir ./outputs
```

```bash
# NPU FP16 inference
python inference.py \
  --model_path ./model_files \
  --device npu \
  --precision fp16 \
  --image path/to/your/image.jpg \
  --save_features \
  --output_dir ./outputs
```

```bash
# Accuracy evaluation
python accuracy_run.py --precision fp32 --output_dir ./logs
python accuracy_run.py --precision fp16 --output_dir ./logs
```

```bash
# Performance evaluation
python accuracy_run_perf.py --device npu --precision fp32 --output_dir ./logs
python accuracy_run_perf.py --device npu --precision fp16 --output_dir ./logs
```

```bash
# Acceptance check
python check_accuracy_run_perf.py --precision both --output_dir ./logs
```

| Precision | Cosine similarity | Max abs. diff | Mean abs. diff | Result |
|---|---|---|---|---|
| FP32 | 1.00000322 | 2.90e-02 | 1.76e-03 | Pass |
| FP16 | 0.99999934 | 4.59e-02 | 1.82e-03 | Pass |
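
The accuracy table and the CPU vs NPU verification log rely on a handful of standard error metrics. The sketch below computes them in pure Python on two flattened outputs, assuming the usual definitions (relative L2 error = ‖npu − cpu‖₂ / ‖cpu‖₂, which is consistent with the logged MSE and output std) and the thresholds printed in the verification log (relative L2 error < 1%, cosine similarity > 0.999). `compare_outputs` and `passes` are hypothetical names, not functions from this repository's scripts, and requiring both thresholds at once is an assumption.

```python
import math
from typing import Dict, Sequence


def compare_outputs(cpu: Sequence[float], npu: Sequence[float],
                    rel_eps: float = 1e-4) -> Dict[str, float]:
    """CPU-vs-NPU error metrics on two flattened tensors of equal length."""
    if len(cpu) != len(npu) or not cpu:
        raise ValueError("inputs must be non-empty and the same length")
    diffs = [n - c for c, n in zip(cpu, npu)]
    abs_diffs = [abs(d) for d in diffs]
    norm_cpu = math.sqrt(sum(c * c for c in cpu))
    norm_npu = math.sqrt(sum(n * n for n in npu))
    norm_diff = math.sqrt(sum(d * d for d in diffs))
    dot = sum(c * n for c, n in zip(cpu, npu))
    # Relative error only where the reference is not ~0, mirroring "(|cpu| > 1e-4)".
    rel = [abs(d) / abs(c) for c, d in zip(cpu, diffs) if abs(c) > rel_eps]
    return {
        "max_abs_diff": max(abs_diffs),
        "mean_abs_diff": sum(abs_diffs) / len(abs_diffs),
        "mse": sum(d * d for d in diffs) / len(diffs),
        "rel_l2": norm_diff / norm_cpu,
        "cosine": dot / (norm_cpu * norm_npu),
        "max_rel_err": max(rel) if rel else 0.0,
    }


def passes(m: Dict[str, float], rel_l2_max: float = 0.01,
           cosine_min: float = 0.999) -> bool:
    """Thresholds from the verification log; requiring both is an assumption."""
    return m["rel_l2"] < rel_l2_max and m["cosine"] > cosine_min


if __name__ == "__main__":
    m = compare_outputs([0.5, -1.0, 2.0, 0.0], [0.5, -1.001, 2.002, 0.0])
    print(m, "PASS" if passes(m) else "FAIL")
```

In exact arithmetic cosine similarity cannot exceed 1, so the FP32 row's 1.00000322 most likely reflects floating-point rounding in the reduction rather than a better-than-identical match. The max relative error is also a poor pass criterion on its own: near-zero reference values inflate it (the log reports about 48.8), which is why relative L2 error and cosine similarity are used instead.
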

| Precision | Mean latency | Throughput | Latency std. dev. |
|---|---|---|---|
| FP32 | 15.13 ms | 66.09 img/s | 0.33 ms |
| FP16 | 14.91 ms | 67.05 img/s | 0.48 ms |
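
Mean latency, its standard deviation, and throughput can be derived from repeated timed runs; at batch size 1, throughput is roughly 1000 / mean latency in ms (1000 / 15.13 ≈ 66.1 img/s for the FP32 row). A minimal stdlib sketch of such a loop, not the implementation in `accuracy_run_perf.py`: `benchmark` is a hypothetical helper, and it uses the sample standard deviation, which may differ from what the scripts report. Because NPU kernels run asynchronously, the `sync` hook should block until queued work finishes; with torch_npu that would be `torch.npu.synchronize`.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 20,
              batch_size: int = 1,
              sync: Callable[[], None] = lambda: None) -> Dict[str, float]:
    """Time `fn` and report mean/std latency (ms) and throughput (images/s).

    `sync` must wait for pending device work; otherwise asynchronous
    launches make the measured latency look artificially small.
    """
    for _ in range(warmup):  # warm-up runs are excluded from the statistics
        fn()
    sync()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    mean_ms = statistics.mean(samples_ms)
    return {
        "mean_ms": mean_ms,
        "std_ms": statistics.stdev(samples_ms) if iters > 1 else 0.0,
        "throughput": batch_size * 1000.0 / mean_ms,
    }


if __name__ == "__main__":
    # CPU-only smoke test with a dummy workload
    print(benchmark(lambda: sum(range(100_000))))
```

On an NPU the call would look like `benchmark(lambda: model(**inputs), sync=torch.npu.synchronize)` inside `torch.no_grad()`, where `model` and `inputs` are assumed to be already on `npu:0`.
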

| Stage | Time | Share |
|---|---|---|
| Preprocessing (CPU) | 1.52 ms | 9.2% |
| H2D data copy | 0.10 ms | 0.6% |
| Inference compute (NPU) | 14.97 ms | 90.2% |
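
Each share is the stage time divided by the sum of all stages (1.52 + 0.10 + 14.97 = 16.59 ms end to end), which shows that NPU compute dominates and host-side preprocessing is the next target if end-to-end latency matters. A small sketch of the calculation; `stage_shares` is a hypothetical helper and the stage names are illustrative.

```python
def stage_shares(stages_ms: dict[str, float]) -> dict[str, float]:
    """Percentage of total time spent in each stage."""
    total = sum(stages_ms.values())
    return {name: 100.0 * t / total for name, t in stages_ms.items()}


stages = {"preprocess_cpu": 1.52, "h2d_copy": 0.10, "npu_inference": 14.97}
for name, pct in stage_shares(stages).items():
    print(f"{name:15s} {pct:5.1f}%")  # 9.2%, 0.6%, 90.2%
```
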

The following is actual terminal output from this repository's scripts on an Ascend910_9362 device (copied from the real run logs):

```text
============================================================
Web-SSL MAE ViT-H Inference on Ascend NPU
============================================================
Device selected: npu
NPU Device: Ascend910_9362
torch version: 2.9.0+cpu
torch_npu version: 2.9.0.post1+gitee7ba04
[1/4] Loading model and processor...
Model loaded in 2.37s
[2/4] Preparing input data...
Input shape: torch.Size([1, 3, 224, 224])
[3/4] Warm-up inference...
[4/4] Running benchmark inference...
Output shape: torch.Size([1, 257, 1280])
Output dtype: torch.float32
Output mean: 0.010152
Output std: 0.479154
Output max: 6.903232
Output min: -14.741646
Average inference time (5 runs): 0.0146s
NPU Memory: allocated=2478.5MB, reserved=2584.0MB
Model load time: 2.37s
Avg inference time: 0.0146s
Web-SSL MAE ViT-H NPU inference completed successfully!
```

```text
[INFO] Using device: npu:0
[INFO] Using precision: fp32 (torch.float32)
[INFO] Loading model from ./model_files ...
[INFO] Model loaded. Parameters: ~700M
[INFO] No image provided, using random 224x224 image
[INFO] Input shape: torch.Size([1, 3, 224, 224])
[INFO] Warmup 3 rounds ...
[INFO] Running inference ...
===== Inference Result =====
Feature shape: torch.Size([1, 257, 1280])
Feature dtype: torch.float32
Feature mean: 0.009344
Feature std: 0.479671
Feature min: -13.164553
Feature max: 7.015934
Inference latency: 15.00 ms
Throughput: 66.67 images/sec
[INFO] Features saved to ./outputs/features_npu_fp32.pt
[INFO] Metadata saved to ./outputs/meta_npu_fp32.json
[INFO] Inference completed successfully.
```

```text
============================================================
Web-SSL MAE ViT-H CPU vs NPU Precision Verification
============================================================
NPU Device: Ascend910_9362
torch version: 2.9.0+cpu
torch_npu version: 2.9.0.post1+gitee7ba04
[1/5] Loading model and processor...
Model loaded
[2/5] Preparing fixed input...
Input shape: torch.Size([1, 3, 224, 224])
[3/5] Running CPU inference (baseline)...
CPU output shape: (1, 257, 1280)
CPU output mean: 0.010283, std: 0.479693
[4/5] Running NPU inference...
NPU output shape: (1, 257, 1280)
NPU output mean: 0.010280, std: 0.479144
[5/5] Comparing precision...
Max absolute diff: 7.184887e-02
Mean absolute diff: 1.831753e-03
MSE: 6.355060e-06
Relative L2 error: 5.254095e-03 (0.5254%)
Cosine similarity: 0.99998784
Max relative error (|cpu| > 1e-4): 4.881472e+01
Primary metric: Relative L2 error = 0.5254% (threshold < 1%)
Secondary metric: Cosine similarity = 0.99998784 (threshold > 0.999)
✅ Precision verification PASSED
Precision verification completed!
```

Inference success screenshot

Precision verification passed screenshot

Performance benchmark screenshot

```text
.
├── inference.py                     # Standard inference script
├── accuracy_run.py                  # Accuracy evaluation script
├── accuracy_run_perf.py             # Performance evaluation script
├── check_accuracy_run_perf.py       # Acceptance check script
├── webssl_npu_infer.py              # NPU inference and performance benchmark script
├── webssl_npu_verify.py             # CPU vs NPU precision verification script
├── webssl_npu_adaptation_report.md  # Full adaptation report
├── logs/                            # Evaluation logs and reports
│   ├── accuracy_report_fp32.json
│   ├── accuracy_report_fp16.json
│   ├── perf_report_npu:0_fp32.json
│   ├── perf_report_npu:0_fp16.json
│   └── run_*.log
├── screenshots/                     # Self-verification screenshots
│   ├── inference_success.png
│   ├── accuracy_passed.png
│   └── perf_benchmark.png
├── model_files/                     # Model weights (provided by the user)
└── readme.md                        # This deployment guide
```

Q1: LOG_WARNING messages appear during NPU inference
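A: Check that the CANN environment has been activated (`source /usr/local/Ascend/ascend-toolkit/set_env.sh`). LOG_WARNING messages do not affect inference.
Q2: Why is FP16 not noticeably faster?
A: In the current torch_npu 2.9.0 release, FP16 fusion optimizations for some ViT operators are limited, so FP16 latency (14.91 ms) is only about 1.5% lower than FP32 (15.13 ms). Upgrading torch_npu/CANN in the future is recommended.
Q3: How can I verify that the model loaded correctly?
A: Run `python inference.py --device npu --precision fp32` and check that the output feature shape is `[1, 257, 1280]`.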
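
The same check can be done directly from Python. A minimal sketch, not the code in `inference.py`: it assumes the checkpoint's `config.json` resolves through transformers' `AutoModel` (the `[1, 257, 1280]` output in the logs is consistent with a standard ViT encoder) and that the model exposes `last_hidden_state`. `extract_features` and `shape_ok` are hypothetical helpers; the heavy imports live inside `extract_features` so the shape check can be reused without an NPU.

```python
EXPECTED_SHAPE = (1, 257, 1280)  # batch 1, 256 patch tokens + 1 class token, width 1280


def shape_ok(shape) -> bool:
    """True if a tensor shape (torch.Size or tuple) matches the expected features."""
    return tuple(shape) == EXPECTED_SHAPE


def extract_features(model_dir: str = "./model_files", device: str = "npu:0"):
    """Load the local checkpoint and return last_hidden_state for one random image."""
    import numpy as np
    import torch
    import torch_npu  # noqa: F401  registers the "npu" device with PyTorch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(model_dir)
    model = AutoModel.from_pretrained(model_dir).to(device).eval()
    # Random RGB image, similar to inference.py's fallback when no image is given
    image = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
    inputs = processor(images=image, return_tensors="pt").to(device)
    with torch.no_grad():
        return model(**inputs).last_hidden_state
```

Usage on an Ascend host: `feats = extract_features()` followed by `assert shape_ok(feats.shape)`.
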

```bibtex
@article{fan2025scaling,
  title={Scaling Language-Free Visual Representation Learning},
  author={David Fan and Shengbang Tong and Jiachen Zhu and Koustuv Sinha and Zhuang Liu and Xinlei Chen and Michael Rabbat and Nicolas Ballas and Yann LeCun and Amir Bar and Saining Xie},
  year={2025},
  eprint={2504.01017},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

This document was generated automatically by Model Agent for Ascend NPU model adaptation delivery.