Ascend-SACT/ViT

Environment information:

npu-smi info
+--------------------------------------------------------------------------------------------------------+
| npu-smi 25.3.rc1.b030                            Version: 25.3.rc1.b030                                |
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Name                  | Health          | Power(W)     Temp(C)           Hugepages-Usage(page) |
| Chip    Device                | Bus-Id          | AICore(%)    Memory-Usage(MB)                        |
+===============================+=================+======================================================+
| 2       310P3                 | OK              | NA           64                0     / 0             |
| 0       0                     | 0000:01:00.0    | 0            1556 / 44280                            |
+-------------------------------+-----------------+------------------------------------------------------+
| 2       310P3                 | OK              | NA           63                0     / 0             |
| 1       1                     | 0000:01:00.0    | 0            1413 / 43693                            |
+===============================+=================+======================================================+
| 5       310P3                 | OK              | NA           73                0     / 0             |
| 0       2                     | 0000:81:00.0    | 0            1483 / 44280                            |
+-------------------------------+-----------------+------------------------------------------------------+
| 5       310P3                 | OK              | NA           72                0     / 0             |
| 1       3                     | 0000:81:00.0    | 0            1481 / 43693                            |
+===============================+=================+======================================================+
| 6       310P3                 | OK              | NA           74                0     / 0             |
| 0       4                     | 0000:82:00.0    | 0            1466 / 44280                            |
+-------------------------------+-----------------+------------------------------------------------------+
| 6       310P3                 | OK              | NA           71                0     / 0             |
| 1       5                     | 0000:82:00.0    | 0            1500 / 43693                            |
+===============================+=================+======================================================+
+-------------------------------+-----------------+------------------------------------------------------+
| NPU     Chip                  | Process id      | Process name             | Process memory(MB)        |
+===============================+=================+======================================================+


lscpu
Architecture:             aarch64
  CPU op-mode(s):         64-bit
  Byte Order:             Little Endian
CPU(s):                   96
  On-line CPU(s) list:    0-95
Vendor ID:                HiSilicon
  BIOS Vendor ID:         HiSilicon
  Model name:             Kunpeng-920
    BIOS Model name:      HUAWEI Kunpeng 920 5250
    Model:                0
    Thread(s) per core:   1
    Core(s) per socket:   48
    Socket(s):            2
    Stepping:             0x1
    Frequency boost:      disabled
    CPU max MHz:          2600.0000
    CPU min MHz:          200.0000
    BogoMIPS:             200.00
    Flags:                fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
Caches (sum of all):
  L1d:                    6 MiB (96 instances)
  L1i:                    6 MiB (96 instances)
  L2:                     48 MiB (96 instances)
  L3:                     96 MiB (4 instances)
NUMA:
  NUMA node(s):           4
  NUMA node0 CPU(s):      0-23
  NUMA node1 CPU(s):      24-47
  NUMA node2 CPU(s):      48-71
  NUMA node3 CPU(s):      72-95
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Not affected
  Spectre v1:             Mitigation; __user pointer sanitization
  Spectre v2:             Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
CANN version: 8.2.RC1
Image used: mindie:2.1.RC1.B152-300I-Duo-py3.11-openeuler24.03-lts-aarch64

1. Check whether the host has NPU cards

## Run npu-smi info; if card information is printed, NPU cards are present
npu-smi info

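2. Check whether the environment has network access

## Test by checking whether curl can reach Baidu
curl www.baidu.com

## If curl fails, configure a proxy. For the cntlm proxy setup, see https://3ms.huawei.com/km/blogs/details/21430554
## For example: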
CCW_HOST_IP=90.254.50.56
export http_proxy="http://${CCW_HOST_IP}:3128"
export https_proxy=${http_proxy}
export ftp_proxy=${http_proxy}
export GIT_SSL_NO_VERIFY=true

## Replace CCW_HOST_IP with the address of your VPN
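
The proxy settings above can be wrapped in a small helper so the host address is supplied in one place. This is a minimal sketch, not part of the original guide: the function names set_proxy and can_reach are hypothetical, and the default port 3128 is simply taken from the example above.

```shell
# Hypothetical helper: export the proxy variables for a given host.
# The default port 3128 comes from the example above; pass a second
# argument to override it.
set_proxy() {
    proxy_host=$1
    proxy_port=${2:-3128}
    export http_proxy="http://${proxy_host}:${proxy_port}"
    export https_proxy="$http_proxy"
    export ftp_proxy="$http_proxy"
}

# Hypothetical helper: succeed if the URL answers within 10 seconds.
can_reach() {
    curl -sS --max-time 10 -o /dev/null "$1"
}

# Example (not run here), with CCW_HOST_IP set to your VPN address:
#   can_reach www.baidu.com || set_proxy "$CCW_HOST_IP"
```

Because the variables are exported, they only affect the current shell and its children; run the helper again in every new session or inside the container.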

3. Check whether docker is installed

## Run docker -h to check
docker -h

## If docker is missing, install it
yum install docker
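
Steps 1 to 3 can be collected into one preflight check. The sketch below only verifies that each tool is on PATH; the helper names have_cmd and preflight are hypothetical, and you still need to run npu-smi info yourself to confirm that cards are actually listed.

```shell
# Hypothetical helper: succeed if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report every prerequisite instead of stopping at the first miss.
# Returns 1 if at least one command is missing, 0 otherwise.
preflight() {
    missing=0
    for cmd in "$@"; do
        if have_cmd "$cmd"; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here):
#   preflight npu-smi curl docker
```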

4. Get the mindie docker image

## Using the openEuler image for 300I Duo as an example
docker pull mindie:2.1.RC1.B152-300I-Duo-py3.11-openeuler24.03-lts-aarch64

## Use docker images to list the images that have been pulled
[root@localhost ~]# docker images

5. Create the container and enter it

export IMAGE=mindie:dev-2.2.RC1.B110-300I-Duo-py311-ubuntu22.04-aarch64 && \
docker run --privileged -u root --name mindie-ljh-1014 \
    --device /dev/davinci4 --device /dev/davinci5 \
    --device /dev/davinci6 --device /dev/davinci7 \
    --device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 18050:8080 -it $IMAGE bash
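
The davinci device indices differ between hosts, so it can help to generate the --device flags rather than type them by hand. The following is a sketch under that assumption; device_args is a hypothetical helper that only formats its arguments and does not check that the devices exist.

```shell
# Hypothetical helper: print one "--device /dev/davinciN" flag for each
# index passed in, separated by single spaces.
device_args() {
    args=""
    for idx in "$@"; do
        args="${args:+$args }--device /dev/davinci${idx}"
    done
    printf '%s\n' "$args"
}

# Example (not run here): use $(device_args 4 5 6 7) in the docker run
# command above in place of the four hand-written --device /dev/davinciN flags.
```

Listing /dev/davinci* on the host before creating the container shows which indices are available to pass in.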

5. Install the dependency packages

yum install -y git
yum install -y patch
yum install -y unzip
yum install -y mesa-libGL
yum install -y gcc-c++

6. Get the source code of this repository

## Create a working directory of your own, for example /home/ljh, and cd /home/ljh

git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
cd ModelZoo-PyTorch/ACL_PyTorch/built-in/cv/ViTDet_for_Pytorch

### If you run into git access permission problems, the following commands can resolve them
git config --global --unset http.proxy
git config --global --unset https.proxy

7. Get the source of the model repository mmdetection and the dependency repository mmengine, and install their dependencies

git clone https://github.com/open-mmlab/mmdetection.git
git clone https://github.com/open-mmlab/mmengine.git



cd mmdetection
git reset --hard cfd5d3a985b0249de009b67d04f37263e11cdf3d
pip3 install -r requirements.txt

cd ../mmengine
git reset --hard 390ba2fbb272816adfd2883642326d0fd0ca6049
pip3 install -r requirements.txt
cd ..


## Install mmcv from source
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout v2.2.0
pip install -r requirements.txt
MMCV_WITH_OPS=1 pip install -e .
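
Since the patches in the later steps are written against specific commits, it may be worth confirming that each checkout really sits on the pinned revision. A minimal sketch, assuming the directory names used above; rev_matches and check_repo_rev are hypothetical helpers.

```shell
# Hypothetical helper: succeed if the full hash ($2) starts with the
# expected hash or hash prefix ($1). An empty expectation never matches.
rev_matches() {
    [ -n "$1" ] || return 1
    case "$2" in
        "$1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: compare the HEAD of the checkout in directory $1
# with the expected commit $2.
check_repo_rev() {
    actual=$(git -C "$1" rev-parse HEAD) || return 1
    rev_matches "$2" "$actual"
}

# Example (not run here), from the directory that holds the clones:
#   check_repo_rev mmdetection cfd5d3a985b0249de009b67d04f37263e11cdf3d
#   check_repo_rev mmengine 390ba2fbb272816adfd2883642326d0fd0ca6049
```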

If you hit the error AttributeError: type object 'Callable' has no attribute '_abc_registry', the following commands can resolve it:

rm -rf /usr/local/lib/python3.11/site-packages/typing.py
rm -rf /usr/local/lib/python3.11/site-packages/typing-*

# Make sure these run with the system's python3.11
python3.11 -m pip uninstall -y typing
python3.11 -m pip uninstall -y typing_extensions

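If the install becomes very slow at Using cached fastmcp-2.9.0-py3-none-any.whl.metadata (17 kB), or installing aim<=3.17.5 fails, first comment out the affected lines in requirements/tests.txt as shown below, then install those packages separately with pip install.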

vim requirements/tests.txt

#aim<=3.17.5;sys_platform!='win32'
bitsandbytes
clearml
coverage
dadaptation
dvclive
lion-pytorch
lmdb
#mlflow
parameterized
pydantic==1.10.9
pytest
transformers
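
The same edit can be made without opening an editor. This is a sketch only: comment_out_req is a hypothetical helper that prefixes `#` to every line starting with the given text, so it assumes plain package names without regex special characters.

```shell
# Hypothetical helper: comment out every line of FILE ($1) that starts
# with PREFIX ($2). PREFIX is inserted into a sed pattern as-is, so keep
# it to plain package names such as "aim" or "mlflow".
comment_out_req() {
    req_file=$1
    prefix=$2
    tmp_file="${req_file}.tmp"
    sed "s/^${prefix}/#&/" "$req_file" > "$tmp_file" && mv "$tmp_file" "$req_file"
}

# Example (not run here):
#   comment_out_req requirements/tests.txt aim
#   comment_out_req requirements/tests.txt mlflow
```

Lines that are already commented out start with `#` and no longer match, so running the helper twice does not add a second marker.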

8. Move the files into place

cp mmengine.patch mmengine/mmengine/
cp mmdet.patch mmdetection/
cp infer.py mmdetection/

9. Change directories, apply the patches, and install mmengine from the patched source

cd mmengine/mmengine/
patch -p2 < mmengine.patch
pip install -v -e ..

cd ../../mmdetection
patch -p1 < mmdet.patch
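  • Fix for ImportError: cannot import name 'Logger' from partially initialized module 'logging' (most likely due to a circular import) when running pip install -v -e .. : run the install from the mmengine repository root instead, because the mmengine/mmengine directory contains its own logging package, which can shadow the standard library module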
cd ViTDet_for_Pytorch/mmengine
pip install -v -e .

10. Download the dataset and weights

### Download the weight file and place it under mmdetection/ckpt
cd mmdetection
mkdir ckpt
cd ckpt
wget --no-check-certificate https://download.openmmlab.com/mmdetection/v3.0/vitdet/vitdet_mask-rcnn_vit-b-mae_lsj-100e/vitdet_mask-rcnn_vit-b-mae_lsj-100e_20230328_153519-e15fe294.pth

### Download the dataset
cd ..
mkdir data
cd data
mkdir coco
cd coco
# Download the validation images
wget http://images.cocodataset.org/zips/val2017.zip
unzip val2017.zip
rm val2017.zip

# Download the annotations
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip annotations_trainval2017.zip
rm annotations_trainval2017.zip
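
Before starting inference it can be useful to confirm that both archives unpacked where expected. The sketch below assumes the standard layout of the two COCO zip files (a val2017 directory and annotations/instances_val2017.json); check_coco is a hypothetical helper, and the annotation file your config actually reads may differ.

```shell
# Hypothetical helper: check that the COCO directory ($1) holds the
# unpacked validation images and the instance annotations.
check_coco() {
    coco_dir=$1
    if [ ! -d "$coco_dir/val2017" ]; then
        echo "missing: $coco_dir/val2017"
        return 1
    fi
    if [ ! -f "$coco_dir/annotations/instances_val2017.json" ]; then
        echo "missing: $coco_dir/annotations/instances_val2017.json"
        return 1
    fi
    echo "ok: $coco_dir"
}

# Example (not run here), from the mmdetection directory:
#   check_coco data/coco
```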

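11. Model inference

Before running inference on a 310P environment, make the following changes:

The installed numpy version is too new and inference fails out of the box, so downgrade numpy: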

pip uninstall numpy
pip install numpy==1.24.4
pip install future tensorboard
source /usr/local/Ascend/ascend-toolkit/set_env.sh

python infer.py --cfg projects/ViTDet/configs/vitdet_mask-rcnn_vit-b-mae_lsj-100e.py --ckpt "ckpt/vitdet_mask-rcnn_vit-b-mae_lsj-100e_20230328_153519-e15fe294.pth" --warm_up_times 1 
  • If you hit ImportError: /usr/local/lib64/python3.11/site-packages/torch_npu/lib/libtorch_npu.so: undefined symbol: ZNK5torch8autograd4Node4nameEv, the cause is a version mismatch between torch and torch_npu:
## First uninstall torch and the related packages
pip uninstall torch torchvision torch_npu

## Then reinstall torch and the related packages
pip install torch==2.4.0 torch_npu==2.4.0post4 torchvision==0.19.0 torchaudio==2.4.0
## Reference: https://www.hiascend.com/document/detail/zh/Pytorch/710/releasenote/releasenote_0003.html
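
A quick way to spot this kind of mismatch is to compare the leading major.minor.patch part of the two installed versions. This is a heuristic sketch, not the official compatibility rule: it assumes that torch_npu releases carry the torch version they target as a prefix, as in the pair installed above, and the helper names are hypothetical. The release notes linked above remain the authoritative source.

```shell
# Hypothetical helper: keep only the leading "X.Y.Z" of a version string,
# dropping suffixes such as "post4", ".post4" or "+cpu".
base_version() {
    printf '%s\n' "$1" | sed 's/^\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/'
}

# Hypothetical helper: succeed if both versions share the same "X.Y.Z".
versions_match() {
    [ "$(base_version "$1")" = "$(base_version "$2")" ]
}

# Example (not run here). pip show is used for torch_npu because importing
# it is exactly what fails when the versions do not match:
#   torch_ver=$(pip show torch | sed -n 's/^Version: //p')
#   npu_ver=$(pip show torch_npu | sed -n 's/^Version: //p')
#   versions_match "$torch_ver" "$npu_ver" || echo "torch/torch_npu mismatch"
```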
  • Configure a local proxy
CCW_HOST_IP=141.4.102.160 #  141.4.124.251
export http_proxy="http://${CCW_HOST_IP}:3128"
export https_proxy=${http_proxy}
export ftp_proxy=${http_proxy}
export GIT_SSL_NO_VERIFY=true

Reference tutorial: gitee.com

  • Shell script for creating the container:
IMAGES_ID=$1
NAME=$2
if [ $# -ne 2 ]; then
    echo "error: need two arguments: the image ID and the container name."
    exit 1
fi
docker run --name ${NAME} -it -d --net=host --shm-size=500g \
    --privileged=true \
    -w /home \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    --entrypoint=bash \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/sbin:/usr/local/sbin \
    -v /home:/home \
    -v /tmp:/tmp \
    -v /usr/share/zoneinfo/Asia/Shanghai:/etc/localtime \
    -e http_proxy=$http_proxy \
    -e https_proxy=$https_proxy \
    ${IMAGES_ID}

bash start-docker.sh 5718854ec902 dev2_rc1_b010

Here 5718854ec902 is IMAGES_ID and dev2_rc1_b010 is NAME.

  • Skip certificate verification

pip3 install --trusted-host pypi.org --trusted-host files.pythonhosted.org -r requirements.txt

Or: pip3 install *** -i https://pypi.tuna.tsinghua.edu.cn/simple

  • Fix for the error that mmcv._ext cannot be found even after installing mmcv with pip:

3ms.huawei.com

Fix for the error: downgrade the numpy package

  • Permanently change the pip index (recommended):
mkdir -p ~/.pip
vim ~/.pip/pip.conf

[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
timeout = 6000

[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
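
The file can also be written non-interactively, which is convenient inside a freshly created container. A minimal sketch: write_pip_conf is a hypothetical helper that writes exactly the settings shown above and overwrites any existing file at the target path.

```shell
# Hypothetical helper: write the pip configuration shown above to the
# given path (default: ~/.pip/pip.conf). Overwrites an existing file.
write_pip_conf() {
    conf_path=${1:-"$HOME/.pip/pip.conf"}
    mkdir -p "$(dirname "$conf_path")"
    cat > "$conf_path" <<'EOF'
[global]
index-url = https://pypi.tuna.tsinghua.edu.cn/simple
timeout = 6000

[install]
trusted-host = pypi.tuna.tsinghua.edu.cn
EOF
}

# Example (not run here):
#   write_pip_conf
```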
  • Steps to build mmcv from source on an NPU environment:

Building MMCV from source (mmcv 2.0.1 documentation)

  • Results on 910B:

Results on 910B with pse set to None (30% lower than the results above):