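UNet is a classic encoder-decoder deep learning model. Originally designed for medical image segmentation, it is widely used in autonomous driving for semantic segmentation, lane detection, and drivable-area segmentation thanks to its strong pixel-level prediction.
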
| Component | Version |
|---|---|
| Python | 3.11 |
| PyTorch | 2.5.1 |
| torch_npu | 2.5.1.post1.dev20250722 |
| CANN | cann_8.2.rc1 |

| Device model | NPU configuration |
|---|---|
| Atlas 800T A3 | Single card / multiple cards (0~15) |

| Image environment | Image address |
|---|---|
| Public network | swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 |


Start the container:

    docker run -itd -u root \
    --privileged \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci8 \
    --device=/dev/davinci9 \
    --device=/dev/davinci10 \
    --device=/dev/davinci11 \
    --device=/dev/davinci12 \
    --device=/dev/davinci13 \
    --device=/dev/davinci14 \
    --device=/dev/davinci15 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/bin/hccn_tool:/usr/bin/hccn_tool \
    -v /etc/hccn.conf:/etc/hccn.conf \
    --shm-size 1024g --net=host \
    -v <host_dir>:<container_dir> \
    --name <container_name> <image_id> /bin/bash

Open a shell in the container (the command below assumes the container was named `unet`):

    docker exec -it unet bash

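
The sixteen `--device=/dev/davinciN` flags in the command above follow a fixed pattern, so they can be generated rather than typed by hand. The following is a minimal sketch, not part of the original setup; the function name `davinci_device_flags` is hypothetical, and the default range 0 to 15 is taken from the device table above.

```python
def davinci_device_flags(first: int = 0, last: int = 15) -> list[str]:
    """Return one --device flag per NPU device node, both ends inclusive."""
    if first < 0 or last < first:
        raise ValueError("expected 0 <= first <= last")
    return [f"--device=/dev/davinci{i}" for i in range(first, last + 1)]


if __name__ == "__main__":
    # Print the flags in the same backslash-continued layout as the command above.
    print(" \\\n".join(davinci_device_flags()))
```

Passing a narrower range, for example `davinci_device_flags(0, 7)`, would expose only a subset of the cards to the container.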

Create and activate a conda environment:

    conda create -n unet --clone PyTorch-2.5.1
    conda activate unet

To avoid failed or slow dependency downloads, use the Huawei Cloud PyPI mirror consistently:

    pip config --user set global.index https://mirrors.huaweicloud.com/repository/pypi
    pip config --user set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple
    pip config --user set global.trusted-host mirrors.huaweicloud.com

Clone the repository and install its dependencies:

    cd /home/ma-user/
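    git clone https://github.com/milesial/Pytorch-UNet.git
    cd Pytorch-UNet
    pip install -r requirements.txt

Run the following script to download the dataset:

    bash scripts/download_data.sh

Add the following imports near the top of `train.py`; `transfer_to_npu` redirects CUDA-targeted calls to the NPU:

    import torch_npu
    from torch_npu.contrib import transfer_to_npu
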
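
Before relying on these imports, it can help to confirm that the installed packages match the component table. The following is a minimal sketch using only the standard library; `EXPECTED` and `check_versions` are hypothetical names, and the distribution names `torch` and `torch_npu` are assumptions about how the packages are registered in the environment.

```python
from importlib import metadata

# Versions copied from the component table in this document.
# The distribution names are assumptions; adjust them if pip lists them differently.
EXPECTED = {
    "torch": "2.5.1",
    "torch_npu": "2.5.1.post1.dev20250722",
}


def check_versions(expected: dict[str, str]) -> dict[str, str]:
    """Map each package name to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if found == wanted else f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_versions(EXPECTED).items():
        print(f"{name}: {status}")
```

The comparison is an exact string match, so a build that reports a local suffix after the version number would show up as a mismatch even if it is otherwise compatible.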

If the host cannot reach `api.wandb.ai`, training fails with an error like the following:

    Traceback (most recent call last):
      File "/home/ma-user/anaconda3/envs/socc/lib/python3.9/site-packages/requests/adapters.py", line 589, in send
        resp = conn.urlopen(
      File "/home/ma-user/anaconda3/envs/socc/lib/python3.9/site-packages/urllib3/connectionpool.py", line 841, in urlopen
        retries = retries.increment(
      File "/home/ma-user/anaconda3/envs/socc/lib/python3.9/site-packages/urllib3/util/retry.py", line 535, in increment
        raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.wandb.ai', port=443): Max retries exceeded with url: /graphql (Caused by ProxyError('Unable to connect to proxy', OSError('Tunnel connection failed: 407 Proxy Authentication Required')))

    During handling of the above exception, another exception occurred:

Switch wandb to offline mode by adding the following code to `train.py`:

    # Switch wandb to offline mode
    os.environ["WANDB_MODE"] = "offline"  # or "disabled"

Training can also fail when tensors are moved to the NPU:

      File "/home/ma-user/Pytorch-UNet/train.py", line 100, in train_model
        images = images.to(device=device, dtype=torch.float32, memory_format=torch.channels_last)
      File "/home/ma-user/anaconda3/envs/socc/lib/python3.9/site-packages/torch_npu/contrib/transfer_to_npu.py", line 151, in decorated
        return fn(*args, **kwargs)
    RuntimeError: Only c10::MemoryFormat::Contiguous is supported for creating a npu tensor
    [ERROR] 2026-01-28-10:37:09 (PID:95357, Device:0, RankID:-1) ERR01007 OPS feature not supported

Huawei Ascend NPUs currently support only `contiguous_format` and `preserve_format`, so the following two files also need to be modified.

`evaluate.py`, before:

    image = image.to(device=device, dtype=torch.float32, memory_format=torch.channels_last)

After:

    image = image.to(device=device, dtype=torch.float32)

`train.py`, before:

    images = images.to(device=device, dtype=torch.float32, memory_format=torch.channels_last)

After:

    images = images.to(device=device, dtype=torch.float32)

Start training:

    # Train on card 4 (NPU device ID 4)
    export ASCEND_RT_VISIBLE_DEVICES=4
    python train.py

| Hardware | Cards | Performance |
|---|---|---|
| 910C | 1 | 12.63 img/s |
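
The table reports throughput in images per second. This document does not say how the figure was measured, so the following is only a sketch of one common way to compute such a number: total images processed divided by total elapsed time. The class `ThroughputMeter` is hypothetical and is not part of Pytorch-UNet.

```python
import time


class ThroughputMeter:
    """Accumulate processed-image counts and report images per second."""

    def __init__(self) -> None:
        self.images = 0
        self.seconds = 0.0

    def update(self, batch_size: int, elapsed_seconds: float) -> None:
        # Add one step's worth of images and wall-clock time.
        self.images += batch_size
        self.seconds += elapsed_seconds

    @property
    def images_per_second(self) -> float:
        # Guard against division by zero before the first update.
        return self.images / self.seconds if self.seconds > 0 else 0.0


if __name__ == "__main__":
    meter = ThroughputMeter()
    for _ in range(3):
        start = time.perf_counter()
        time.sleep(0.01)  # stand-in for one training step
        meter.update(batch_size=1, elapsed_seconds=time.perf_counter() - start)
    print(f"{meter.images_per_second:.2f} img/s")
```

When timing real training steps on an accelerator, the device usually needs to be synchronized before the clock is read, otherwise asynchronously queued work is not counted in the elapsed time.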