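RAFT-Stereo is a high-performance deep learning model for stereo matching, proposed by Princeton University in 2021. It extends the classic optical flow model RAFT (Recurrent All-Pairs Field Transforms) to binocular vision. It is not an end-to-end model designed specifically for intelligent driving, but its high accuracy and robustness have led to wide use in scenarios that require dense depth estimation, such as autonomous driving, robot navigation, 3D reconstruction, and power infrastructure inspection.
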
| Component | Version |
|---|---|
| Python | 3.11 |
| PyTorch | 2.5.1 |
| torch_npu | 2.5.1.post1.dev20250722 |
| CANN | cann_8.2.rc1 |

| Device model | NPU configuration |
|---|---|
| Atlas 800T A3 | Single card / multi-card (devices 0 to 15) |

| Image environment | Image address |
|---|---|
| Public network | swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 |

docker run -itd -u root \
--privileged \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci8 \
--device=/dev/davinci9 \
--device=/dev/davinci10 \
--device=/dev/davinci11 \
--device=/dev/davinci12 \
--device=/dev/davinci13 \
--device=/dev/davinci14 \
--device=/dev/davinci15 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /sys/fs/cgroup:/sys/fs/cgroup:ro \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /usr/bin/hccn_tool:/usr/bin/hccn_tool \
-v /etc/hccn.conf:/etc/hccn.conf \
--shm-size 1024g --net=host \
-v <host_dir>:<container_dir> \
--name <container_name> <image_id> /bin/bash

docker exec -it raftstereo bash
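
The `docker run` command above repeats one `--device` flag per NPU. As a minimal sketch, the flags could be generated rather than typed by hand when a host exposes fewer than 16 cards. The helper name `npu_device_flags` is hypothetical, and the sketch assumes the device nodes follow the `/dev/davinciN` naming used in the command above.

```python
from typing import List


def npu_device_flags(num_npus: int = 16) -> List[str]:
    """Build Docker --device flags for NPUs 0..num_npus-1 plus the shared nodes.

    Assumes the /dev/davinciN naming shown in the docker run command above.
    """
    if num_npus < 1:
        raise ValueError("num_npus must be at least 1")
    # One flag per NPU card
    per_card = [f"--device=/dev/davinci{i}" for i in range(num_npus)]
    # Management nodes that the command above maps regardless of card count
    shared = [
        "--device=/dev/davinci_manager",
        "--device=/dev/devmm_svm",
        "--device=/dev/hisi_hdc",
    ]
    return per_card + shared


if __name__ == "__main__":
    # Print the flags in a form that can be pasted into a multi-line shell command
    print(" \\\n".join(npu_device_flags(16)))
```

Passing a smaller `num_npus` yields the flag list for a host with fewer cards; the remaining `-v` mounts and options stay as written in the command.
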
conda create -n raftstereo --clone PyTorch-2.5.1
conda activate raftstereo

To avoid failed or slow dependency downloads, use the Huawei Cloud PyPI mirror for all installs:

pip config --user set global.index https://mirrors.huaweicloud.com/repository/pypi
pip config --user set global.index-url https://mirrors.huaweicloud.com/repository/pypi/simple
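pip config --user set global.trusted-host mirrors.huaweicloud.com

cd /home/ma-user/
git clone https://github.com/princeton-vl/RAFT-Stereo.git
cd RAFT-Stereo
pip install opt_einsum

Run the following command to download the dataset:

chmod ug+x download_middlebury_2014.sh && ./download_middlebury_2014.sh

After the download completes, the dataset directory should follow the layout below. Only the Middlebury dataset is downloaded at this point:
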
├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_testing
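
Before starting a run, it can help to confirm that the expected directories exist. The following is a minimal sketch, assuming the layout listed above; the names `EXPECTED_LAYOUT` and `missing_dataset_dirs` are hypothetical and are not part of the RAFT-Stereo repository.

```python
from pathlib import Path
from typing import Iterable, List

# Sub-directories copied from the layout listed above, relative to datasets/
EXPECTED_LAYOUT = {
    "FlyingThings3D": ["frames_cleanpass", "frames_finalpass", "disparity"],
    "Monkaa": ["frames_cleanpass", "frames_finalpass", "disparity"],
    "Driving": ["frames_cleanpass", "frames_finalpass", "disparity"],
    "KITTI": ["testing", "training", "devkit"],
    "Middlebury": ["MiddEval3"],
    "ETH3D": ["two_view_testing"],
}


def missing_dataset_dirs(root, datasets: Iterable[str] = ("Middlebury",)) -> List[str]:
    """Return the expected sub-directories that are absent under root."""
    root = Path(root)
    missing = []
    for name in datasets:
        for sub in EXPECTED_LAYOUT[name]:
            if not (root / name / sub).is_dir():
                missing.append(f"{name}/{sub}")
    return missing
```

Calling `missing_dataset_dirs("datasets")` from the repository root checks only Middlebury by default, which matches the single dataset downloaded here; an empty list means nothing is missing.
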

Add the torch_npu imports, which redirect CUDA calls to the NPU:

import torch_npu
from torch_npu.contrib import transfer_to_npu

Constructing nn.DataParallel on the NPU then fails with the following error:

  File "/home/ma-user/anaconda3/envs/raftstereo/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 33, in warn_imbalance
    values = [get_prop(props) for props in dev_props]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ma-user/anaconda3/envs/raftstereo/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 33, in <listcomp>
    values = [get_prop(props) for props in dev_props]
              ^^^^^^^^^^^^^^^
  File "/home/ma-user/anaconda3/envs/raftstereo/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 45, in <lambda>
    if warn_imbalance(lambda props: props.multi_processor_count):
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'torch_npu._C._NPUDeviceProperties' object has no attribute 'multi_processor_count'
[ERROR] 2026-01-29-10:06:47 (PID:2668507, Device:0, RankID:-1) ERR99999 UNKNOWN applicaiton exception

The NPU does not currently support nn.DataParallel, so the model must be adapted to DistributedDataParallel. See the official migration guide: https://www.hiascend.com/document/detail/zh/Pytorch/730/ptmoddevg/trainingmigrguide/PT_LMTMOG_0030.html

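Before:

model = nn.DataParallel(RAFTStereo(args))

After:

# Imports required by the lines below, if not already present in the script
import os
import torch.distributed as dist

# Initialize the process group with the HCCL backend used on Ascend NPUs
dist.init_process_group(backend="hccl")
# LOCAL_RANK is set by launchers such as torchrun; default to 0 for a single process
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.npu.set_device(f"npu:{local_rank}")
model = RAFTStereo(args).to(f"npu:{local_rank}")
model = nn.parallel.DistributedDataParallel(model, device_ids=[local_rank], broadcast_buffers=False)

Download the corresponding weight files as described in the model's official repository: https://github.com/princeton-vl/RAFT-Stereo?tab=readme-ov-file
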
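
When `dist.init_process_group` is called without an explicit `init_method`, PyTorch uses the `env://` method and reads `RANK`, `WORLD_SIZE`, `MASTER_ADDR`, and `MASTER_PORT` from the environment. A missing variable only surfaces once the process group is being initialized, so a small pre-flight check may be useful. The sketch below is an illustration only; `DistEnv` and `read_dist_env` are hypothetical names, not part of PyTorch or torch_npu.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables read by the env:// initialization method
REQUIRED_VARS = ("RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


@dataclass(frozen=True)
class DistEnv:
    rank: int
    world_size: int
    local_rank: int
    master_addr: str
    master_port: int


def read_dist_env(environ: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Validate the distributed-training variables and return them as typed values."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    rank = int(env["RANK"])
    world_size = int(env["WORLD_SIZE"])
    if not 0 <= rank < world_size:
        raise ValueError(f"RANK={rank} is outside [0, {world_size})")
    return DistEnv(
        rank=rank,
        world_size=world_size,
        # Same fallback as the modified training code: default to device 0
        local_rank=int(env.get("LOCAL_RANK", 0)),
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
    )
```

Calling `read_dist_env()` before `dist.init_process_group(backend="hccl")` would raise a readable error if one of the variables has not been exported.
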
# Train on NPU 0
export ASCEND_RT_VISIBLE_DEVICES=0
export RANK=0
export WORLD_SIZE=1
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=10000
python train_stereo.py --train_datasets middlebury_2014 --num_steps 4000 --image_size 384 1000 --lr 0.00002 --restore_ckpt models/raftstereo-sceneflow.pth --batch_size 2 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --mixed_precision

| Hardware | Cards | Performance |
|---|---|---|
| 910C | 1 | 3.75 s/iteration |
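
As a rough worked example, the measured speed can be turned into a wall-clock estimate for the run above. This sketch assumes that one "iteration" in the table corresponds to one of the training steps counted by `--num_steps`, and it ignores validation and checkpointing time; the function name is hypothetical.

```python
def estimated_training_hours(num_steps: int, seconds_per_iteration: float) -> float:
    """Convert a step count and per-iteration time into hours of training."""
    return num_steps * seconds_per_iteration / 3600.0


if __name__ == "__main__":
    # 4000 steps at 3.75 s/iteration is 15000 s in total
    hours = estimated_training_hours(4000, 3.75)
    print(f"{hours:.2f} h")  # prints "4.17 h"
```
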