1. Model overview and use cases

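Nex is a next-generation, full-stack agent platform that brings foundation models, synthetic data pipelines, RL training, agent frameworks, and deployment tools together in one unified ecosystem. DeepSeek-V3.1-Nex-N1 is the flagship release of the Nex-N1 series: a post-trained model designed to highlight agent autonomy, tool use, and real-world productivity. We are committed to making AI agents easier than ever to build and deploy, offering researchers and entrepreneurs a high-performance, reliable, and cost-effective "out-of-the-box" agent system. This guide deploys Qwen3-32B-Nex-N1, the model used by the download and launch commands in Section 3. Model details: https://www.modelscope.cn/models/nex-agi/Qwen3-32B-Nex-N1/summary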

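2. Preparing the runtime environment

2.1 Version compatibility table

Hardware versions

Component	Version
Hardware	910B (4 cards)
CANN driver	25.0.rc1.1

Software versions

This environment is installed from a container image located at swr.cn-north-4.myhuaweicloud.com/ascend-sact/ascend-910b-ubuntu:v2.0. The software versions are determined by the image; the component versions in the current image are as follows:

Component	Version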
OS	Ubuntu 24.04 x86_64
Python	3.11.14
ascend-toolkit	8.3.RC2
torch_npu	2.8.0
sglang	0.5.5.post3
triton-ascend	3.2.0rc4

3. Running the model

3.1 Pull the image

Command: docker pull swr.cn-north-4.myhuaweicloud.com/ascend-sact/ascend-910b-ubuntu:v2.0

For details, see: https://gitcode.com/Ascend-SACT/ascend-docker

3.2 Start the container

For details, see: https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha003/softwareinst/instg/instg_0020.html?Mode=DockerIns&OS=openEuler&Software=cannToolKit#ZH-CN_TOPIC_0000002118090620

3.3 Switch to the conda environment

Command: conda activate ascend-infer

3.4 Check the triton packages

This model is served with the sglang framework, which requires that only one triton package be installed; otherwise the service fails to start. For details, see: https://gitcode.com/Ascend-SACT/ascend-docker/issues/3

Command: pip list | grep triton

If the output shows two triton packages (as below), uninstall both and reinstall: pip uninstall triton triton-ascend -y; pip install triton-ascend

triton 3.5.0

triton-ascend 3.2.0rc4

Note: starting the service with triton-ascend 3.2.0 fails; triton-ascend 3.2.0rc4 is required. If pip install triton-ascend installs version 3.2.0, install the required version explicitly with pip install triton-ascend==3.2.0rc4
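The check described in this section can be scripted. The following is a minimal sketch, not part of the image: check_triton is a hypothetical helper name, and it assumes that pip list prints the package name and version as the first two whitespace-separated columns, as in the output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the image): reads `pip list` output on
# stdin and reports whether the triton installation matches this guide.
check_triton() {
    local rows count
    # Keep only the triton / triton-ascend rows, rewritten as name==version.
    rows=$(awk '$1 == "triton" || $1 == "triton-ascend" { print $1 "==" $2 }')
    # Count non-empty rows; grep exits 1 on zero matches, hence `|| true`.
    count=$(printf '%s\n' "$rows" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "unexpected: expected exactly one triton package, found $count"
        return 1
    fi
    if [ "$rows" != "triton-ascend==3.2.0rc4" ]; then
        echo "mismatch: found $rows, expected triton-ascend==3.2.0rc4"
        return 2
    fi
    echo "ok: $rows"
}
```

A possible invocation is pip list 2>/dev/null | check_triton. A non-zero return value means the uninstall and reinstall commands above should be run.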

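3.5 Download the model

Make sure Git and Git LFS are installed. Command: git lfs install

Model download command: git clone https://www.modelscope.cn/nex-agi/Qwen3-32B-Nex-N1.git

3.6 Start the model service

Set the environment variable

Command: export HCCL_INTRA_ROCE_ENABLE=1

Start the model service

Command: SGLANG_USE_MODELSCOPE=true python3 -m sglang.launch_server --model-path ${MODEL_PATH}/Qwen3-32B-Nex-N1 --tp-size 4 --mem-fraction-static 0.8

Output like the following indicates that the service started successfully: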

warnings.warn('When enable frozen_parameter, Parameters and input tensors with immutable data_ptr '
/opt/mamba/envs/ascend-infer/lib/python3.11/site-packages/torch_npu/dynamo/torchair/_ge_concrete_graph/fx2ge_converter.py:1256: UserWarning: When enable frozen_parameter, Parameters and input tensors with immutable data_ptr marked by torch._dynamo.mark_static_address() will be considered frozen. Please make sure that the Parameters data address remain the same throughout the program runtime.
  warnings.warn('When enable frozen_parameter, Parameters and input tensors with immutable data_ptr '

[2026-01-20 16:56:57] INFO: 127.0.0.1:36524 - "POST /generate HTTP/1.1" 200 OK

[2026-01-20 16:56:57] The server is fired up and ready to roll!
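Once the server reports that it is ready, a quick request confirms that it answers. The sketch below targets the /generate endpoint that appears in the log above. It assumes the server listens on 127.0.0.1:30000, which is sglang's default when --host and --port are not passed, and that curl is installed. build_generate_body and send_generate are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Assumption: default sglang address; override BASE_URL if --host/--port were set.
BASE_URL="${BASE_URL:-http://127.0.0.1:30000}"

# Build the JSON body for the /generate endpoint.
# No JSON escaping is done, so the prompt must not contain double quotes
# or backslashes.
build_generate_body() {
    local prompt="$1" max_new_tokens="${2:-32}"
    printf '{"text": "%s", "sampling_params": {"max_new_tokens": %d, "temperature": 0}}' \
        "$prompt" "$max_new_tokens"
}

# Send one request and print the raw JSON response.
send_generate() {
    curl -sS -X POST "$BASE_URL/generate" \
        -H 'Content-Type: application/json' \
        -d "$(build_generate_body "$1" "${2:-32}")"
}
```

For example, send_generate "The capital of France is" 16 should return a JSON object containing the generated text if the service is healthy.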

4. Notes

4.1 Service fails to start because of the triton version

If triton-ascend 3.2.0 is installed, the service cannot start and reports the following error:

File "/opt/mamba/envs/ascend-infer/lib/python3.11/site-packages/triton/runtime/jit.py", line 353, in
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/mamba/envs/ascend-infer/lib/python3.11/site-packages/triton/runtime/jit.py", line 660, in run
    kernel = self.compile(
             ^^^^^^^^^^^^^
File "/opt/mamba/envs/ascend-infer/lib/python3.11/site-packages/triton/compiler/compiler.py", line 320, in compile
    raise MLIRCompilationError(stage_name, error_detail)
triton.compiler.errors.MLIRCompilationError:
///------------------[ERROR][Triton][BEG]------------------
[ConvertLinalgRToBinary] encounters error:
bishengir-compile: for the --target option: Cannot find option named 'Ascend910B2C'!

[INFO]: The compiled kernel cache is in /root/.triton/cache/lTAthGiXzBSU6dTKTbV4p_T42f_XwKPHkxD0xDa1--s

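4.2 Multi-node deployment

sglang supports multi-node deployment through the "--dist-init-addr", "--nnodes", and "--node-rank" parameters.

Parameter	Description
--dist-init-addr	Address of the master node
--nnodes	Total number of nodes
--node-rank	Rank of the current node (0 for the master node, increasing by one for each additional node)
For details, see the sglang documentation: https://docs.sglang.io/advanced_features/server_arguments.html

4.2.1 Prepare the environment on each node

Prepare the sglang environment on every node as described in Section 3.

4.2.2 Launch the service on the master node

Choose one of the nodes as the master node and note its IP address, which is needed when launching the service across nodes. The command (two-node example):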

export HCCL_INTRA_ROCE_ENABLE=1
RANK_ID=0
NODE_NUM=2
MASTER_ADDR=10.244.132.124:20000
SGLANG_USE_MODELSCOPE=true python3 -m sglang.launch_server --model-path ${MODEL_PATH}/Qwen3-32B-Nex-N1 --tp-size 4 --dist-init-addr=$MASTER_ADDR --nnodes $NODE_NUM --node-rank $RANK_ID --host 0.0.0.0 --port 8000 --mem-fraction-static 0.8

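The service comes up once the worker node has run its corresponding command.

4.2.3 Launch the service on the worker node

The worker node runs the same command as the master node; only the rank ID differs. The command (two-node example):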

export HCCL_INTRA_ROCE_ENABLE=1
RANK_ID=1
NODE_NUM=2
MASTER_ADDR=10.244.132.124:20000
SGLANG_USE_MODELSCOPE=true python3 -m sglang.launch_server --model-path ${MODEL_PATH}/Qwen3-32B-Nex-N1 --tp-size 4 --dist-init-addr=$MASTER_ADDR --nnodes $NODE_NUM --node-rank $RANK_ID --host 0.0.0.0 --port 8000 --mem-fraction-static 0.8

Once the service is up, the master node shows the same output as in the single-node case above.
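Because the master and worker commands differ only in the rank, they can be generated from one template. The following is a sketch under that assumption: build_launch_cmd is a hypothetical helper that prints the command for a given node rather than running it, using only the flags and values from the commands above. It falls back to the current directory when MODEL_PATH is unset.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sglang launch command for one node.
# Arguments: <node-rank> <number-of-nodes> <master-ip:port>
build_launch_cmd() {
    local rank="$1" nnodes="$2" master_addr="$3"
    local model_dir="${MODEL_PATH:-.}/Qwen3-32B-Nex-N1"
    # Reject non-numeric ranks and ranks outside [0, nnodes).
    if ! [ "$rank" -ge 0 ] 2>/dev/null || ! [ "$rank" -lt "$nnodes" ] 2>/dev/null; then
        echo "node rank must be an integer in [0, $nnodes)" >&2
        return 1
    fi
    printf 'SGLANG_USE_MODELSCOPE=true python3 -m sglang.launch_server --model-path %s --tp-size 4 --dist-init-addr=%s --nnodes %s --node-rank %s --host 0.0.0.0 --port 8000 --mem-fraction-static 0.8\n' \
        "$model_dir" "$master_addr" "$nnodes" "$rank"
}
```

On the worker node of the two-node example, build_launch_cmd 1 2 10.244.132.124:20000 prints the command from Section 4.2.3. Remember to export HCCL_INTRA_ROCE_ENABLE=1 before running the printed command.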