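This document describes how to install vllm-ascend manually.
There are two installation methods: installing with pip inside a CANN environment, or using the prebuilt vllm-ascend Docker image.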
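Before installing, make sure the firmware, driver, and CANN are installed correctly. For details, see the "快速安装昇腾环境" (Quick Installation of the Ascend Environment) guide on the Ascend open source site (ascend.github.io).
To verify that the Ascend NPU firmware and driver are installed correctly, run the following command:
npu-smi info
For more details, see the same guide on ascend.github.io.
The easiest way to set up the software environment is to use the CANN image directly: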
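The check above can be wrapped in a small pre-flight script. The following is a minimal sketch, not part of vllm-ascend: the function names `have_cmd` and `check_npu_host` are hypothetical, and it assumes that `npu-smi info` returns a non-zero exit code when it cannot reach the driver.

```shell
# Hypothetical helper: succeed if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical pre-flight check for the host; returns non-zero on the first problem.
check_npu_host() {
    if ! have_cmd npu-smi; then
        echo "npu-smi not found: is the Ascend driver installed?" >&2
        return 1
    fi
    # /dev/davinci_manager is one of the device nodes passed to docker later on.
    if [ ! -e /dev/davinci_manager ]; then
        echo "/dev/davinci_manager is missing" >&2
        return 1
    fi
    # Assumption: npu-smi info exits non-zero when it cannot talk to the driver.
    npu-smi info >/dev/null
}
```

Running `check_npu_host && echo "NPU host looks ready"` on the host before starting a container gives an early, readable failure instead of an error inside the container.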
# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the CANN image
export IMAGE=quay.io/ascend/cann:|cann_image_tag|
docker run --rm \
--name vllm-ascend-env \
--shm-size=1g \
--device $DEVICE \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-it $IMAGE bash
Once the environment above is ready, you can install vllm and vllm-ascend.
First, install the system dependencies and configure the pip mirror.
# Using apt-get with mirror
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
apt-get update -y && apt-get install -y gcc g++ cmake libnuma-dev wget git curl jq
# Or using yum
# yum update -y && yum install -y gcc g++ cmake numactl-devel wget git curl jq
# Config pip mirror
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
[Optional] If your CPU uses the x86 architecture, you can add the following pip sources:
# For torch-npu dev version or x86 machine
pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"
You can then install vllm and vllm-ascend with the following commands:
# Install vllm-project/vllm. The newest supported version is v0.11.0.
pip install vllm==0.11.0
# Install vllm-project/vllm-ascend from pypi.
pip install vllm-ascend==0.11.0rc3
vllm-ascend also provides Docker images. You can pull a prebuilt image directly for installation and deployment:
# Update --device according to your device (Atlas A2: /dev/davinci[0-7] Atlas A3:/dev/davinci[0-15]).
# Update the vllm-ascend image according to your environment.
# Note you should download the weight to /root/.cache in advance.
export IMAGE=quay.io/ascend/vllm-ascend:|vllm_ascend_version|
docker run --rm \
--name vllm-ascend-env \
--shm-size=1g \
--net=host \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-it $IMAGE bash
The default working directory is /workspace. The vLLM and vLLM Ascend source code is located in /vllm-workspace.
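The repeated `--device /dev/davinciN` lines in the `docker run` command above can be generated instead of typed out, which helps when switching between 8-device and 16-device machines. This is a minimal sketch; `davinci_device_flags` is a hypothetical helper name, not something shipped with vllm-ascend.

```shell
# Hypothetical helper: print "--device /dev/davinciN" for every N in [first, last].
davinci_device_flags() {
    first=$1
    last=$2
    flags=""
    i=$first
    while [ "$i" -le "$last" ]; do
        flags="$flags --device /dev/davinci$i"
        i=$((i + 1))
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Usage (left unquoted on purpose so the shell splits it into separate arguments):
#   docker run --rm $(davinci_device_flags 0 7) --device /dev/davinci_manager ... -it $IMAGE bash
```

For a 16-device machine the same call becomes `davinci_device_flags 0 15`.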
1) Download the weights
Download the model weights from ModelScope: www.modelscope.cn
2) Prepare the server script
export ASCEND_RT_VISIBLE_DEVICES=4
vllm serve {your model path} --port 8080 --task score --served-model-name qwen3-reranker-8b
3) Prepare the client script
curl -X 'POST' http://localhost:8080/v1/rerank \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-reranker-8b",
"query": "What is the capital of Brazil?",
"documents": [
"The capital of Brazil is Brasilia.",
"The capital of France is Paris.",
"Horses and cows are both animals"
]
}'
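The client request only succeeds once the server has finished loading the model, which can take a while. One way to handle this is to poll the server before sending the request. The sketch below uses a hypothetical generic retry helper, `wait_until`; the `/health` path in the usage comment assumes the vLLM OpenAI-compatible server exposes a health endpoint on the same port.

```shell
# Hypothetical helper: run a probe command once per second until it succeeds
# or the given number of attempts is used up.
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Usage (assumes the server from step 2 listens on port 8080):
#   wait_until 300 curl -sf http://localhost:8080/health && echo "server is ready"
```

Passing the probe as ordinary arguments keeps the helper independent of curl, so the same function can wait on any command that signals readiness through its exit code.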