This directory provides the patch deliverables required to run MiniCPM-V-4.6 on vLLM-Ascend:
vllm-minicpmv46.patch
vllm-ascend-minicpmv46.patch

Where:
vllm-minicpmv46.patch: for the vllm repository
vllm-ascend-minicpmv46.patch: for the vllm-ascend repository

Base image:
quay.io/ascend/vllm-ascend:v0.19.1rc1

Environment notes:
transformers must be upgraded to 5.7.0.

Command to upgrade transformers:
pip install --upgrade "transformers[torch]==5.7.0"

Set up variables:
PATCH_DIR=/path/to/minicpm_v4.5/patch
VLLM_REPO=/vllm-workspace/vllm
VLLM_ASCEND_REPO=/vllm-workspace/vllm-ascend
MODEL_PATH=/path/to/MiniCPM-V-4.6

Apply the vllm patch:
cd "$VLLM_REPO"
git apply --check "$PATCH_DIR/vllm-minicpmv46.patch"
git apply "$PATCH_DIR/vllm-minicpmv46.patch"

Apply the vllm-ascend patch:
cd "$VLLM_ASCEND_REPO"
git apply --check "$PATCH_DIR/vllm-ascend-minicpmv46.patch"
git apply "$PATCH_DIR/vllm-ascend-minicpmv46.patch"

To roll back the applied patches, run:
cd "$VLLM_ASCEND_REPO"
git apply -R --check "$PATCH_DIR/vllm-ascend-minicpmv46.patch"
git apply -R "$PATCH_DIR/vllm-ascend-minicpmv46.patch"
cd "$VLLM_REPO"
git apply -R --check "$PATCH_DIR/vllm-minicpmv46.patch"
git apply -R "$PATCH_DIR/vllm-minicpmv46.patch"

MiniCPM-V-4.6 on a single logical NPU:
cd /workspace
ASCEND_RT_VISIBLE_DEVICES=0 HCCL_OP_EXPANSION_MODE=AIV \
vllm serve "$MODEL_PATH" \
--served-model-name MiniCPM-V-4.6 \
--trust-remote-code \
--dtype bfloat16 \
--limit-mm-per-prompt '{"image":4,"video":1}' \
--port 8000 \
--compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

--limit-mm-per-prompt '{"image":4,"video":1}' allows at most 4 images and 1 video in a single prompt. The effective limit is still bounded by the context length, the processor token budget, and device memory.
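When the per-prompt limits need to vary between deployments, the JSON value can be generated from two shell variables instead of being edited by hand. The following is a minimal sketch; `mm_limit_json` is a hypothetical helper name, not part of vLLM, and it only formats the same `image` and `video` keys used in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JSON value for --limit-mm-per-prompt.
# Usage: mm_limit_json <max_images> <max_videos>
mm_limit_json() {
  local images="$1" videos="$2" n
  for n in "$images" "$videos"; do
    case "$n" in
      # Reject empty values and anything that is not a plain decimal number.
      ''|*[!0-9]*)
        echo "mm_limit_json: counts must be non-negative integers" >&2
        return 1
        ;;
    esac
  done
  printf '{"image":%s,"video":%s}' "$images" "$videos"
}
```

The result can then be passed as `--limit-mm-per-prompt "$(mm_limit_json 4 1)"`. Raising the numbers does not lift the other constraints mentioned above.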

Service health check:
curl -sS http://127.0.0.1:8000/health -w '\nHTTP %{http_code}\n'

Expected result: HTTP 200.
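Model loading can take a while, so the health check may need to be repeated until the server answers. The sketch below polls the same `/health` endpoint until it returns HTTP 200. The function name `wait_for_health`, the default of 60 attempts, and the 5-second delay are illustrative assumptions and should be adjusted to the actual startup time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a health URL until it returns HTTP 200.
# Usage: wait_for_health [url] [attempts] [delay_seconds]
wait_for_health() {
  local url="${1:-http://127.0.0.1:8000/health}"
  local attempts="${2:-60}"
  local delay="${3:-5}"
  local i=1 code=""
  while [ "$i" -le "$attempts" ]; do
    # -o /dev/null discards the body; -w prints only the status code.
    # On a refused connection curl prints 000, so the loop simply retries.
    code="$(curl -sS -o /dev/null -w '%{http_code}' "$url" 2>/dev/null || true)"
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s), last status: ${code:-none}" >&2
  return 1
}
```

A typical call would be `wait_for_health http://127.0.0.1:8000/health 60 5`, which returns a non-zero exit code if the server never becomes healthy.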

Service readiness check:
curl -sS http://127.0.0.1:8000/v1/models

The response should include:
id: MiniCPM-V-4.6
root: the current MODEL_PATH
max_model_len: the model's default context length

Text generation check:
curl -sS http://127.0.0.1:8000/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model":"MiniCPM-V-4.6",
"messages":[
{"role":"user","content":"用中文简短回答:2+3等于几?"}
],
"temperature":0,
"max_tokens":32
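}'

--trust-remote-code must be kept; otherwise the model's custom configuration and processor may fail to load.
Requests that exceed the image:4 limit return HTTP 400; the current error path may print a traceback in the server log, but the limit is still enforced for the client.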
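To exercise the image path and the image limit, a request with several `image_url` parts can be assembled in shell. This is a sketch under stated assumptions: `image_chat_payload` and `post_chat_status` are hypothetical helpers, the payload uses the OpenAI-style `image_url` content parts accepted by the chat completions endpoint, `max_tokens` of 64 is arbitrary, and no JSON escaping is done, so the prompt and URLs must not contain double quotes or backslashes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a chat-completions payload with one text part
# and one image_url part per URL argument.
# Usage: image_chat_payload <prompt> <url>...
image_chat_payload() {
  local prompt="$1"
  shift
  local parts url
  parts="{\"type\":\"text\",\"text\":\"$prompt\"}"
  for url in "$@"; do
    parts="$parts,{\"type\":\"image_url\",\"image_url\":{\"url\":\"$url\"}}"
  done
  printf '{"model":"MiniCPM-V-4.6","messages":[{"role":"user","content":[%s]}],"temperature":0,"max_tokens":64}' "$parts"
}

# Hypothetical helper: send a payload and print only the HTTP status code.
# Usage: post_chat_status <json_payload>
post_chat_status() {
  curl -sS -o /dev/null -w '%{http_code}\n' \
    http://127.0.0.1:8000/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d "$1"
}
```

For example, `post_chat_status "$(image_chat_payload 'Describe the images.' "$URL1" "$URL2")"` sends two images, where `URL1` and `URL2` are placeholders for image URLs reachable from the server. Passing five URLs should trigger the HTTP 400 described in the note above, given the `image:4` limit.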