Ascend-SACT/DeepSeekV4

DeepSeek V4 Function-Calling Adaptation

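1. Goal

This document records a single minimal reproducible path:

  1. Create a new container from the original image quay.io/ascend/vllm-ascend:v0.13.0rc3-a3
  2. Patch /vllm-workspace/vllm inside the container
  3. Start the dsv4 service with the established launch parameters
  4. Verify that the function-calling fix takes effect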

2. Input files

Patch file:

/data1/lcb/xxx/dsv4_013/dsv4_function_call_live.patch

Launch script used for validation:

/data1/lcb/xxx/dsv4_013/run_ds_v4_013_fc_validation.sh

Model path:

/mnt/share_space/xxx/models/DeepSeek-V4-Flash-w8a8

3. Create the container from the original image

docker run -itd --privileged --name=ds_v4_013_fc --net=host \
  --shm-size 500g \
  --device=/dev/davinci0 \
  --device=/dev/davinci1 \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --device=/dev/davinci4 \
  --device=/dev/davinci5 \
  --device=/dev/davinci6 \
  --device=/dev/davinci7 \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
  -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
  -v /usr/local/sbin:/usr/local/sbin \
  -v /etc/hccn.conf:/etc/hccn.conf \
  -v /home:/home \
  -v /data1:/data1 \
  -v /data2:/data2 \
  -v /data3:/data3 \
  -v /opt:/opt \
  -v /mnt:/mnt \
  --entrypoint /bin/bash \
  quay.io/ascend/vllm-ascend:v0.13.0rc3-a3

It is advisable to first confirm which image the container was created from:

docker inspect ds_v4_013_fc --format '{{.Config.Image}} {{.Image}}'

Expected output:

quay.io/ascend/vllm-ascend:v0.13.0rc3-a3 sha256:7f078ea3f8c35aee9e41b2b4a243d1bb65da72390fa5d50b5827510c577a5c31
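
The comparison against the expected line can be automated. The following is a minimal sketch, assuming the inspect output has been captured as a string; the function name image_matches is hypothetical, and the expected values are the ones recorded in this document:

```python
# Expected values copied from this document.
EXPECTED_IMAGE = "quay.io/ascend/vllm-ascend:v0.13.0rc3-a3"
EXPECTED_ID = "sha256:7f078ea3f8c35aee9e41b2b4a243d1bb65da72390fa5d50b5827510c577a5c31"


def image_matches(inspect_output: str) -> bool:
    """Compare the two whitespace-separated fields printed by
    docker inspect --format '{{.Config.Image}} {{.Image}}'."""
    return inspect_output.split() == [EXPECTED_IMAGE, EXPECTED_ID]
```

A mismatch in the image ID is a signal to check how the image was pulled before continuing, since the patch was validated against this specific image.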

4. Apply the patch inside the container

First inspect the state of the vllm working tree:

docker exec ds_v4_013_fc bash -lc 'cd /vllm-workspace/vllm && git status --short'

Check that the patch applies cleanly before applying it:

docker exec ds_v4_013_fc bash -lc 'cd /vllm-workspace/vllm && git apply --check /data1/lcb/xxx/dsv4_013/dsv4_function_call_live.patch'

Apply the patch:

docker exec ds_v4_013_fc bash -lc 'cd /vllm-workspace/vllm && git apply /data1/lcb/xxx/dsv4_013/dsv4_function_call_live.patch'

A static check is recommended afterwards (py_compile only catches syntax errors in the listed files):

docker exec ds_v4_013_fc bash -lc 'python3 -m py_compile \
  /vllm-workspace/vllm/vllm/config/model.py \
  /vllm-workspace/vllm/vllm/tokenizers/registry.py \
  /vllm-workspace/vllm/vllm/tool_parsers/__init__.py \
  /vllm-workspace/vllm/vllm/tool_parsers/deepseekv32_tool_parser.py \
  /vllm-workspace/vllm/vllm/tool_parsers/deepseekv4_tool_parser.py \
  /vllm-workspace/vllm/vllm/tokenizers/deepseek_v4.py \
  /vllm-workspace/vllm/vllm/tokenizers/deepseek_v4_encoding.py \
  /vllm-workspace/vllm/vllm/entrypoints/openai/serving_engine.py'
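
The check-then-apply sequence in this section can be scripted so that the real apply never runs when the dry run fails. This is a rough sketch rather than part of the validated procedure; the helper names git_apply_cmd and apply_patch are hypothetical, while the container name and paths are the ones used above:

```python
import subprocess

CONTAINER = "ds_v4_013_fc"
WORKTREE = "/vllm-workspace/vllm"
PATCH = "/data1/lcb/xxx/dsv4_013/dsv4_function_call_live.patch"


def git_apply_cmd(check_only: bool) -> list[str]:
    """Build the docker exec argv for `git apply`, optionally with --check."""
    flag = "--check " if check_only else ""
    inner = f"cd {WORKTREE} && git apply {flag}{PATCH}"
    return ["docker", "exec", CONTAINER, "bash", "-lc", inner]


def apply_patch(run=subprocess.run) -> None:
    """Dry-run first, then apply.

    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit code, so the second command is skipped if the dry run fails.
    """
    run(git_apply_cmd(check_only=True), check=True)
    run(git_apply_cmd(check_only=False), check=True)
```

The run parameter exists only so the sequence can be exercised without Docker; calling apply_patch() with no arguments executes the two commands for real.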

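5. Start the service

The validation run starts the service by executing the script directly inside the container (the logs directory in the redirect target must already exist, otherwise the redirection fails):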

docker exec -d ds_v4_013_fc bash -lc '/data1/lcb/xxx/dsv4_013/run_ds_v4_013_fc_validation.sh >/data1/lcb/xxx/dsv4_013/logs/dsv4_fc_md_validation_ds_v4_013_fc_20260424_225930.log 2>&1'

The key environment variables and launch parameters used inside the script are the same as those used during validation:

export PYTHONPATH=/vllm-workspace/vllm:/vllm-workspace/vllm-ascend:${PYTHONPATH}
export USE_MULTI_BLOCK_POOL=1
export OMP_PROC_BIND=false
export OMP_NUM_THREADS=10
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export TORCH_NPU_ALLOC_CONF=expandable_segments:True
export VLLM_USE_V1=1
export ACL_OP_INIT_MODE=1
export VLLM_VERSION=0.13.0
export VLLM_TORCH_PROFILER_DIR=./vllm_profile
export VLLM_TORCH_PROFILER_WITH_STACK=0
export TRITON_ALL_BLOCKS_PARALLEL=1

vllm serve /mnt/share_space/xxx/models/DeepSeek-V4-Flash-w8a8 \
  --host 0.0.0.0 \
  --max_model_len 65536 \
  --max-num-batched-tokens 8192 \
  --served-model-name dsv4 \
  --gpu-memory-utilization 0.9 \
  --max-num-seqs 16 \
  --data-parallel-size 1 \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --quantization ascend \
  --port 8006 \
  --block-size 128 \
  --chat-template /mnt/share_space/xxx/models/DeepSeek-V4-Flash-w8a8/chat_template.jinja \
  --async-scheduling \
  --trust-remote-code \
  --tokenizer-mode deepseek_v4 \
  --enable-auto-tool-choice \
  --tool-call-parser deepseek_v4 \
  --additional-config '{"enable_cpu_binding": "true", "multistream_overlap_shared_expert": true}' \
  --speculative-config '{"num_speculative_tokens": 1,"method": "deepseek_mtp"}' \
  --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}'

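6. What successful validation looks like

6.1 The service starts successfully

Startup log:

/data1/lcb/xxx/dsv4_013/logs/dsv4_fc_md_validation_ds_v4_013_fc_20260424_225930.log

The following lines in the log indicate that the service is up (the process ID in brackets varies between runs):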

INFO:     Started server process [3033]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
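
Since the process ID changes between runs, a scripted check should key on the fixed text rather than the whole line. A minimal sketch, assuming the log content has been read into a string; server_started is a hypothetical helper name:

```python
READY_MARKER = "Application startup complete."


def server_started(log_text: str) -> bool:
    """Return True if any log line contains the startup-complete marker."""
    return any(READY_MARKER in line for line in log_text.splitlines())
```

In practice the argument would be the content of the startup log file listed above, re-read periodically until the marker appears.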

6.2 /v1/models returns 200

docker exec ds_v4_013_fc bash -lc 'curl --noproxy "*" -sS http://127.0.0.1:8006/v1/models'

On success, the response should contain:

{
  "id": "dsv4",
  "root": "/mnt/share_space/xxx/models/DeepSeek-V4-Flash-w8a8"
}
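
The fragment above can be checked programmatically. The sketch below assumes the endpoint returns the usual OpenAI-compatible list object, with model entries under a top-level "data" array; that wrapper is not shown in this document, and find_model is a hypothetical helper name:

```python
import json


def find_model(models_response: str, model_id: str):
    """Return the model entry whose id matches model_id, or None."""
    payload = json.loads(models_response)
    # Assumption: entries live under "data", as in the OpenAI list format.
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None
```

For this deployment, find_model(body, "dsv4") should return an entry whose "root" equals the model path given in section 2.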

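6.3 Forced tool_choice:function succeeds

On success, all of the following should hold:

  1. HTTP 200
  2. The response contains tool_calls
  3. tool_calls[0].function.name is correct
  4. Argument types follow the schema and no longer degrade to strings

For example: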

{
  "finish_reason": "stop",
  "tool_calls": [
    {
      "function": {
        "name": "add_numbers",
        "arguments": "{\"a\": 23, \"b\": 19}"
      }
    }
  ]
}
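
To make the forced case concrete, the sketch below builds a request body that names the function in tool_choice and checks that the returned arguments are integers rather than strings. The add_numbers schema is hypothetical (only the tool name and argument values appear in this document), and the checker takes the abbreviated fragment shown above; in a full Chat Completions response the same fields sit at choices[0].finish_reason and choices[0].message.tool_calls.

```python
import json

# Hypothetical schema for illustration; not taken from the validation scripts.
ADD_NUMBERS_TOOL = {
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add two integers.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
    },
}


def forced_request(user_text: str) -> dict:
    """Build a request body that forces a call to add_numbers."""
    return {
        "model": "dsv4",
        "stream": False,
        "messages": [{"role": "user", "content": user_text}],
        "tools": [ADD_NUMBERS_TOOL],
        "tool_choice": {"type": "function", "function": {"name": "add_numbers"}},
    }


def typed_arguments_ok(fragment: dict) -> bool:
    """Check the tool name and that a and b decode to integers, not strings."""
    call = fragment["tool_calls"][0]["function"]
    args = json.loads(call["arguments"])

    def is_int(value) -> bool:
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)

    return call["name"] == "add_numbers" and is_int(args.get("a")) and is_int(args.get("b"))
```

A response whose arguments decode to {"a": "23", "b": "19"} would fail this check, which is the degradation that item 4 refers to.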

6.4 tool_choice:auto succeeds

On success, all of the following should hold:

  1. HTTP 200
  2. finish_reason = "tool_calls"
  3. The correct tool name is selected automatically

For example:

{
  "finish_reason": "tool_calls",
  "tool_calls": [
    {
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"北京\"}"
      }
    }
  ]
}

6.5 yidun case succeeds

For calls to moderation tools such as yidun, a successful validation should show at least:

{
  "finish_reason": "tool_calls",
  "tool_calls": [
    {
      "function": {
        "name": "yidun",
        "arguments": "{\"content\": \"hello\"}"
      }
    }
  ]
}

If the forced named-function call template is used, one can additionally verify:

  • arguments.content matches the target text exactly

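6.6 Chinese tool name auto case succeeds

The following request has also been tested successfully in ds_v4_013_fc. The Chinese strings are part of the test data and are kept as-is: the system message means "respond using the tool", the user message means "check whether 你好 (hello) is a sensitive word", and the tool name 检测工具 means "detection tool".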

{
  "model": "dsv4",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "使用工具返回"
    },
    {
      "role": "user",
      "content": "检测你好是敏感词吗"
    }
  ],
  "tool_choice": "auto",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "检测工具",
        "description": "An agent designed to detect user input in conversations and identify whether it contains sensitive words.",
        "parameters": {
          "properties": {
            "content": {
              "description": "待检测内容",
              "type": "string"
            }
          },
          "required": ["content"],
          "type": "object"
        }
      }
    }
  ]
}

Result file:

/data1/lcb/xxx/dsv4_013/results/dsv4_case_chinese_tool_auto_20260425.json

On success, the response contains:

{
  "finish_reason": "tool_calls",
  "tool_calls": [
    {
      "function": {
        "name": "检测工具",
        "arguments": "{\"content\": \"你好\"}"
      }
    }
  ]
}

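Notes:

  • The Chinese tool name 检测工具 is matched in tool_calls
  • tool_choice = "auto" works as expected
  • With a natural-question prompt, however, arguments.content currently contains only the extracted key term 你好, not the full sentence 检测你好是敏感词吗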
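
The difference between verbatim pass-through and keyword extraction can be made explicit when reviewing results. A small sketch, assuming a single tool call in the shape shown above; classify_content and its three labels are hypothetical, and treating any substring of the user message as a keyword is a simplification:

```python
import json


def classify_content(user_text: str, tool_call: dict) -> str:
    """Label how arguments.content relates to the user's message."""
    content = json.loads(tool_call["function"]["arguments"]).get("content")
    if content == user_text:
        return "verbatim"
    # A non-empty substring of the user message is treated as an extracted keyword.
    if isinstance(content, str) and content and content in user_text:
        return "keyword"
    return "mismatch"
```

For the case in this section, the user message 检测你好是敏感词吗 with arguments.content equal to 你好 is labeled "keyword".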

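7. Known remaining issues

The following known issues still hold:

  1. In the forced named-function scenario, finish_reason may still be stop
  2. This does not affect generation of tool_calls itself
  3. tool_choice = "required" may still trigger HTTP 400 under the current combination of async scheduling and speculative decoding
  4. With tool_choice = "auto" and a natural-question user message, moderation-tool arguments may contain only extracted keywords; verbatim pass-through of the full sentence is not guaranteed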
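
When scripting the validation cases, these points affect which outcomes should count as failures. The sketch below encodes one possible reading of them; the function name evaluate and the labels "pass", "fail", and "known-issue" are hypothetical, and the auto branch assumes a prompt that is expected to call a tool, as in sections 6.4 to 6.6:

```python
def evaluate(tool_choice, status: int, finish_reason=None) -> str:
    """Classify one result according to the known remaining issues."""
    # Item 3: "required" may return HTTP 400 in the current configuration.
    if tool_choice == "required" and status == 400:
        return "known-issue"
    if status != 200:
        return "fail"
    # Items 1 and 2: a named-function call may finish with "stop" while
    # still producing tool_calls, so both values are accepted.
    if isinstance(tool_choice, dict):
        return "pass" if finish_reason in ("stop", "tool_calls") else "fail"
    # auto (and a successful "required"): a tool call is expected here.
    return "pass" if finish_reason == "tool_calls" else "fail"
```

Item 4 concerns argument content rather than status or finish_reason, so it is not covered by this check.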