Load the Docker image
Start the container
docker run --privileged \
--name glm41v_int8 \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci4 \
--device /dev/davinci5 \
--device /dev/davinci6 \
--device /dev/davinci7 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
--network host \
-v /dev/shm:/dev/shm \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64:/usr/local/Ascend/driver/lib64 \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /home:/home \
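-it glm4.1v:1017 /bin/bash
You can mount additional host directories by adding more -v options that follow the pattern of `-v /home:/home \`.
Here the model directory is /home/glm41v_int8. Create it and add it to the whitelist via HUB_WHITE_LIST_PATHS:
mkdir /home/glm41v_int8
export HUB_WHITE_LIST_PATHS=/home/glm41v_int8
Then run the following Python program to download the model weights to /home/glm41v_int8: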
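# Download the GLM-4.1V-9B-Thinking 8-bit (golden-stick) weights from openMind Hub
from openmind_hub import snapshot_download

snapshot_download(
    repo_id="MindSpore-Lab/GLM-4.1V-9B-Thinking-golden-stick-8bit",
    local_dir="/home/glm41v_int8",
    local_dir_use_symlinks=False,  # write actual files into local_dir rather than symlinks
)
Next, set the following environment variables and start the service:
export VLLM_MS_MODEL_BACKEND=Native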
export ASCEND_TOTAL_MEMORY_GB=40
export MS_ENABLE_LCCL=off
export MS_ENABLE_INTERNAL_BOOST=off
export ASCEND_RT_VISIBLE_DEVICES=6,7  # IDs of the 300I cards the service will occupy
export MS_ALLOC_CONF=enable_vmm:true
export ASCEND_CUSTOM_OPP_PATH=/usr/local/python3.11.13/lib/python3.11/site-packages/ms_custom_ops/vendors/customize/
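Before launching, it may help to confirm that enough NPUs are visible for the `--tensor-parallel-size 2` used in the serve command. The following is a minimal Python sketch. It assumes that each tensor-parallel rank occupies one device listed in ASCEND_RT_VISIBLE_DEVICES. The helper names `visible_devices`, `check_tp` and `current_visible_devices` are hypothetical and are not part of vLLM-MindSpore.

```python
import os


def visible_devices(env_value: str) -> list[int]:
    """Parse a comma-separated device list such as "6,7" into [6, 7]."""
    return [int(item) for item in env_value.split(",") if item.strip()]


def check_tp(env_value: str, tensor_parallel_size: int) -> list[int]:
    """Assumption: each TP rank needs one visible NPU, so require enough device IDs."""
    devices = visible_devices(env_value)
    if len(devices) < tensor_parallel_size:
        raise ValueError(
            f"--tensor-parallel-size {tensor_parallel_size} needs at least "
            f"{tensor_parallel_size} visible NPUs, got {devices}"
        )
    return devices


def current_visible_devices() -> str:
    """Read the ASCEND_RT_VISIBLE_DEVICES value exported in this shell (empty if unset)."""
    return os.environ.get("ASCEND_RT_VISIBLE_DEVICES", "")
```

For example, run `check_tp(current_visible_devices(), 2)` in a Python process started from the same shell after the exports. With `6,7` it returns `[6, 7]`. If only one device ID is listed, it raises ValueError.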
# You can change the port with `--port` (8140 here) and the tensor-parallel (TP) degree with `--tensor-parallel-size` (2 here)
vllm-mindspore serve /home/glm41v_int8/ \
--port 8140 \
--limit_mm_per_prompt='{"video":"0"}' \
--disable-mm-preprocessor-cache \
--disable-log-requests \
--disable-uvicorn-access-log \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.90 \
--max-num-batched-tokens 32768 \
--block_size 128 \
--quantization smoothquant \
> log.txt 2>&1 &
You can follow the startup progress with tail -f log.txt. The service has started successfully when the following lines appear:
INFO: Waiting for application startup.
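INFO: Application startup complete.
Taking the server 60.10.230.191 as an example, check that the service port is reachable:
curl http://60.10.230.191:8140/v1/models
If the service is up, it returns information like the following: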
{"object":"list","data":[{"id":"/home/glm41v_int8/","object":"model","created":1760782194,"owned_by":"vllm","root":"/home/glm41v_int8/","parent":null,"max_model_len":65536,"permission":[{"id":"modelperm-7552ce4e0f2d4f5888ee9775794f068a","object":"model_permission","created":1760782194,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
Next, send a test request to the service:
curl http://60.10.230.191:8140/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "/home/glm41v_int8/",
"prompt": "456*123等于多少?<think>\n",
"max_tokens": 1024,
"temperature": 0
}'
The prompt asks (in Chinese) "What is 456*123?". The service returns:
{"id":"cmpl-39f6d51403f84f4ab53a11db0f3590c3","object":"text_completion","created":1758510912,"model":"/home/glm41v_int8/","choices":[{"index":0,"text":"<think>用户现在需要计算456乘以123的结果。首先,我们可以用竖式乘法或者分步计算的方法。先分解一下,456×123可以拆成456×(100+20+3),然后分别计算456×100、456×20、456×3,再把结果相加。\n\n首先计算456×100,就是45600;然后456×20,20是2×10,所以456×2=912,再乘以10就是9120;接下来456×3,3×456,3×400=1200,3×50=150,3×6=18,加起来1200+150=1350+18=1368。然后把这些结果加起来:45600 + 9120 = 54720,然后54720 + 1368 = 56088? 等等,这里可能算错了,再检查一遍。\n\n或者用竖式计算:先列竖式,123\n×456\n先算123×6,6×6=36,写6进3;6×5=30+3=33,写3进3;6×4=24+3=27,所以123×6=738。然后123×50(因为456的50是5×10),123×5=615,后面加一个0,所以6150。然后123×400(456的400是4×100),123×4=492,后面加两个0,所以49200。然后把这些加起来:738 + 6150 = 6888,然后6888 + 49200 = 56088。对,刚才分步计算的时候可能加错了,现在竖式计算确认是56088。所以456×123等于56088。</think><answer>要计算 $ 456 \\times 123 $,可通过**竖式乘法**或**分步拆分计算**完成,以下是详细过程: \n\n\n### 方法一:竖式乘法 \n将 $ 123 $ 按位拆分为百位、十位、个位,分别与 $ 456 $ 相乘后累加: \n1. 先计算 $ 123 \\times 6 $(个位): \n $ 6 \\times 6 = 36 $(个位写 6,向十位进 3), \n $ 6 \\times 5 = 30 + 3 = 33 $(十位写 3,向百位进 3), \n $ 6 \\times 4 = 24 + 3 = 27 $(百位写 27), \n 所以 $ 123 \\times 6 = 738 $。 \n\n2. 再计算 $ 123 \\times 50 $(十位,注意乘后加 1 个 0): \n 先算 $ 123 \\times 5 = 615 $,再在末尾加 1 个 0,得 $ 6150 $。 \n\n3. 最后计算 $ 123 \\times 400 $(百位,注意乘后加 2 个 0): \n 先算 $ 123 \\times 4 = 492 $,再在末尾加 2 个 0,得 $ 49200 $。 \n\n4. 累加所有结果: \n $ 738 + 6150 = 6888 $, \n $ 6888 + 49200 = 56088 $。 \n\n\n### 方法二:分步拆分计算 \n将 $ 123 $ 拆分为 $ 100 + 20 + 3 $,分别与 $ 456 $ 相乘后累加: \n- $ 456 \\times 100 = 45600 $ \n- $ 456 \\times 20 = 456 \\times 2 \\times 10 = 912 \\times 10 = 9120 $ \n- $ 456 \\times 3 = 1368 $(计算过程:$ 3 \\times 6 = 18 $,$ 3 \\times 50 = 150 $,$ 3 \\times 400 = 1200 $,累加得 $ 1200 + 150 + 18 = 1368 $) \n\n将结果累加: \n$ 45600 + 9120 = 54720 $, \n$ 54720 + 1368 = 56088 $。 \n\n\n综上,$ 456 \\times 123 = 56088 $。 \n最终答案: \n<|begin_of_box|>56088<|end_of_box|></answer>","logprobs":null,"finish_reason":"length","stop_reason":null,"prompt_logprobs":null}],"usage":{"prompt_tokens":9,"total_tokens":1033,"completion_tokens":1024,"prompt_tokens_details":null},"kv_transfer_params":null}
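The curl checks above can also be scripted. Below is a minimal Python sketch that uses only the standard library. It assumes the example address 60.10.230.191:8140 and the model path /home/glm41v_int8/ from this guide, and all function names are hypothetical. The script does four things:

1. Waits for the startup line in log.txt.
2. Lists the served models.
3. Sends the same completion request as the curl example.
4. Splits the reply into the `<think>` reasoning, the `<answer>` part, and the text between the `<|begin_of_box|>` and `<|end_of_box|>` markers.

In the sample response above, finish_reason is "length" and completion_tokens equals max_tokens (1024). In the OpenAI-compatible API this means generation stopped at the token limit, so raise max_tokens if answers come back cut off.

```python
import json
import re
import time
import urllib.request
from pathlib import Path

# Example values from this guide (assumptions); change them for your deployment.
BASE_URL = "http://60.10.230.191:8140"
MODEL = "/home/glm41v_int8/"


def wait_for_startup(log_path: str = "log.txt", timeout_s: float = 1800.0, poll_s: float = 5.0) -> bool:
    """Poll the server log until the startup-complete line appears; False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        path = Path(log_path)
        if path.exists() and "Application startup complete." in path.read_text(errors="ignore"):
            return True
        time.sleep(poll_s)
    return False


def list_model_ids(models_json: str) -> list[str]:
    """Return the model ids from a /v1/models response body."""
    return [entry["id"] for entry in json.loads(models_json)["data"]]


def build_completion_payload(prompt: str, max_tokens: int = 1024) -> bytes:
    """Encode the same request body as the curl example (temperature 0)."""
    body = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens, "temperature": 0}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def split_reply(text: str) -> dict:
    """Split a GLM-4.1V-Thinking completion into reasoning, answer, and boxed result."""
    def grab(pattern: str):
        match = re.search(pattern, text, re.S)
        return match.group(1).strip() if match else None

    return {
        "think": grab(r"<think>(.*?)</think>"),
        "answer": grab(r"<answer>(.*?)</answer>"),
        "boxed": grab(r"<\|begin_of_box\|>(.*?)<\|end_of_box\|>"),
    }


def smoke_test() -> None:
    """Query /v1/models, then send the 456*123 prompt to /v1/completions."""
    with urllib.request.urlopen(f"{BASE_URL}/v1/models", timeout=10) as resp:
        print("models:", list_model_ids(resp.read().decode("utf-8")))
    request = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=build_completion_payload("456*123等于多少?<think>\n"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=600) as resp:
        choice = json.loads(resp.read().decode("utf-8"))["choices"][0]
    print("finish_reason:", choice["finish_reason"])
    print("boxed answer:", split_reply(choice["text"])["boxed"])
```

Call `wait_for_startup()` from the directory where log.txt is written. Once it returns True, call `smoke_test()`. For the sample response above, `split_reply` would extract `56088` as the boxed answer.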