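Weight download
After downloading the weights, edit the config.json file in the weight directory: change "model_type" to "chatglm" and add the key-value pair "_name_or_path": "THUDM/glm-4-9b-chat". The modified file looks like this: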
{
  "_name_or_path": "THUDM/glm-4-9b-chat", // added key-value pair; the key is the same regardless of the model
  "architectures": ["Glm4ForCausalLM"],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "eos_token_id": [151329, 151336, 151338],
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 6144,
  "initializer_range": 0.02,
  "intermediate_size": 23040,
  "max_position_embeddings": 131072,
  "model_type": "chatglm", // changed from glm4 to chatglm
  "num_attention_heads": 48,
  "num_hidden_layers": 61,
  "num_key_value_heads": 8,
  "pad_token_id": 151329,
  "partial_rotary_factor": 0.5,
  "rms_norm_eps": 1e-5,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.52.0.dev0",
  "use_cache": true,
  "vocab_size": 151552
}

Download the image and load it:
wget https://mindx.sdk.obs.cn-north-4.myhuaweicloud.com/MindIE/docker/mindie_2.0.T3-20250417-800I-A2-py311-openeuler24.03-lts-aarch64.tar.gz --no-check-certificate
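docker load -i ./mindie_2.0.T3-20250417-800I-A2-py311-openeuler24.03-lts-aarch64.tar.gz

Hardware required: 1 Atlas 800I A2 server.

The currently provided MindIE image ships with the inference scripts for the GLM-4-9B-0414 model series, so there is no need to download model adaptation code separately; just create a new container.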
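Returning to the config.json change from the weight download step: the edit can be scripted instead of made by hand. The following is a minimal sketch, not part of the MindIE tooling. The function name `patch_glm_config` and the example path are hypothetical, and it assumes the real file is plain JSON (the `//` annotations in the sample config are explanatory only and would not parse). Since the container commands in this guide mount the weights read-only, it should be run on the host before the container is started.

```python
import json
from pathlib import Path


def patch_glm_config(weight_dir: str) -> dict:
    """Set model_type and _name_or_path in <weight_dir>/config.json."""
    config_path = Path(weight_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # The two changes described in the weight download step.
    config["model_type"] = "chatglm"
    config["_name_or_path"] = "THUDM/glm-4-9b-chat"

    # Write the file back; all other keys are left untouched.
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config


# Example (hypothetical path):
# patch_glm_config("/path-to-weights")
```

Keeping a backup copy of the original config.json before running such a script makes it easy to revert.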
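If you are using a root-user image (for example, one obtained from Ascend Hub) and privileged containers are allowed, start the container with the following command: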
docker run -it -d --net=host --shm-size=1g \
--privileged \
--name <container-name> \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
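-v /path-to-weights:/path-to-weights:ro \
<IMAGE ID> bash

If you prefer a self-built non-root user image and want to avoid the permission risks of a privileged container, use the following command, which passes the devices explicitly instead of running privileged: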
docker run -it -d --net=host --shm-size=1g \
--name <container-name> \
--device=/dev/davinci_manager \
--device=/dev/hisi_hdc \
--device=/dev/devmm_svm \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver:ro \
-v /usr/local/sbin:/usr/local/sbin:ro \
-v /path-to-weights:/path-to-weights:ro \
<IMAGE ID> bash

For more information on using the image, see the official image repository documentation.
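The non-privileged command above lists every /dev/davinciN device by hand. If only a subset of cards should be visible in the container, the flags can be generated. This is an illustrative sketch only: the helper name `davinci_device_flags` is hypothetical, and it assumes that exposing fewer cards works the same way as exposing all eight.

```python
def davinci_device_flags(device_ids):
    """Return docker --device flags for the given NPU indices."""
    # Management devices that both docker run commands in this guide pass through.
    common = ["/dev/davinci_manager", "/dev/hisi_hdc", "/dev/devmm_svm"]
    per_card = [f"/dev/davinci{i}" for i in device_ids]
    return [f"--device={path}" for path in common + per_card]


# Print the flags for cards 0-3, one per line with shell line continuations.
print(" \\\n".join(davinci_device_flags(range(4))))
```

The printed lines can be pasted into the docker run command in place of the hand-written --device entries.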
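Because the GLM-4-0414 model series depends on the latest version of transformers, the transformers version inside the container must be changed and made compatible with PyTorch 2.1.0.
First, enter the container:
docker exec -it <container-name> bash

Download the source:
git clone https://github.com/huggingface/transformers.git

In src/transformers/utils/generic.py of the cloned repository (around line 355), change from torch.utils._pytree import register_pytree_node in the else branch to from torch.utils._pytree import _register_pytree_node, and update the call to match, so that the newer transformers stays compatible with PyTorch 2.1.0: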
def __init_subclass__(cls) -> None:
    """Register subclasses as pytree nodes.

    This is necessary to synchronize gradients when using `torch.nn.parallel.DistributedDataParallel` with
    `static_graph=True` with modules that output `ModelOutput` subclasses.
    """
    if is_torch_available():
        if version.parse(get_torch_version()) >= version.parse("2.2"):
            from torch.utils._pytree import register_pytree_node

            register_pytree_node(
                cls,
                _model_output_flatten,
                partial(_model_output_unflatten, output_type=cls),
                serialized_type_name=f"{cls.__module__}.{cls.__name__}",
            )
        else:
            # change here
            from torch.utils._pytree import _register_pytree_node

            # change here
            _register_pytree_node(
                cls,
                _model_output_flatten,
                partial(_model_output_unflatten, output_type=cls),
            )

In addition, if the service fails later on, the cause may be an ASCII encoding error when reading chat_template. In that case, edit src/transformers/tokenization_utils_base.py (around line 2160), changing:
with open(chat_template_file) as chat_template_handle:

to:
with open(chat_template_file, encoding="utf-8") as chat_template_handle:

Install transformers from source:
pip install ./transformers

Make sure that transformers>=4.51.3.
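The version requirement can be checked from Python after installation. The sketch below uses only the standard library. The helper names are hypothetical, and the comparison looks at the numeric release part only, so a suffix such as .dev0 is ignored (a simplification compared with full PEP 440 ordering).

```python
import re
from importlib import metadata


def release_tuple(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric release parts only; dev/rc suffixes are ignored."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_transformers(minimum: str = "4.51.3") -> bool:
    """Return True if an installed transformers satisfies the minimum."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)
```

Calling check_transformers() inside the container should return True once the source install has succeeded.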
You may also need to install einops:
pip3 install einops

Enter the llm_model path:
cd $ATB_SPEED_HOME_PATH

Run the chat test:
torchrun --nproc_per_node 2 \
--master_port 20037 \
-m examples.run_pa \
--model_path /path-to-weights \
--input_texts 'What is deep learning?' \
--max_output_length 20

For service-based inference, edit the service configuration file:
vim /usr/local/Ascend/mindie/latest/mindie-service/conf/config.json

{
...
"ServerConfig" :
{
...
"port" : 1025, # customizable
"managementPort" : 1026, # customizable
"metricsPort" : 1027, # customizable
...
"httpsEnabled" : false,
...
},
"BackendConfig": {
...
"npuDeviceIds" : [[0,1,2,3]],
...
"ModelDeployConfig":
{
"ModelConfig" : [
{
...
"modelName" : "chatglm",
"modelWeightPath" : "/data/datasets/GLM-4-9B-0414",
"worldSize" : 4,
...
}
]
},
...
}
}

Start the service:
cd /usr/local/Ascend/mindie/latest/mindie-service/bin
./mindieservice_daemon

Send a test request:
curl 127.0.0.1:1025/generate -d '{
"prompt": "What is deep learning?",
"max_tokens": 32,
"stream": false,
"do_sample":true,
"temperature": 0.6,
"top_p": 0.95,
"model": "chatglm"
}'

Note: for more information on service-based inference, see the MindIE Service User Guide.
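The curl request above can also be issued from Python. This is a minimal sketch using only the standard library. The function names are hypothetical, the payload fields are copied from the curl example, and the Content-Type header is an assumption (curl -d sends a form content type by default). The response schema is not described here, so the body is returned as raw text.

```python
import json
import urllib.request


def build_generate_request(prompt, host="127.0.0.1", port=1025, model="chatglm"):
    """Build a POST request for /generate with the same fields as the curl example."""
    payload = {
        "prompt": prompt,
        "max_tokens": 32,
        "stream": False,
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.95,
        "model": model,
    }
    return urllib.request.Request(
        f"http://{host}:{port}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, see lead-in
        method="POST",
    )


def generate(prompt, timeout=60):
    """Send the request and return the raw response body as text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example, once the service is running:
# print(generate("What is deep learning?"))
```

The port and model name must match the "port" and "modelName" values set in the service config.json.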