# Replace the URLs below with the ones that match your CANN version and device model
# Install CANN Toolkit
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install
# Install CANN Kernels
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install
# Set environment variables
source /usr/local/Ascend/ascend-toolkit/set_env.sh

pip install openmind_hub

pip install openmind[pt]

from openmind import AutoTokenizer, AutoModelForCausalLM
texts = "今天天气不错,"
model = AutoModelForCausalLM.from_pretrained("AI_Connect/cpm-ant-10b", device_map="npu:0")
tokenizer = AutoTokenizer.from_pretrained("AI_Connect/cpm-ant-10b")
input_ids = tokenizer(texts, return_tensors="pt").to(model.device)
outputs = model.generate(**input_ids)
output_texts = tokenizer.batch_decode(outputs)
print(output_texts)

Prepare the WizardLM_evol_instruct_V2_143k dataset: download it from https://huggingface.co/datasets/WizardLMTeam/WizardLM_evol_instruct_V2_196k and save it locally.
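After downloading, a quick look at the file can confirm it is intact before it is registered with LLaMA-Factory. The sketch below is a minimal example and not part of the original workflow: the helper name `summarize_dataset` is hypothetical, and it assumes the file is a single top-level JSON array of records. The record layout is not described here, so the code only reports what it finds.

```python
import json
from pathlib import Path


def summarize_dataset(path):
    """Return (record_count, sorted keys of the first record) for a JSON array file."""
    with Path(path).open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array of records")
    # Report the keys of the first record so the layout can be eyeballed.
    if records and isinstance(records[0], dict):
        first_keys = sorted(records[0].keys())
    else:
        first_keys = []
    return len(records), first_keys


# Example call (the path is a placeholder for wherever the file was saved):
#   count, keys = summarize_dataset("WizardLM_evol_instruct_V2_143k.json")
#   print(count, keys)
```

If the call raises a JSON decoding error, the download was probably truncated and should be repeated.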
git clone https://github.com/hiyouga/LLaMA-Factory.git --depth 1
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]"
pip install transformers==4.42.3

Add the following entry to the data/dataset_info.json file in LLaMA-Factory:
"evol_instruct_V2": {
  "file_name": "WizardLM_evol_instruct_V2_143k.json"
},

Set file_name to the local path of WizardLM_evol_instruct_V2_143k.json.

In the LLaMA-Factory directory, create the fine-tuning configuration file examples/train_full/cpm-ant-10b_full_lora_ds2.yaml with the following contents, changing model_name_or_path to your local model path:
### model
model_name_or_path: /models/cpm-ant-10b
### method
stage: sft
do_train: true
finetuning_type: lora
deepspeed: examples/deepspeed/ds_z2_config.json
### dataset
dataset: evol_instruct_V2
template: cpm
cutoff_len: 1024
max_samples: 1000000000000000
# max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/cpm-ant-10b/full/sft
logging_steps: 1
save_steps: 5000
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 2
gradient_accumulation_steps: 2
learning_rate: 3.0e-5
max_steps: 5000
# num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 50000
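The training values above imply a few derived quantities that are worth checking before launch. The following sketch recomputes them in plain Python. `num_devices=8` is an assumption for illustration only (the device count is not stated here), and the formulas follow the usual meaning of these Hugging Face Trainer-style options rather than anything specific to this setup.

```python
import math

# Values copied from the configuration above.
config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 2,
    "max_steps": 5000,
    "warmup_ratio": 0.1,
    "eval_steps": 50000,
}


def derived_schedule(cfg, num_devices):
    """Compute quantities implied by the config for a given device count."""
    global_batch = (
        cfg["per_device_train_batch_size"]
        * cfg["gradient_accumulation_steps"]
        * num_devices
    )
    return {
        # Samples consumed per optimizer step across all devices.
        "global_batch_size": global_batch,
        # Total samples consumed over the whole run.
        "samples_seen": global_batch * cfg["max_steps"],
        # Steps spent in learning-rate warmup.
        "warmup_steps": math.ceil(cfg["warmup_ratio"] * cfg["max_steps"]),
        # How many times a step-based evaluation would fire before max_steps.
        "evals_during_training": cfg["max_steps"] // cfg["eval_steps"],
    }


summary = derived_schedule(config, num_devices=8)  # 8 devices is an assumption
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because eval_steps is larger than max_steps, the step-based schedule would not trigger any evaluation during the run. If periodic evaluation is wanted, eval_steps likely needs to be lowered.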
llamafactory-cli train examples/train_full/cpm-ant-10b_full_lora_ds2.yaml
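Before running the command above, it can help to confirm that the dataset named in the config is actually registered. The following is a minimal sketch and not a LLaMA-Factory feature: `find_dataset_file` is a hypothetical helper. It assumes that `dataset_info.json` is a JSON object keyed by dataset name with a `file_name` field, as in the entry shown earlier, and that a relative `file_name` is resolved against the directory containing `dataset_info.json`.

```python
import json
from pathlib import Path


def find_dataset_file(info_path, dataset_name):
    """Return the data file registered for dataset_name, or raise if it is missing."""
    info_path = Path(info_path)
    with info_path.open(encoding="utf-8") as f:
        info = json.load(f)
    if dataset_name not in info:
        raise KeyError(f"{dataset_name!r} is not registered in {info_path}")
    data_file = Path(info[dataset_name]["file_name"])
    if not data_file.is_absolute():
        # Assumption: relative paths are resolved next to dataset_info.json.
        data_file = info_path.parent / data_file
    if not data_file.is_file():
        raise FileNotFoundError(f"registered file does not exist: {data_file}")
    return data_file


# Example call, run from the LLaMA-Factory directory:
#   find_dataset_file("data/dataset_info.json", "evol_instruct_V2")
```

A KeyError here points to a missing or misspelled entry in dataset_info.json, while a FileNotFoundError points to a wrong file_name path.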