TimeCMA Deployment Guide for Ascend

Introduction

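Multivariate time series forecasting (MTSF) aims to learn the temporal dynamics among variables in order to forecast future time series. Existing statistical and deep learning methods are limited by their small number of learnable parameters and by small-scale training data. Recently, large language models (LLMs) that combine time series with textual prompts have achieved strong performance in MTSF. However, the authors observe that current LLM-based solutions still fall short in learning disentangled embeddings. To address this, they propose TimeCMA, an intuitive yet effective framework for MTSF via cross-modal alignment. This guide describes how to deploy TimeCMA on Ascend NPUs (A2) with good performance.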

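Environment preparation

Environment information

| Component | Version |
| --- | --- |
| Driver and firmware | 25.5.0.b070 |
| CANN | 8.5.0 |
| Python | 3.11.14 |
| torch | 2.7.1 |
| torch_npu | 2.7.1 |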
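Once the conda environment described later in this guide is set up, the torch and torch_npu versions listed above can be compared against what is actually installed. The following is a minimal sketch that uses only the standard library. The helper names are hypothetical, and ignoring a local build suffix such as "+cpu" during comparison is an assumption, not something this guide prescribes.

```python
from importlib import metadata

# Expected versions copied from the environment information above.
EXPECTED = {"torch": "2.7.1", "torch_npu": "2.7.1"}


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each mismatching package to its (expected, found) version pair."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Assumption: a local build suffix such as "+cpu" is not significant.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for package, (wanted, found) in problems.items():
        print(f"{package}: expected {wanted}, found {found}")
    if not problems:
        print("torch and torch_npu match the expected versions")
```

Run it with the TimeCMA conda environment active so that the lookup sees the packages installed there.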

Running the container

Preparing the image

This guide uses the 8.5.0-910b-ubuntu22.04-py3.11 image as an example. Pull it with:

docker pull quay.io/ascend/cann:8.5.0-910b-ubuntu22.04-py3.11

Run docker images to confirm that the image was pulled successfully.

Starting the container

docker run -it -d \
    --shm-size=500g \
    --name TimeCMA \
    --privileged \
    --entrypoint /bin/bash \
    --net=host \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci4 \
    --device /dev/davinci5 \
    --device /dev/davinci6 \
    --device /dev/davinci7 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /opt/:/opt/ \
    quay.io/ascend/cann:8.5.0-910b-ubuntu22.04-py3.11

## Enter the container
docker exec -itu root TimeCMA bash
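The docker run command above passes through eight NPU cards plus three management device nodes. On a host where only some cards should be exposed, the --device flags can be generated instead of typed by hand. This is a small sketch under that assumption; device_flags is a hypothetical helper, and the device node names are taken from the command above.

```python
import shlex


def device_flags(npu_ids):
    """Build --device flags for the chosen NPU cards plus the shared nodes."""
    flags = []
    for npu_id in npu_ids:
        flags += ["--device", f"/dev/davinci{npu_id}"]
    # The docker run command above always passes these three nodes through.
    for node in ("davinci_manager", "devmm_svm", "hisi_hdc"):
        flags += ["--device", f"/dev/{node}"]
    return flags


if __name__ == "__main__":
    # Example: expose only cards 0 and 1 instead of all eight.
    print(shlex.join(device_flags([0, 1])))
```

The printed string can replace the --device lines of the docker run command; all other options stay unchanged.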

Deployment

Cloning the code

This guide clones the project under /home. First run cd /home, then:

git clone https://github.com/ChenxiLiu-HNU/TimeCMA.git

If a TimeCMA directory now exists, the clone succeeded.

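Getting the dataset

To avoid changing how the code references the dataset path, the steps below create the dataset at the path used in the source code.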

cd /mnt

mkdir sfs-common

cd sfs-common

git clone https://github.com/zhouhaoyi/ETDataset.git

cp -r ETDataset/ETT-small/ ./dataset

To use a custom dataset path, modify the corresponding code in TimeCMA.
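Before moving on, it can help to confirm that the copy produced the expected layout. Note that cp -r places the CSV files directly under ./dataset only if that directory did not exist beforehand. The sketch below checks for the four CSV files shipped in ETT-small; missing_dataset_files is a hypothetical helper name.

```python
from pathlib import Path

# File names shipped in ETDataset/ETT-small.
ETT_FILES = ("ETTh1.csv", "ETTh2.csv", "ETTm1.csv", "ETTm2.csv")


def missing_dataset_files(root):
    """Return the ETT CSV files that are not present directly under root."""
    root = Path(root)
    return [name for name in ETT_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files("/mnt/sfs-common/dataset")
    print("missing files:", missing or "none")
```

If files are reported missing, check whether they ended up in /mnt/sfs-common/dataset/ETT-small instead.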

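Creating the conda virtual environment

First make sure conda is available in the current environment; if it is not, install it first. On the aarch64 architecture, download the installer with the following command (running it under /tmp is recommended):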

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh

Grant execute permission:

chmod +x Miniconda3-latest-Linux-aarch64.sh

Start the installation (press Enter or type yes when prompted):

bash Miniconda3-latest-Linux-aarch64.sh

Activate conda:

source ~/miniconda3/etc/profile.d/conda.sh

Create the conda virtual environment:

cd /home/TimeCMA

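On Ubuntu, the environment is created from the env_ubuntu.yaml configuration file. Some packages pinned in that file are no longer available, so env_ubuntu.yaml must be edited by hand. The corrected env_ubuntu.yaml is: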

name: TimeCMA
channels:
  - pytorch
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - bottleneck=1.3.5
  - bzip2=1.0.8
  - ca-certificates
  - libffi=3.4.4
  - libgcc-ng=11.2.0
  - libgomp=11.2.0
  - libstdcxx-ng=11.2.0
  - libuuid=1.41.5
  - ncurses=6.4
  - numexpr=2.8.7
  - openssl=3.0.12
  - pip=23.3.1
  - python=3.10.13
  - python-dateutil=2.8.2
  - python-tzdata=2023.3
  - pytz=2023.3.post1
  - readline=8.2
  - six=1.16.0
  - sqlite=3.41.2
  - tbb=2021.8.0
  - tk=8.6.12
  - tzdata=2023c
  - wheel=0.41.2
  - xz=5.4.5
  - zlib=1.2.13
  - pip:
      - absl-py==2.0.0
      - aiohttp==3.9.1
      - aiosignal==1.3.1
      - async-timeout==4.0.3
      - attrs==23.2.0
      - cachetools==5.3.2
      - certifi==2022.12.7
      - charset-normalizer==2.1.1
      - contourpy==1.2.0
      - cycler==0.12.1
      - easy-torch==1.3.2
      - easydict==1.10
      - einops==0.7.0
      - filelock==3.9.0
      - fonttools==4.47.0
      - frozenlist==1.4.1
      - fsspec==2023.12.2
      - google-auth==2.25.2
      - google-auth-oauthlib==1.2.0
      - grpcio==1.60.0
      - huggingface-hub==0.19.4
      - idna==3.4
      - jinja2==3.1.2
      - joblib==1.3.2
      - kiwisolver==1.4.5
      - lightning-utilities==0.10.1
      - markdown==3.5.1
      - markupsafe==2.1.3
      - matplotlib==3.8.2
      - mpmath==1.3.0
      - multidict==6.0.4
      - networkx==3.0
      - numpy==1.22.4
      - oauthlib==3.2.2
      - packaging==23.1
      - pandas==1.3.5
      - pillow==9.3.0
      - protobuf==4.23.4
      - psutil==5.9.8
      - pyasn1==0.5.1
      - pyasn1-modules==0.3.0
      - pyparsing==3.1.1
      - pytorch-lightning==1.9.4
      - pyyaml==6.0.1
      - regex==2023.10.3
      - requests==2.28.1
      - requests-oauthlib==1.3.1
      - rsa==4.9
      - safetensors==0.4.1
      - scikit-learn==1.0.2
      - scipy==1.7.3
      - seaborn==0.13.2
      - sentencepiece==0.2.0
      - setproctitle==1.3.2
      - setuptools==59.5.0
      - sympy==1.10.1
      - tables==3.7.0
      - tensorboard==2.15.1
      - tensorboard-data-server==0.7.2
      - threadpoolctl==3.2.0
      - tokenizers==0.15.0
      - torch==2.1.0
      - tqdm==4.66.1
      - transformers==4.36.2
      - triton==3.5.0
      - typing-extensions==4.4.0
      - urllib3==1.26.13
      - werkzeug==3.0.1
      - yarl==1.9.4

conda env create -f env_ubuntu.yaml

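Respond to the prompts as they appear. This step takes a few minutes.

Activate the environment created from the configuration file above: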

conda activate TimeCMA

Installing dependencies

pip install h5py 
pip install torch==2.7.1 torch_npu==2.7.1

Core adaptation code

cd /home/TimeCMA
vim storage/gen_prompt_emb.py

## Add the following code below the import torch line at the top of the file:
import torch_npu
from torch_npu.contrib import transfer_to_npu

## Likewise, add the same code to train.py, storage/store_emb.py, and models/TimeCMA.py.
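The same two-line edit is applied to four files, so it can also be scripted. The following is a sketch, not the guide's prescribed method: it assumes each file contains a plain top-level import torch line, and add_npu_imports and patch_repo are hypothetical helper names. Files without such a line are left unchanged and need to be edited by hand.

```python
import re
from pathlib import Path

NPU_IMPORTS = "import torch_npu\nfrom torch_npu.contrib import transfer_to_npu\n"
# Files named in this section, relative to the repository root.
TARGETS = (
    "storage/gen_prompt_emb.py",
    "train.py",
    "storage/store_emb.py",
    "models/TimeCMA.py",
)


def add_npu_imports(source):
    """Insert the torch_npu imports after the first plain `import torch` line."""
    if "import torch_npu" in source:
        return source  # already patched
    match = re.search(r"^import torch[ \t]*\r?\n", source, flags=re.MULTILINE)
    if match is None:
        return source  # no plain `import torch` line; patch this file by hand
    return source[: match.end()] + NPU_IMPORTS + source[match.end():]


def patch_repo(repo_root):
    """Patch every target file that exists and return the paths that changed."""
    changed = []
    for relative in TARGETS:
        path = Path(repo_root) / relative
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        patched = add_npu_imports(original)
        if patched != original:
            path.write_text(patched, encoding="utf-8")
            changed.append(str(path))
    return changed


if __name__ == "__main__":
    print("patched:", patch_repo("/home/TimeCMA"))
```

The function is idempotent, so running the script twice does not duplicate the imports. Any target file missing from the printed list should be checked manually.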

Last token embedding storage

cd /home/TimeCMA

mkdir Results

mkdir Results/emb_logs

pip install huggingface-hub tqdm

## Download gpt2 (model and tokenizer files)
HF_ENDPOINT=https://hf-mirror.com python -c "
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id='openai-community/gpt2',
    local_dir='./gpt2',
    resume_download=True,
    local_dir_use_symlinks=False
)
"

Modify the code that loads the gpt2 tokenizer:

vim scripts/Store_ETT.sh

## After nohup python storage/store_emb.py, add: --model_name "/home/TimeCMA/gpt2"

Start the run:
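If scripts/Store_ETT.sh invokes store_emb.py several times, the flag has to be added to every invocation. The sketch below does this with a plain string replacement. It assumes the script contains the literal text python storage/store_emb.py, as the instruction above implies; add_model_name is a hypothetical helper name.

```python
from pathlib import Path

COMMAND = "python storage/store_emb.py"
MODEL_FLAG = ' --model_name "/home/TimeCMA/gpt2"'


def add_model_name(script_text):
    """Append the --model_name flag right after each store_emb.py invocation."""
    if "--model_name" in script_text:
        return script_text  # leave an already-configured script untouched
    return script_text.replace(COMMAND, COMMAND + MODEL_FLAG)


if __name__ == "__main__":
    script = Path("scripts/Store_ETT.sh")
    if script.is_file():
        script.write_text(add_model_name(script.read_text(encoding="utf-8")), encoding="utf-8")
        print("updated", script)
    else:
        print("scripts/Store_ETT.sh not found; run this from /home/TimeCMA")
```

Review the script afterwards with cat scripts/Store_ETT.sh to confirm that each command line carries the flag.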

bash scripts/Store_ETT.sh

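Checkpoints:

1. npu-smi info shows a python process occupying the NPU.
2. Files such as 0.h5, 1.h5, ... are generated under directories such as /home/TimeCMA/Embeddings/ETTh1/test.

This run takes some time. After it finishes, copy the Embeddings/ directory to /mnt/sfs-common/TimeCMA.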

mkdir /mnt/sfs-common/TimeCMA
cp -r Embeddings/ /mnt/sfs-common/TimeCMA
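To follow the second checkpoint without listing directories by hand, the generated .h5 files can be counted per directory. This is a minimal sketch; count_embeddings is a hypothetical helper, and it makes no assumption about how many files each split should contain.

```python
from pathlib import Path


def count_embeddings(root):
    """Count .h5 files in each directory below root that contains any."""
    root = Path(root)
    if not root.is_dir():
        return {}
    counts = {}
    for h5_file in root.rglob("*.h5"):
        key = h5_file.parent.relative_to(root).as_posix()
        counts[key] = counts.get(key, 0) + 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for directory, total in count_embeddings("/home/TimeCMA/Embeddings").items():
        print(f"{directory}: {total} files")
```

Running it against /mnt/sfs-common/TimeCMA/Embeddings after the copy gives a quick way to confirm that both locations hold the same number of files.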

Train and inference

Taking training on the ETTh1 dataset as an example:

cd /home/TimeCMA
bash scripts/ETTh1.sh

Because the script runs in the background with nohup, check its progress by running cat on the files under Results/ETTh1.

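Troubleshooting

Operator fallback of aten::_transformer_encoder_layer_fwd during TimeCMA training

Background and diagnosis

A customer needed TimeCMA for research on big-data analysis and agent technology for reservoir stimulation. During training on A2, the following warning was raised: "Warning: CAUTION: The operator 'aten::_transformer_encoder_layer_fwd' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)". CPU utilization was also close to 100%. Together these indicate that the operator falls back to the CPU, which slows training. An NPU-side implementation of aten::_transformer_encoder_layer_fwd is therefore needed to improve training performance.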
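The same diagnosis can be repeated on any training log by searching for the warning text quoted above. The sketch below extracts the operator names from such warnings. The regular expression is an assumption based solely on the message shown in this guide, and fallback_operators is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the operator name in the fallback warning, with straight or curly quotes.
FALLBACK_PATTERN = re.compile(
    r"The operator\s+['‘’\"]?(aten::[\w.]+)['‘’\"]?\s+"
    r"is not currently supported on the NPU backend"
)


def fallback_operators(log_text):
    """Count how often each operator is reported as falling back to the CPU."""
    return Counter(FALLBACK_PATTERN.findall(log_text))


if __name__ == "__main__":
    sample = (
        "Warning: CAUTION: The operator 'aten::_transformer_encoder_layer_fwd' "
        "is not currently supported on the NPU backend and will fall back to "
        "run on the CPU. This may have performance implications."
    )
    for name, hits in fallback_operators(sample).items():
        print(f"{name}: {hits} warning(s)")
```

To scan a real log, read the file under Results/ETTh1 into a string and pass it to fallback_operators. An empty result means no fallback warning was logged.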

For TimeCMA model deployment, see https://ai.gitcode.com/Ascend-SACT/TimeCMA/.

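Developing the operator

Preparing the environment

To avoid build failures or errors caused by environment differences, first create the container by following the deployment guide above and use the conda environment. All operations below assume this setup:

Model deployment link: https://ai.gitcode.com/Ascend-SACT/TimeCMA/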

source ~/miniconda3/etc/profile.d/conda.sh

conda activate TimeCMA

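Merging the new operator code and building the PTA package

Cloning pytorch

cd /home

### The torch version in the current conda environment is 2.7.1, so clone the matching v2.7.1 branch of pytorch
git clone https://gitcode.com/Ascend/pytorch.git -b v2.7.1 --depth 1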

cd pytorch

## Check the Python version with python --version
bash ci/build.sh --python=3.10
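The container image ships Python 3.11 while the conda environment pins Python 3.10, so the value passed to --python should come from the interpreter that is actually active. A small sketch that derives the flag from the running interpreter; build_python_flag is a hypothetical helper name.

```python
import sys


def build_python_flag(version_info=sys.version_info):
    """Format the interpreter's major.minor version as a --python flag."""
    return f"--python={version_info[0]}.{version_info[1]}"


if __name__ == "__main__":
    print(build_python_flag())
```

Run it with the TimeCMA conda environment active and pass the printed flag to ci/build.sh.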

Merging the new operator code

touch third_party/op-plugin/op_plugin/ops/opapi/TransformerEncoderLayerFwdKernelNpuOpApi.cpp

vim third_party/op-plugin/op_plugin/ops/opapi/TransformerEncoderLayerFwdKernelNpuOpApi.cpp

### Add the following code to TransformerEncoderLayerFwdKernelNpuOpApi.cpp:

#include <ATen/ops/_transformer_encoder_layer_fwd_native.h>

#include "op_plugin/OpApiInterface.h"
#include "op_plugin/utils/op_api_common.h"

namespace op_api {
using npu_preparation = at_npu::native::OpPreparation;

at::Tensor _transformer_encoder_layer_fwd(
    const at::Tensor& src,
    const int64_t embed_dim,
    const int64_t num_heads,
    const at::Tensor& qkv_weight,
    const at::Tensor& qkv_bias,
    const at::Tensor& proj_weight,
    const at::Tensor& proj_bias,
    const bool use_gelu,
    const bool norm_first,
    const double layer_norm_eps,
    const at::Tensor& layer_norm_weight_1,
    const at::Tensor& layer_norm_bias_1,
    const at::Tensor& layer_norm_weight_2,
    const at::Tensor& layer_norm_bias_2,
    const at::Tensor& ffn_weight_1,
    const at::Tensor& ffn_bias_1,
    const at::Tensor& ffn_weight_2,
    const at::Tensor& ffn_bias_2,
    const c10::optional<at::Tensor>& mask,
    const c10::optional<int64_t> mask_type)
{
    // Delegate to ATen's native implementation of the encoder layer.
    return at::native::transformer_encoder_layer_forward(
        src, embed_dim, num_heads,
        qkv_weight, qkv_bias, proj_weight, proj_bias,
        use_gelu, norm_first, layer_norm_eps,
        layer_norm_weight_1, layer_norm_bias_1,
        layer_norm_weight_2, layer_norm_bias_2,
        ffn_weight_1, ffn_bias_1, ffn_weight_2, ffn_bias_2,
        mask, mask_type);
}

} // namespace op_api

vim third_party/op-plugin/op_plugin/config/op_plugin_functions.yaml

Add the following under official (at line 3; mind the indentation):

  - func: _transformer_encoder_layer_fwd(Tensor src, int embed_dim, int num_heads, Tensor qkv_weight, Tensor qkv_bias, Tensor proj_weight, Tensor proj_bias, bool use_gelu, bool norm_first, float eps, Tensor norm_weight_1, Tensor norm_bias_1, Tensor norm_weight_2, Tensor norm_bias_2, Tensor ffn_weight_1, Tensor ffn_bias_1, Tensor ffn_weight_2, Tensor ffn_bias_2, Tensor? mask=None, int? mask_type=None) -> Tensor
    op_api: all_version

Start the build:

bash ci/build.sh --python=3.10

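Installing the PTA package

After the build completes, a .whl file is generated in the dist/ directory. Install it with:

pip install dist/*.whl

Use pip show torch_npu to check whether the installed torch_npu is the patched build.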

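Verification and testing

Upgrading libstdc++ in conda

The libstdc++ in the current environment is too old, so the patched torch_npu fails with the error libstdc++.so.6: version `GLIBCXX_3.4.30' not found. Upgrade libstdc++ in the conda environment: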

conda activate TimeCMA

conda install -c conda-forge libstdcxx-ng=12
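Whether the upgrade took effect can be checked by looking for the version string inside the library file, much like running strings on it and searching for GLIBCXX. The sketch below assumes the library lives at $CONDA_PREFIX/lib/libstdc++.so.6, which is the usual conda layout but is not stated in this guide; has_symbol_version is a hypothetical helper name.

```python
import os
from pathlib import Path


def has_symbol_version(library_path, version=b"GLIBCXX_3.4.30"):
    """Return True if the NUL-terminated version string occurs in the file."""
    data = Path(library_path).read_bytes()
    # Require the NUL terminator so that a search for GLIBCXX_3.4.3
    # does not match GLIBCXX_3.4.30.
    return version + b"\x00" in data


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    candidate = Path(prefix) / "lib" / "libstdc++.so.6" if prefix else None
    if candidate is not None and candidate.is_file():
        print(candidate, "provides GLIBCXX_3.4.30:", has_symbol_version(candidate))
    else:
        print("libstdc++.so.6 not found; activate the conda environment first")
```

If the check reports False after the install, the environment is probably still resolving an older libstdc++ and the conda install step should be repeated.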

Starting training

cd /home/TimeCMA

bash scripts/ETTh1.sh

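## Wait a few tens of seconds, then check the log

cat Results/ETTh1/i96_o192_lr1e-4_c64_el1_dl2_dn0.7_bs16.log

If the log no longer contains the fallback warning for aten::_transformer_encoder_layer_fwd and CPU utilization is low, the operator replacement succeeded, and the log shows that training runs noticeably faster.

A normal training log looks like this: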

Namespace(device='cuda', data_path='ETTh1', channel=64, num_nodes=7, seq_len=96, pred_len=192, batch_size=16, learning_rate=0.0001, dropout_n=0.7, d_llm=768, e_layer=1, d_layer=2, head=8, weight_decay=0.001, num_workers=10, model_name='gpt2', epochs=999, seed=2024, es_patience=50, save='./logs/2026-02-28-07:51:44-')
The number of trainable parameters: 12503459
The number of parameters: 12503460
Start training...
Epoch: 001, Training Time: 25.3328 secs
Epoch: 001, Validation Time: 2.9651 secs
-----------------------
Epoch: 001, Train Loss: 0.7727, Train MAE: 0.6361
Epoch: 001, Valid Loss: 1.1880, Valid MAE: 0.7476
###Update tasks appear###
Updating! Valid Loss:1.1880, epoch:  1
Epoch: 002, Training Time: 23.0531 secs
Epoch: 002, Validation Time: 2.9789 secs
-----------------------
Epoch: 002, Train Loss: 0.5786, Train MAE: 0.5466
Epoch: 002, Valid Loss: 1.1415, Valid MAE: 0.7263
###Update tasks appear###
Updating! Valid Loss:1.1415, epoch:  2
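To compare training speed before and after the operator replacement, the per-epoch training times can be pulled out of the log. This is a sketch based only on the line format shown in the log above; training_times is a hypothetical helper name.

```python
import re
from statistics import mean

TRAIN_TIME = re.compile(r"Epoch: (\d+), Training Time: ([\d.]+) secs")


def training_times(log_text):
    """Return {epoch: seconds} parsed from the 'Training Time' lines."""
    return {int(epoch): float(secs) for epoch, secs in TRAIN_TIME.findall(log_text)}


if __name__ == "__main__":
    # Two lines copied from the log above; validation lines are ignored.
    sample = (
        "Epoch: 001, Training Time: 25.3328 secs\n"
        "Epoch: 001, Validation Time: 2.9651 secs\n"
        "Epoch: 002, Training Time: 23.0531 secs\n"
    )
    times = training_times(sample)
    print(f"epochs parsed: {len(times)}, mean training time: {mean(times.values()):.2f} s")
```

Applying the same function to a log captured before the patch and one captured after it gives a direct per-epoch comparison.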
