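Multivariate time series forecasting (MTSF) aims to learn the temporal dynamics among variables in order to forecast future time series. Existing statistical and deep-learning methods are limited by their small number of learnable parameters and by small-scale training data. Recently, large language models (LLMs) that combine time series with textual prompts have achieved impressive MTSF performance. However, the authors find that current LLM-based solutions still fall short in learning disentangled embeddings. To address this, they propose TimeCMA, an intuitive yet effective framework for multivariate time series forecasting via cross-modal alignment. This guide covers deploying TimeCMA on Ascend NPUs (A2) with good performance.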
| Driver and firmware | 25.5.0.b070 |
| CANN version | 8.5.0 |
| Python version | 3.11.14 |
| torch version | 2.7.1 |
| torch_npu version | 2.7.1 |
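Once the environment described in this guide is set up, the Python package versions in the table can be compared against what is actually installed. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `version_mismatches` are hypothetical, and only the two pip-installable packages from the table are checked.

```python
from importlib import metadata

# Expected versions, copied from the table above (pip-installable packages only).
EXPECTED = {"torch": "2.7.1", "torch_npu": "2.7.1"}


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Return {package: (expected, found)} for every package whose version differs."""
    mismatches = {}
    for package, want in expected.items():
        found = lookup(package)
        # Ignore local build suffixes such as "+cpu" when comparing.
        if found is None or found.split("+")[0] != want:
            mismatches[package] = (want, found)
    return mismatches


if __name__ == "__main__":
    for package, (want, found) in version_mismatches(EXPECTED).items():
        print(f"{package}: expected {want}, found {found}")
```

An empty result means both packages match the table; a `found` value of `None` means the package is not installed in the active environment.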
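This guide uses the 8.5.0-910b-ubuntu22.04-py3.11 image as an example. Pull the image with:
docker pull quay.io/ascend/cann:8.5.0-910b-ubuntu22.04-py3.11
Run docker images to check that the pull succeeded. Then create and start the container: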
docker run -it -d --shm-size=500g \
  --name TimeCMA \
  --privileged \
  --entrypoint /bin/bash \
  --net=host \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci4 \
  --device /dev/davinci5 \
  --device /dev/davinci6 \
  --device /dev/davinci7 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/Ascend/driver/tools/hccn_tool:/usr/local/Ascend/driver/tools/hccn_tool \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /opt/:/opt/ \
  quay.io/ascend/cann:8.5.0-910b-ubuntu22.04-py3.11
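The command above maps eight NPU cards into the container. If a host exposes a different number of cards, the `--device` arguments can be generated rather than typed by hand. This is a small sketch under the assumption that cards are numbered consecutively from 0, as in the command above; the function name `npu_device_flags` is hypothetical.

```python
def npu_device_flags(num_npus):
    """Build the --device arguments for `num_npus` cards plus the shared device nodes."""
    flags = []
    for index in range(num_npus):
        flags += ["--device", f"/dev/davinci{index}"]
    # These three device nodes appear once in the command above, regardless of card count.
    for shared in ("/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"):
        flags += ["--device", shared]
    return flags


# Print the flags for a two-card host, ready to paste into docker run.
print(" ".join(npu_device_flags(2)))
```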
## Enter the container
docker exec -itu root TimeCMA bash
This guide clones the project under /home. First run cd /home, then:
git clone https://github.com/ChenxiLiu-HNU/TimeCMA.git
If a TimeCMA directory now exists, the clone succeeded.
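To avoid changing how the code references the dataset path, create the dataset at the path used in the source code:
cd /mnt
mkdir sfs-common
cd sfs-common
git clone https://github.com/zhouhaoyi/ETDataset.git
cp -r ETDataset/ETT-small/ ./dataset
To use a custom dataset path instead, modify the corresponding code in TimeCMA.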
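After the copy, it can be worth confirming that the four ETT CSV files ended up where the code expects them. The sketch below assumes the dataset directory is /mnt/sfs-common/dataset, as created by the commands above; the helper name `missing_ett_files` is hypothetical.

```python
from pathlib import Path

# CSV files shipped in ETDataset/ETT-small.
ETT_FILES = ("ETTh1.csv", "ETTh2.csv", "ETTm1.csv", "ETTm2.csv")


def missing_ett_files(dataset_dir):
    """Return the names of expected ETT CSV files that are absent from `dataset_dir`."""
    root = Path(dataset_dir)
    return [name for name in ETT_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_ett_files("/mnt/sfs-common/dataset")
    print("missing files:", missing if missing else "none")
```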
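Download the Miniconda installer:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
Grant execute permission:
chmod +x Miniconda3-latest-Linux-aarch64.sh
Start the installation (press Enter or type yes when prompted):
bash Miniconda3-latest-Linux-aarch64.sh
source ~/miniconda3/etc/profile.d/conda.sh
cd /home/TimeCMA
On Ubuntu, the environment is installed from the env_ubuntu.yaml config file. Some packages pinned in that file are no longer available, so env_ubuntu.yaml must be edited by hand. The corrected env_ubuntu.yaml is: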
name: TimeCMA
channels:
- pytorch
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bottleneck=1.3.5
- bzip2=1.0.8
- ca-certificates
- libffi=3.4.4
- libgcc-ng=11.2.0
- libgomp=11.2.0
- libstdcxx-ng=11.2.0
- libuuid=1.41.5
- ncurses=6.4
- numexpr=2.8.7
- openssl=3.0.12
- pip=23.3.1
- python=3.10.13
- python-dateutil=2.8.2
- python-tzdata=2023.3
- pytz=2023.3.post1
- readline=8.2
- six=1.16.0
- sqlite=3.41.2
- tbb=2021.8.0
- tk=8.6.12
- tzdata=2023c
- wheel=0.41.2
- xz=5.4.5
- zlib=1.2.13
- pip:
  - absl-py==2.0.0
  - aiohttp==3.9.1
  - aiosignal==1.3.1
  - async-timeout==4.0.3
  - attrs==23.2.0
  - cachetools==5.3.2
  - certifi==2022.12.7
  - charset-normalizer==2.1.1
  - contourpy==1.2.0
  - cycler==0.12.1
  - easy-torch==1.3.2
  - easydict==1.10
  - einops==0.7.0
  - filelock==3.9.0
  - fonttools==4.47.0
  - frozenlist==1.4.1
  - fsspec==2023.12.2
  - google-auth==2.25.2
  - google-auth-oauthlib==1.2.0
  - grpcio==1.60.0
  - huggingface-hub==0.19.4
  - idna==3.4
  - jinja2==3.1.2
  - joblib==1.3.2
  - kiwisolver==1.4.5
  - lightning-utilities==0.10.1
  - markdown==3.5.1
  - markupsafe==2.1.3
  - matplotlib==3.8.2
  - mpmath==1.3.0
  - multidict==6.0.4
  - networkx==3.0
  - numpy==1.22.4
  - oauthlib==3.2.2
  - packaging==23.1
  - pandas==1.3.5
  - pillow==9.3.0
  - protobuf==4.23.4
  - psutil==5.9.8
  - pyasn1==0.5.1
  - pyasn1-modules==0.3.0
  - pyparsing==3.1.1
  - pytorch-lightning==1.9.4
  - pyyaml==6.0.1
  - regex==2023.10.3
  - requests==2.28.1
  - requests-oauthlib==1.3.1
  - rsa==4.9
  - safetensors==0.4.1
  - scikit-learn==1.0.2
  - scipy==1.7.3
  - seaborn==0.13.2
  - sentencepiece==0.2.0
  - setproctitle==1.3.2
  - setuptools==59.5.0
  - sympy==1.10.1
  - tables==3.7.0
  - tensorboard==2.15.1
  - tensorboard-data-server==0.7.2
  - threadpoolctl==3.2.0
  - tokenizers==0.15.0
  - torch==2.1.0
  - tqdm==4.66.1
  - transformers==4.36.2
  - triton==3.5.0
  - typing-extensions==4.4.0
  - urllib3==1.26.13
  - werkzeug==3.0.1
  - yarl==1.9.4
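conda env create -f env_ubuntu.yaml
Answer the prompts as they appear; this step takes a few minutes.
Activate the conda environment created from the config file above:
conda activate TimeCMA
pip install h5py
pip install torch==2.7.1 torch_npu==2.7.1
cd /home/TimeCMA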
vim storage/gen_prompt_emb.py
## Add the following lines below the import torch line at the top of the file:
import torch_npu
from torch_npu.contrib import transfer_to_npu
## Likewise, add the same lines to train.py, storage/store_emb.py, and models/TimeCMA.py.
cd /home/TimeCMA
mkdir Results
mkdir Results/emb_logs
pip install huggingface-hub tqdm
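The manual vim edits described earlier for the four source files can also be scripted. The following is a rough sketch, assuming each file contains a top-level line that reads exactly `import torch`; the helper names `add_npu_imports` and `patch_files` are hypothetical, and files without such a line are left unchanged.

```python
from pathlib import Path

# The two lines this guide adds below `import torch`.
NPU_IMPORTS = "import torch_npu\nfrom torch_npu.contrib import transfer_to_npu\n"

# Files named in this guide, relative to the repository root.
FILES = ("storage/gen_prompt_emb.py", "train.py", "storage/store_emb.py", "models/TimeCMA.py")


def add_npu_imports(source):
    """Insert the torch_npu imports after the first top-level `import torch` line."""
    if "import torch_npu" in source:
        return source  # already patched, keep the operation idempotent
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.strip() == "import torch" and not line[0].isspace():
            if not line.endswith("\n"):
                lines[i] = line + "\n"
            lines.insert(i + 1, NPU_IMPORTS)
            return "".join(lines)
    return source  # no matching line found, leave the text untouched


def patch_files(repo_root, relative_paths=FILES):
    """Rewrite each listed file in place with the NPU imports added."""
    for rel in relative_paths:
        path = Path(repo_root) / rel
        path.write_text(add_npu_imports(path.read_text()))


# Example call (assumes the repository was cloned to /home/TimeCMA):
# patch_files("/home/TimeCMA")
```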
## Download the gpt2 tokenizer
HF_ENDPOINT=https://hf-mirror.com python -c "
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id='openai-community/gpt2',
    local_dir='./gpt2',
    resume_download=True,
    local_dir_use_symlinks=False
)
"
Modify the code that loads the gpt2 tokenizer:
vim scripts/Store_ETT.sh
## After nohup python storage/store_emb.py, add: --model_name "/home/TimeCMA/gpt2"
Start the run:
bash scripts/Store_ETT.sh
Things to check:
1. npu-smi info shows a python process occupying the NPU.
2. Files such as 0.h5, 1.h5, ... are generated under folders like /home/TimeCMA/Embeddings/ETTh1/test.
This run takes some time. After it finishes, copy the Embeddings/ directory to /mnt/sfs-common/TimeCMA:
mkdir /mnt/sfs-common/TimeCMA
cp -r Embeddings/ /mnt/sfs-common/TimeCMA
Taking training on the ETTh1 dataset as an example:
cd /home/TimeCMA
bash scripts/ETTh1.sh
Because the script runs in the background with nohup, check progress by running cat on the files under Results/ETTh1.
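For the embedding step, the second check (that .h5 files are being generated) can be made less manual by counting files per folder. This is a minimal sketch assuming the embeddings live under /home/TimeCMA/Embeddings as stated above; the function name `count_h5_files` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_h5_files(embeddings_dir):
    """Count .h5 files per sub-directory, keyed by path relative to `embeddings_dir`."""
    root = Path(embeddings_dir)
    if not root.is_dir():
        return {}
    counts = Counter()
    for path in root.rglob("*.h5"):
        counts[str(path.parent.relative_to(root))] += 1
    return dict(counts)


if __name__ == "__main__":
    for folder, n in sorted(count_h5_files("/home/TimeCMA/Embeddings").items()):
        print(f"{folder}: {n} files")
```

Running it twice a few minutes apart shows whether the counts are still growing, which is a rough signal that the embedding job is still making progress.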
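A customer needs TimeCMA for research on big-data analysis and agent technology for reservoir stimulation. During training on A2, the following warning is raised: "Warning: CAUTION: The operator 'aten::_transformer_encoder_layer_fwd' is not currently supported on the NPU backend and will fall back to run on the CPU. This may have performance implications. (function npu_cpu_fallback)". At the same time, CPU utilization is close to 100%. This indicates that the operator falls back to the CPU, which slows down training. An NPU-side implementation of aten::_transformer_encoder_layer_fwd is therefore needed to improve training performance.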
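To find out which operators trigger this warning in a given training log, the log text can be scanned for the message quoted above. The sketch below matches only the wording shown in this guide; the function name `cpu_fallback_ops` is hypothetical, and the pattern accepts both straight and typographic quotes around the operator name.

```python
import re

# Matches the operator name in the CPU-fallback warning quoted in this guide.
FALLBACK_RE = re.compile(
    r"The operator\s+['‘’]?(aten::[\w.]+)['‘’]?\s+is not currently supported on the NPU backend"
)


def cpu_fallback_ops(log_text):
    """Return the sorted, de-duplicated operator names reported as falling back to CPU."""
    return sorted(set(FALLBACK_RE.findall(log_text)))


if __name__ == "__main__":
    sample = (
        "Warning: CAUTION: The operator 'aten::_transformer_encoder_layer_fwd' is not "
        "currently supported on the NPU backend and will fall back to run on the CPU."
    )
    print(cpu_fallback_ops(sample))
```

An empty list for a full training log suggests that no operator reported a CPU fallback in that run.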
For TimeCMA model deployment, see https://ai.gitcode.com/Ascend-SACT/TimeCMA/.
To avoid build failures or errors caused by environment differences, first create the container as described in that deployment guide and use the conda environment. All steps below assume this setup:
source ~/miniconda3/etc/profile.d/conda.sh
conda activate TimeCMA
cd /home
### The torch version in the current conda environment is 2.7.1, so clone the matching v2.7.1 branch of the Ascend pytorch repository
git clone https://gitcode.com/Ascend/pytorch.git -b v2.7.1 --depth 1
cd pytorch
## Check the Python version with python --version
bash ci/build.sh --python=3.10
touch third_party/op-plugin/op_plugin/ops/opapi/TransformerEncoderLayerFwdKernelNpuOpApi.cpp
vim third_party/op-plugin/op_plugin/ops/opapi/TransformerEncoderLayerFwdKernelNpuOpApi.cpp
### Add the following code to TransformerEncoderLayerFwdKernelNpuOpApi.cpp:
#include <ATen/ops/_transformer_encoder_layer_fwd_native.h>
#include "op_plugin/OpApiInterface.h"
#include "op_plugin/utils/op_api_common.h"

namespace op_api {
using npu_preparation = at_npu::native::OpPreparation;

// Delegate to ATen's native implementation of the encoder layer.
at::Tensor _transformer_encoder_layer_fwd(
    const at::Tensor& src,
    const int64_t embed_dim,
    const int64_t num_heads,
    const at::Tensor& qkv_weight,
    const at::Tensor& qkv_bias,
    const at::Tensor& proj_weight,
    const at::Tensor& proj_bias,
    const bool use_gelu,
    const bool norm_first,
    const double layer_norm_eps,
    const at::Tensor& layer_norm_weight_1,
    const at::Tensor& layer_norm_bias_1,
    const at::Tensor& layer_norm_weight_2,
    const at::Tensor& layer_norm_bias_2,
    const at::Tensor& ffn_weight_1,
    const at::Tensor& ffn_bias_1,
    const at::Tensor& ffn_weight_2,
    const at::Tensor& ffn_bias_2,
    const c10::optional<at::Tensor>& mask,
    const c10::optional<int64_t> mask_type)
{
    return at::native::transformer_encoder_layer_forward(
        src, embed_dim, num_heads, qkv_weight, qkv_bias, proj_weight, proj_bias,
        use_gelu, norm_first, layer_norm_eps,
        layer_norm_weight_1, layer_norm_bias_1, layer_norm_weight_2, layer_norm_bias_2,
        ffn_weight_1, ffn_bias_1, ffn_weight_2, ffn_bias_2, mask, mask_type);
}

} // namespace op_api
vim third_party/op-plugin/op_plugin/config/op_plugin_functions.yaml
Add the following entry under official (at line 3; mind the indentation):
bash ci/build.sh --python=3.10
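After the build finishes, a .whl file is generated in the dist/ directory. Install it with:
pip install dist/*.whl
Run pip show torch_npu to check whether the installed torch_npu is the patched build.
Because the libstdc++ in the current environment is too old, the patched torch_npu fails with "libstdc++.so.6: version 'GLIBCXX_3.4.30' not found". Upgrade libstdc++ in the conda environment: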
conda activate TimeCMA
conda install -c conda-forge libstdcxx-ng=12
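To confirm that the upgraded library actually provides the required symbol version, the version tags embedded in libstdc++.so.6 can be listed, similar to running strings on the file and filtering for GLIBCXX. The sketch below is an illustration only: the helper names are hypothetical, and the library path in the usage comment assumes the usual conda layout ($CONDA_PREFIX/lib).

```python
import re


def glibcxx_versions(library_bytes):
    """Return the set of GLIBCXX_x.y[.z] version tags embedded in a libstdc++ binary."""
    tags = re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", library_bytes)
    return {tag.decode("ascii") for tag in tags}


def has_glibcxx(library_path, wanted="3.4.30"):
    """Check whether the library at `library_path` contains the GLIBCXX tag `wanted`."""
    with open(library_path, "rb") as handle:
        return wanted in glibcxx_versions(handle.read())


# Example call (path is an assumption based on the usual conda layout):
# import os
# print(has_glibcxx(os.path.join(os.environ["CONDA_PREFIX"], "lib", "libstdc++.so.6")))
```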
cd /home/TimeCMA
bash scripts/ETTh1.sh
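## Wait a few dozen seconds, then check the log
cat Results/ETTh1/i96_o192_lr1e-4_c64_el1_dl2_dn0.7_bs16.log
If the log contains no fallback warning for aten::_transformer_encoder_layer_fwd and CPU utilization is low, the operator replacement succeeded, and the log should show noticeably faster training.
A log from a normal training run looks like this: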
Namespace(device='cuda', data_path='ETTh1', channel=64, num_nodes=7, seq_len=96, pred_len=192, batch_size=16, learning_rate=0.0001, dropout_n=0.7, d_llm=768, e_layer=1, d_layer=2, head=8, weight_decay=0.001, num_workers=10, model_name='gpt2', epochs=999, seed=2024, es_patience=50, save='./logs/2026-02-28-07:51:44-')
The number of trainable parameters: 12503459
The number of parameters: 12503460
Start training...
Epoch: 001, Training Time: 25.3328 secs
Epoch: 001, Validation Time: 2.9651 secs
-----------------------
Epoch: 001, Train Loss: 0.7727, Train MAE: 0.6361
Epoch: 001, Valid Loss: 1.1880, Valid MAE: 0.7476
###Update tasks appear###
Updating! Valid Loss:1.1880, epoch: 1
Epoch: 002, Training Time: 23.0531 secs
Epoch: 002, Validation Time: 2.9789 secs
-----------------------
Epoch: 002, Train Loss: 0.5786, Train MAE: 0.5466
Epoch: 002, Valid Loss: 1.1415, Valid MAE: 0.7263
###Update tasks appear###
Updating! Valid Loss:1.1415, epoch: 2
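To compare training speed before and after the operator replacement, the per-epoch timings can be pulled out of a log like the one above. The following is a minimal sketch that relies only on the line formats shown in this log; the function name `summarize_log` is hypothetical.

```python
import re

# Line formats taken from the training log shown above.
TIME_RE = re.compile(r"Epoch: (\d+), Training Time: ([\d.]+) secs")
LOSS_RE = re.compile(r"Epoch: (\d+), Valid Loss: ([\d.]+), Valid MAE: ([\d.]+)")


def summarize_log(log_text):
    """Return (mean training seconds per epoch, (epoch, loss) with the lowest valid loss)."""
    times = [float(sec) for _, sec in TIME_RE.findall(log_text)]
    losses = [(int(epoch), float(loss)) for epoch, loss, _ in LOSS_RE.findall(log_text)]
    mean_time = sum(times) / len(times) if times else None
    best = min(losses, key=lambda item: item[1]) if losses else None
    return mean_time, best


if __name__ == "__main__":
    sample = (
        "Epoch: 001, Training Time: 25.3328 secs\n"
        "Epoch: 001, Valid Loss: 1.1880, Valid MAE: 0.7476\n"
        "Epoch: 002, Training Time: 23.0531 secs\n"
        "Epoch: 002, Valid Loss: 1.1415, Valid MAE: 0.7263\n"
    )
    print(summarize_log(sample))
```

Running the same function on a log captured before the patch and one captured after gives a simple side-by-side of mean epoch time.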