hdk:24.1.RC3
cann:8.0.RC3
python:3.10.16
torch:2.1.0
torch-npu:2.1.0.post8
a. Create a new conda environment
conda create --name Genecorpus python=3.10.16
conda activate Genecorpus
b. Install Geneformer
git lfs install
git clone https://huggingface.co/ctheodoris/Geneformer
cd Geneformer
git checkout bfcada6  # important: check out this exact commit
vi requirements.txt and change torch>=2.0.1 to torch==2.1.0
:wq to save and exit
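If you prefer not to edit requirements.txt by hand in vi, the same one-line change can be scripted. This is a minimal sketch that assumes the requirement sits on its own line exactly as torch>=2.0.1; the function name pin_torch is hypothetical.

```python
from pathlib import Path


def pin_torch(requirements, old="torch>=2.0.1", new="torch==2.1.0"):
    """Replace the torch requirement line in place; return True if a change was made.

    Assumption: the requirement appears on its own line exactly as `old`.
    """
    path = Path(requirements)
    lines = path.read_text().splitlines(keepends=True)
    changed = False
    for i, line in enumerate(lines):
        # Compare the stripped line so trailing whitespace and newlines do not matter
        if line.strip() == old:
            lines[i] = line.replace(old, new)
            changed = True
    if changed:
        path.write_text("".join(lines))
    return changed
```

Run it from the Geneformer checkout, for example pin_torch("requirements.txt"). It returns False when no exact match is found, in which case edit the file by hand as described above.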
pip install .
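After pip install . finishes, you can confirm that the torch pin took effect and that the interpreter is the expected Python 3.10. The sketch below uses only the standard library (importlib.metadata). The helper names check_pins and python_matches are hypothetical. Versions are compared as exact strings, so a local version suffix such as +cpu would be reported as a mismatch.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def python_matches(expected=(3, 10)):
    """True if the running interpreter has the expected (major, minor) version."""
    return sys.version_info[:2] == tuple(expected)


def check_pins(expected, lookup=version):
    """Return {package: (expected, found)} for every pin that does not match exactly.

    found is None when the package is not installed at all.
    """
    mismatches = {}
    for pkg, want in expected.items():
        try:
            found = lookup(pkg)
        except PackageNotFoundError:
            found = None
        if found != want:
            mismatches[pkg] = (want, found)
    return mismatches


if __name__ == "__main__":
    print("python 3.10:", python_matches())
    print("mismatches:", check_pins({"torch": "2.1.0"}))
```

Once torch_npu has been installed in step c, the same check can include it, for example check_pins({"torch": "2.1.0", "torch_npu": "2.1.0.post8"}). An empty dict means every pin matches.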
c. Install torch_npu
wget https://gitee.com/ascend/pytorch/releases/download/v6.0.rc3-pytorch2.1.0/torch_npu-2.1.0.post8-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
If wget reports an untrusted certificate, append --no-check-certificate to the end of the command.
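The torch_npu wheel is built only for CPython 3.10 (cp310) on aarch64 Linux, and pip rejects it as "not a supported wheel on this platform" anywhere else. A minimal pre-check could look like the following. It assumes a standard wheel filename without a build tag. wheel_tags and wheel_fits are hypothetical names, and the check ignores the glibc (manylinux) version.

```python
import platform
import sys

WHEEL = ("torch_npu-2.1.0.post8-cp310-cp310-"
         "manylinux_2_17_aarch64.manylinux2014_aarch64.whl")


def wheel_tags(filename):
    """Split a wheel filename (no build tag) into (python tag, abi tag, [platform tags])."""
    _name, _version, py_tag, abi_tag, plat_tag = filename.removesuffix(".whl").split("-")
    # A compressed tag set such as "a.b" lists several compatible platforms
    return py_tag, abi_tag, plat_tag.split(".")


def wheel_fits(filename, py_tag=None, machine=None):
    """Rough check: CPython tag matches and one platform tag names this CPU architecture."""
    py_tag = py_tag or f"cp{sys.version_info.major}{sys.version_info.minor}"
    machine = machine or platform.machine()
    wheel_py, _abi, platforms = wheel_tags(filename)
    return wheel_py == py_tag and any(p.endswith("_" + machine) for p in platforms)


if __name__ == "__main__":
    print(WHEEL, "fits this machine:", wheel_fits(WHEEL))
```

On the target server (Python 3.10, platform.machine() returning aarch64), wheel_fits(WHEEL) should be True. False usually means the conda environment from step a is not active.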
pip3 install torch_npu-2.1.0.post8-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Obtaining Genecorpus-30M
git clone https://gitee.com/hf-datasets/Genecorpus-30M.git
Obtaining the other weights (bin, safetensors), configuration files, and datasets (pkl)
These are already present in the source directory (gf-6L-30M-i2048, gf-12L-30M-i2048).
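The weights are stored with Git LFS, so a clone made before git lfs install leaves small pointer files in place of the real weights. The sketch below checks a model directory such as gf-6L-30M-i2048 before fine-tuning. It assumes the standard Hugging Face file names config.json and pytorch_model.bin or model.safetensors, which may not match exactly what the repository ships. model_dir_problems and is_lfs_pointer are hypothetical names.

```python
from pathlib import Path

# Assumption: standard Hugging Face file names; adjust if the directory differs
CONFIG_FILE = "config.json"
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")
# First line of every Git LFS pointer file
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is a Git LFS pointer rather than the real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_HEADER)) == LFS_HEADER


def model_dir_problems(model_dir):
    """List problems in a model directory; an empty list means it looks usable."""
    d = Path(model_dir)
    if not d.is_dir():
        return [f"{d} is not a directory"]
    problems = []
    if not (d / CONFIG_FILE).is_file():
        problems.append(f"missing {CONFIG_FILE}")
    weights = [d / w for w in WEIGHT_FILES if (d / w).is_file()]
    if not weights:
        problems.append("no weight file (" + " or ".join(WEIGHT_FILES) + ")")
    for w in weights:
        if is_lfs_pointer(w):
            problems.append(f"{w.name} is a Git LFS pointer; run git lfs pull")
    return problems


if __name__ == "__main__":
    # Run from the Geneformer source directory
    for name in ("gf-6L-30M-i2048", "gf-12L-30M-i2048"):
        print(name, model_dir_problems(name) or "OK")
```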
source /usr/local/Ascend/ascend-toolkit/set_env.sh
1. Cell classification
vi /root/miniconda3/envs/Genecorpus/lib/python3.10/site-packages/geneformer/evaluation_utils.py
a. Add the following at the top of the evaluation script:
import torch_npu
from torch_npu.contrib import transfer_to_npu
b. Inside the classifier_predict function (line 86), add:
device = torch.device('npu' if torch_npu.npu.is_available() else 'cpu')
and change .to("cuda") to .to(device) on lines 119, 120, and 121.
:wq to save and exit
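Steps a and b can also be applied with a script instead of vi, which avoids depending on exact line numbers. This is a hypothetical helper (patch_source, patch_file), not part of Geneformer. It prepends the two torch_npu imports, inserts the device line as the first statement of classifier_predict, and rewrites .to("cuda") only inside that function. It assumes the file has no from __future__ import (which must stay first) and already imports torch. Review the result against the .bak copy before fine-tuning.

```python
import re
from pathlib import Path

IMPORTS = "import torch_npu\nfrom torch_npu.contrib import transfer_to_npu\n"
DEVICE_LINE = "device = torch.device('npu' if torch_npu.npu.is_available() else 'cpu')\n"
# Matches .to("cuda") and .to('cuda')
CUDA_TO = re.compile(r"\.to\(([\"'])cuda\1\)")


def _indent(line):
    return len(line) - len(line.lstrip())


def patch_source(src):
    """Apply steps a and b to the text of evaluation_utils.py; patched text is returned as-is."""
    if "import torch_npu" in src:
        return src
    lines = src.splitlines(keepends=True)
    start = next((i for i, line in enumerate(lines)
                  if line.lstrip().startswith("def classifier_predict(")), None)
    if start is None:
        raise ValueError("classifier_predict not found")
    # The signature may span several lines; it ends at the first line ending in ':'
    end = start
    while not lines[end].rstrip().endswith(":"):
        end += 1
    # The body runs until the next non-blank line indented no deeper than 'def'
    stop = next((j for j in range(end + 1, len(lines))
                 if lines[j].strip() and _indent(lines[j]) <= _indent(lines[start])),
                len(lines))
    for j in range(end + 1, stop):
        lines[j] = CUDA_TO.sub(".to(device)", lines[j])
    # Insert the device line as the first statement, reusing the body's indentation
    first = next(line for line in lines[end + 1:stop] if line.strip())
    lines.insert(end + 1, first[: _indent(first)] + DEVICE_LINE)
    return IMPORTS + "".join(lines)


def patch_file(path):
    """Patch a file in place, keeping the original as <name>.bak; return True if changed."""
    p = Path(path)
    original = p.read_text()
    patched = patch_source(original)
    if patched == original:
        return False
    p.with_name(p.name + ".bak").write_text(original)
    p.write_text(patched)
    return True
```

For example, patch_file("/root/miniconda3/envs/Genecorpus/lib/python3.10/site-packages/geneformer/evaluation_utils.py"). Running it a second time returns False and leaves both the file and the backup untouched.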
cd /Geneformer/examples
source /usr/local/Ascend/ascend-toolkit/set_env.sh
Run fine-tuning:
python3 cell_classification.py
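(Note: the fine-tuning script hard-codes the dataset and pretrained-model locations; change them to match your actual paths.)
2. Gene classification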
cd /Geneformer/examples
source /usr/local/Ascend/ascend-toolkit/set_env.sh
Run fine-tuning:
python3 gene_classification.py
3. Plot cell embeddings
cd /Geneformer/examples
Run the script:
python3 extract_and_plot_cell_embeddings.py
4. Multi-task cell classification
cd /Geneformer/examples
source /usr/local/Ascend/ascend-toolkit/set_env.sh
python3 multitask_cell_classification.py
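Each task above is started the same way: source set_env.sh, then run a script from the examples directory. Because source only changes the environment of the current shell, a wrapper has to run both commands in the same bash process. The sketch below is a hypothetical convenience runner (build_command, run_all), assuming the /Geneformer/examples path used above. Unlike the steps above, it also sources set_env.sh before the embedding-plot script, which is assumed to be harmless.

```python
import shlex
import subprocess

SET_ENV = "/usr/local/Ascend/ascend-toolkit/set_env.sh"
EXAMPLES_DIR = "/Geneformer/examples"  # assumption: same path as the cd commands above
SCRIPTS = (
    "cell_classification.py",
    "gene_classification.py",
    "extract_and_plot_cell_embeddings.py",
    "multitask_cell_classification.py",
)


def build_command(script, set_env=SET_ENV):
    """One bash command line that sources the CANN environment, then runs the script."""
    return f"source {shlex.quote(set_env)} && python3 {shlex.quote(script)}"


def run_all(scripts=SCRIPTS, cwd=EXAMPLES_DIR, dry_run=True):
    """Run each example in its own bash process; with dry_run=True only return the commands."""
    commands = [build_command(s) for s in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first script that exits with a non-zero status
            subprocess.run(["bash", "-c", cmd], cwd=cwd, check=True)
    return commands
```

run_all() only returns the command lines; run_all(dry_run=False) executes them in order and raises subprocess.CalledProcessError at the first failure.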