InfoGraph 是一个用于无监督图级别表示学习的模型,通过互信息最大化(InfoMax)来学习分子图的嵌入表示。模型包含两个编码器:一个编码整个图,另一个编码不同粒度的子结构(如节点、边),通过对比损失函数最大化两者之间的互信息。编码器采用 GIN(Graph Isomorphism Network)和 NNConv 等图卷积层,可捕获分子拓扑结构和特征信息。InfoGraph 主要用于分子表示的预训练,也可直接用于图分类等下游任务。本文描述的 InfoGraph 模型是基于 DeepChem 套件实现的,后续适配也是基于该套件修改。
| 组件 | 版本 |
|---|---|
| Python | 3.10.19 |
| PyTorch | 2.1.0 |
| torch_npu | 2.1.0.post13 |
| CANN | 8.1.RC1 |
| 设备型号 | NPU 配置 |
|---|---|
| Atlas 800T A2 | 单卡 / 多卡 |
| 镜像环境 | 镜像地址 |
|---|---|
| 公网 | swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_2_1_ascend:pytorch_2.1.0-cann_8.1.rc1-py_3.10-euler_2.10.11-aarch64-snt9b-20250603154214-4e60e43 |
docker run -u root --privileged \
--name {container_name} \
--device /dev/davinci0 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-itd {image_id} /bin/bashdocker exec -it {container_name} bash直接安装 deepchem-ascend 二进制包,该包已基于适配代码重新编译并上传至 PyPI。
pip install deepchem-ascend==0.0.1安装运行必要的依赖库
pip install torch-geometric测试代码举例如下:
import numpy as np
import torch
from deepchem.feat.molecule_featurizers import MolGraphConvFeaturizer
from deepchem.feat.graph_data import BatchGraphData
from deepchem.models.torch_models.torch_model import is_npu_available
try:
from deepchem.models.torch_models.infograph import InfoGraphModel, InfoGraphStarModel
except ImportError:
print("Failed to import InfoGraph, please ensure torch-geometric is installed")
exit(1)
if not is_npu_available():
print("NPU not available, skipping test")
exit()
device = torch.device('npu')
print(f"Testing InfoGraph on NPU device: {device}")
num_feat = 30
edge_dim = 11
embedding_dim = 64
num_gc_layers = 2
smiles = ['C1=CC=CC=C1', 'C1=CC=CC=C1C2=CC=CC=C2']
featurizer = MolGraphConvFeaturizer(use_edges=True)
graphs = featurizer.featurize(smiles)
batch_graphs = BatchGraphData(graphs)
print(f"Number of graphs: {len(graphs)}")
print("Testing InfoGraphModel (pretraining task)...")
infograph_model = InfoGraphModel(
num_features=num_feat,
embedding_dim=embedding_dim,
num_gc_layers=num_gc_layers,
task='pretraining',
device=device
)
print(f"InfoGraphModel device: {infograph_model.device}")
inputs = batch_graphs.numpy_to_torch(device)
g_enc, l_enc = infograph_model.model(inputs)
print(f"InfoGraphModel forward pass successful, g_enc shape: {g_enc.shape}, l_enc shape: {l_enc.shape}")
print("\nTesting InfoGraphModel (regression task)...")
regression_model = InfoGraphModel(
num_features=num_feat,
embedding_dim=embedding_dim,
num_gc_layers=num_gc_layers,
task='regression',
n_tasks=1,
device=device
)
print(f"Regression model device: {regression_model.device}")
inputs = batch_graphs.numpy_to_torch(device)
regression_output = regression_model.model(inputs)
print(f"Regression model forward pass successful, output shape: {regression_output.shape}")
print("\nTesting InfoGraphModel (classification task)...")
classification_model = InfoGraphModel(
num_features=num_feat,
embedding_dim=embedding_dim,
num_gc_layers=num_gc_layers,
task='classification',
n_tasks=1,
n_classes=2,
device=device
)
print(f"Classification model device: {classification_model.device}")
inputs = batch_graphs.numpy_to_torch(device)
classification_output = classification_model.model(inputs)
print(f"Classification model forward pass successful, output shape: {classification_output.shape}")测试结果: