Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

📝 Overview

Matrix-Game-3.0 is an open-source, memory-augmented interactive world model designed for real-time, long-form video generation at 720p.

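Framework Overview

Our framework unifies three stages into an end-to-end pipeline:

Data Engine: an industrial-scale infinite data engine that integrates Unreal Engine-based synthetic scenes, large-scale automated AAA game collection, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplets at scale;
Model Training: a memory-augmented Diffusion Transformer (DiT) with an error buffer that learns action-conditioned generation with memory-enhanced long-horizon consistency;
Inference Deployment: few-step sampling, INT8 quantization, and model distillation, achieving real-time generation at 720p@40FPS with a 5B model.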
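
To make the Video-Pose-Action-Prompt quadruplet more concrete, the following is a minimal illustrative sketch of how one such record could be represented. The class name, field names, per-frame layout, and validation rules are all assumptions made for illustration; they are not the project's actual data schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QuadrupletSample:
    """One hypothetical Video-Pose-Action-Prompt record (illustrative only)."""

    video_path: str                 # location of the video clip (assumed to be a file path)
    poses: List[Tuple[float, ...]]  # one camera pose per frame (assumed layout)
    actions: List[str]              # one action label per frame (assumed encoding)
    prompt: str                     # text description of the scene

    def validate(self) -> None:
        """Check that pose and action annotations cover the same number of frames."""
        if len(self.poses) != len(self.actions):
            raise ValueError("poses and actions must have the same length")
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")


# Hypothetical two-frame sample; the values are placeholders, not real data.
sample = QuadrupletSample(
    video_path="clips/example.mp4",
    poses=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    actions=["forward", "forward"],
    prompt="a vintage gas station in a desert landscape",
)
sample.validate()
```

Keeping the four components in one record makes it easy to check, before training, that every frame has both a pose and an action attached.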

Model Overview

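✨ Key Features

🚀 Feature 1: Upgraded data engine. Combines Unreal Engine-based synthetic data, large-scale automated AAA game data, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt data.
🖱️ Feature 2: Long-horizon memory and consistency. Uses prediction residuals and frame re-injection for self-correction, while camera-aware memory ensures long-term spatiotemporal consistency.
🎬 Feature 3: Real-time interactivity and open access. Adopts a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder distillation, so that the 5B model generates 720p video in real time at 40 FPS while maintaining stable memory consistency over minute-long sequences.
👍 Feature 4: Scaling up to a 28B MoE model. Scaling to a 2×14B model further improves generation quality, dynamics, and generalization.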

🔥 Latest Updates

[2026-03] 🎉 Initial release of the Matrix-Game-3.0 model

🚀 Quick Start

Installation

Create a conda environment and install the dependencies:

conda create -n matrix-game-3.0 python=3.12 -y
conda activate matrix-game-3.0
# This project also depends on FlashAttention; install it separately:
# https://github.com/Dao-AILab/flash-attention
git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
cd Matrix-Game-3.0
pip install -r requirements.txt

Model Download

pip install "huggingface_hub[cli]"
huggingface-cli download Matrix-Game-3.0 --local-dir Matrix-Game-3.0

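Inference

Before running inference, prepare:

An input image
A text prompt

After downloading the pretrained model, use the following command to generate an interactive video with random actions: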

torchrun --nproc_per_node=$NUM_GPUS generate.py \
  --size "704*1280" \
  --dit_fsdp --t5_fsdp \
  --ckpt_dir Matrix-Game-3.0 \
  --fa_version 3 \
  --use_int8 \
  --num_iterations 12 \
  --num_inference_steps 3 \
  --image demo_images/000/image.png \
  --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." \
  --save_name test \
  --seed 42 \
  --compile_vae \
  --lightvae_pruning_rate 0.5 \
  --vae_type mg_lightvae \
  --output_dir ./output
# "num_iterations" is the number of generation iterations to run. The total number of generated frames is: 57 + (num_iterations - 1) * 40
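
The frame-count formula in the comment above can be checked with a few lines of Python. This is a small sketch: the constants 57 and 40 come from that formula, while the function names and the inverse helper are illustrative additions, not part of the repository.

```python
import math

FIRST_ITERATION_FRAMES = 57       # frames produced by the first iteration (from the formula)
FRAMES_PER_EXTRA_ITERATION = 40   # frames added by each subsequent iteration


def total_frames(num_iterations: int) -> int:
    """Total frames generated for a given --num_iterations value."""
    if num_iterations < 1:
        raise ValueError("num_iterations must be at least 1")
    return FIRST_ITERATION_FRAMES + (num_iterations - 1) * FRAMES_PER_EXTRA_ITERATION


def iterations_for_frames(min_frames: int) -> int:
    """Smallest num_iterations that yields at least min_frames frames."""
    if min_frames <= FIRST_ITERATION_FRAMES:
        return 1
    extra = min_frames - FIRST_ITERATION_FRAMES
    return 1 + math.ceil(extra / FRAMES_PER_EXTRA_ITERATION)


print(total_frames(12))            # 497, for the --num_iterations 12 used above
print(iterations_for_frames(500))  # 13
```

With `--num_iterations 12`, as in the command above, the run therefore produces 497 frames.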

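Tips: To use the base model, pass "--use_base_model --num_inference_steps 50". To generate an interactive video driven by your own input actions, pass "--interactive". In a multi-GPU setup, you can speed up inference by passing "--use_async_vae --async_vae_warmup_iters 1".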
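
When switching between these variants often, it can help to assemble the command programmatically. The sketch below is a hypothetical helper, not part of the repository: it only uses flags that appear in this README, and it assumes that the base-model flags replace "--num_inference_steps 3" while all other flags from the example command stay unchanged.

```python
import shlex
from typing import List


def build_generate_command(
    num_gpus: int,
    image: str,
    prompt: str,
    use_base_model: bool = False,
    interactive: bool = False,
    async_vae: bool = False,
) -> List[str]:
    """Build the torchrun argument list for generate.py (hypothetical helper)."""
    cmd = [
        "torchrun", f"--nproc_per_node={num_gpus}", "generate.py",
        "--size", "704*1280",
        "--dit_fsdp", "--t5_fsdp",
        "--ckpt_dir", "Matrix-Game-3.0",
        "--fa_version", "3",
        "--use_int8",
        "--num_iterations", "12",
        "--image", image,
        "--prompt", prompt,
        "--save_name", "test",
        "--seed", "42",
        "--compile_vae",
        "--lightvae_pruning_rate", "0.5",
        "--vae_type", "mg_lightvae",
        "--output_dir", "./output",
    ]
    if use_base_model:
        # Assumption: the base model uses 50 steps in place of the 3 distilled steps.
        cmd += ["--use_base_model", "--num_inference_steps", "50"]
    else:
        cmd += ["--num_inference_steps", "3"]
    if interactive:
        cmd.append("--interactive")
    if async_vae:
        cmd += ["--use_async_vae", "--async_vae_warmup_iters", "1"]
    return cmd


command = build_generate_command(
    num_gpus=2,
    image="demo_images/000/image.png",
    prompt="a vintage gas station with a classic car parked under a canopy",
    async_vae=True,
)
# shlex.join quotes arguments such as the prompt and "704*1280" for safe shell use.
print(shlex.join(command))
```

The resulting list can be passed directly to `subprocess.run(command, check=True)`, which avoids shell quoting issues with the prompt text.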

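⭐ Acknowledgements

Diffusers for their excellent diffusion model framework
Self-Forcing for their excellent research work
GameFactory for the idea of the action control module
LightX2V for their excellent quantization framework
Wan2.2 for their strong base model
lingbot-world for their context parallel framework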

📖 Citation

If you find this work useful for your research, please cite our paper:

  @misc{2026matrix,
    title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
    author={{Skywork AI Matrix-Game Team}},
    year={2026},
    howpublished={Technical report},
    url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
  }