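We are releasing NextStep-1.1, a major step forward for the NextStep series. This version resolves the visualization failures seen in NextStep-1 and substantially improves image quality through extended training and a flow-based reinforcement learning (RL) post-training paradigm.

NextStep-1.1 is not just a fine-tune; it is a re-engineered version focused on stability and high-fidelity output. Key improvements include: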
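This repository has been adapted for Huawei Ascend NPUs. The model can be loaded directly onto an NPU with `device_map="npu:0"`, and the pipeline supports NPU inference out of the box.

… `torch.npu` random seed handling). `inference.py` loads the model onto the NPU and generates an image. `benchmark.py` measures inference time at different batch sizes. `accuracy.py` provides a validation framework for comparing NPU and CPU accuracy.

Original model weights: https://huggingface.co/stepfun-ai/NextStep-1.1-Pretrain-256px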
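
Loading with `device_map="npu:0"` depends on several packages being importable (`torch_npu` for the NPU device, `accelerate` for `device_map`). The following is a minimal sketch of a pre-flight check, not part of this repository; the `REQUIRED` list and the `missing_packages` helper are hypothetical and only cover the packages named in this README.

```python
import importlib.util

# Assumption: the packages this README mentions are the ones worth checking.
# This is not an exhaustive list of the model's requirements.
REQUIRED = ("torch", "torch_npu", "accelerate", "transformers")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

The check only confirms that the packages can be found; it does not verify that CANN is configured or that an NPU device is visible.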
Download the model weights with ModelScope:

    modelscope download --model stepfun-ai/NextStep-1.1-Pretrain-256px --local_dir /opt/atomgit/weight/NextStep-1.1-Pretrain-256px

After downloading the weights, install the dependencies listed in the weight directory:

    pip install -r /opt/atomgit/weight/NextStep-1.1-Pretrain-256px/requirements.txt

Make sure the Huawei Ascend CANN toolkit and torch_npu are installed:

    # Using CANN 8.5 as an example
    source /usr/local/Ascend/ascend-toolkit/set_env.sh
    pip install torch_npu

If you are using torch==2.9.0+cpu, make sure accelerate is installed as well:

    pip install torch==2.9.0+cpu accelerate

The following script loads the model onto the NPU and generates an image:

    from pathlib import Path

    from models.gen_pipeline import NextStepPipeline
    from transformers import AutoModel, AutoTokenizer

    # Local weight directory; must match the --local_dir used with `modelscope download`
    HF_HUB = "/opt/atomgit/weight/NextStep-1.1-Pretrain-256px"

    # Load the model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        HF_HUB, device_map="npu:0", local_files_only=True, trust_remote_code=True
    )
    pipeline = NextStepPipeline(
        model_name_or_path=HF_HUB, tokenizer=tokenizer, model=model, device="npu:0"
    )

    # Set the prompts
    positive_prompt = ""
    negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry."
    example_prompt = 'A REALISTIC PHOTOGRAPH OF A WALL WITH "TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE" PROMINENTLY DISPLAYED'

    # Text-to-image generation
    IMG_SIZE = 256
    image = pipeline.generate_image(
        example_prompt,
        hw=(IMG_SIZE, IMG_SIZE),
        num_images_per_caption=1,
        positive_prompt=positive_prompt,
        negative_prompt=negative_prompt,
        cfg=7.5,
        cfg_img=1.0,
        cfg_schedule="constant",
        use_norm=False,
        num_sampling_steps=28,
        timesteps_shift=1.0,
        seed=3407,
    )[0]

    # Save the generated image
    Path("./output").mkdir(parents=True, exist_ok=True)
    image.save("./output/output.jpg")

The image below was generated on an Ascend NPU with the prompt above:

Run `benchmark.py` to measure inference performance on the NPU:

    python3 benchmark.py

| Batch Size | Total time (s) | Time per image (s) |
|---|---|---|
| 1 | 42.63 | 42.63 |
| 2 | 43.51 | 21.75 |
| 4 | 45.06 | 11.26 |
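
The per-image column above is the total time divided by the batch size. As a rough illustration of how such a measurement could be taken, here is a minimal timing sketch; it is not the repository's `benchmark.py`. The `benchmark` helper and the `fake_generate` stand-in are hypothetical, and the sketch assumes `generate` returns only after the device has finished its work (on an NPU this would typically mean synchronizing the device before the clock is read).

```python
import time


def benchmark(generate, batch_sizes=(1, 2, 4), warmup=1):
    """Time `generate(batch_size)` and report total and per-image seconds.

    `generate` is a hypothetical callable that produces `batch_size` images
    and blocks until they are ready.
    """
    rows = []
    for bs in batch_sizes:
        # Warm-up runs are excluded from timing (first calls often pay one-off costs).
        for _ in range(warmup):
            generate(bs)
        start = time.perf_counter()
        generate(bs)
        total = time.perf_counter() - start
        rows.append({"batch_size": bs, "total_s": total, "per_image_s": total / bs})
    return rows


def fake_generate(batch_size):
    """Stand-in for a real pipeline call; sleeps instead of generating images."""
    time.sleep(0.01)


if __name__ == "__main__":
    for row in benchmark(fake_generate):
        print(f"| {row['batch_size']} | {row['total_s']:.2f} | {row['per_image_s']:.2f} |")
```

In the measured results, total time grows only slightly from batch size 1 to 4 (42.63 s to 45.06 s), so the per-image time falls almost in proportion to the batch size.
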
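Because the model is large, it could not be loaded in the CPU environment, so the pixel-level NPU vs. CPU accuracy comparison was not run.
The correctness of NPU inference was verified with `inference.py`: with the same prompt, the Ascend NPU consistently produces the expected image; see the inference output above.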
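
For readers who do have a reference output to compare against, a pixel-level comparison could look like the sketch below. This is an illustration under stated assumptions, not the contents of `accuracy.py`: the `compare_pixels` helper is hypothetical, it takes two flat sequences of 8-bit pixel values, and the choice of metrics (maximum absolute error, mean absolute error, PSNR) is ours.

```python
import math


def compare_pixels(reference, candidate, max_value=255):
    """Compare two equally sized flat sequences of pixel values.

    Returns the maximum absolute error, the mean absolute error, and the
    PSNR in dB (infinity when the inputs are identical).
    """
    if len(reference) != len(candidate):
        raise ValueError("inputs must have the same number of pixel values")
    if len(reference) == 0:
        raise ValueError("inputs must not be empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    mse = sum(d * d for d in diffs) / len(diffs)
    psnr = math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)
    return {
        "max_abs": max(diffs),
        "mean_abs": sum(diffs) / len(diffs),
        "psnr_db": psnr,
    }
```

With Pillow, a flat sequence can be obtained from a saved image via `list(Image.open(path).convert("RGB").tobytes())`; both images must have the same size and mode for the comparison to be meaningful.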

If NextStep is useful for your research or applications, please consider starring this repository and citing:

    @article{nextstepteam2025nextstep1,
      title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
      author={NextStep Team and Chunrui Han and Guopeng Li and Jingwei Wu and Quan Sun and Yan Cai and Yuang Peng and Zheng Ge and Deyu Zhou and Haomiao Tang and Hongyu Zhou and Kenkun Liu and Ailin Huang and Bin Wang and Changxin Miao and Deshan Sun and En Yu and Fukun Yin and Gang Yu and Hao Nie and Haoran Lv and Hanpeng Hu and Jia Wang and Jian Zhou and Jianjian Sun and Kaijun Tan and Kang An and Kangheng Lin and Liang Zhao and Mei Chen and Peng Xing and Rui Wang and Shiyu Liu and Shutao Xia and Tianhao You and Wei Ji and Xianfang Zeng and Xin Han and Xuelin Zhang and Yana Wei and Yanming Xu and Yimin Jiang and Yingming Wang and Yu Zhou and Yucheng Han and Ziyang Meng and Binxing Jiao and Daxin Jiang and Xiangyu Zhang and Yibo Zhu},
      journal={arXiv preprint arXiv:2508.10711},
      year={2025}
    }