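NextStep-1.1-Pretrain-256px: text-to-image generation on Ascend NPUs. The project uses reinforcement learning to improve image texture quality and reduce artifacts, resolves numerical instability, and supports NPU inference and performance benchmarking.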

NextStep-1.1

Homepage | GitHub | Paper

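We introduce NextStep-1.1, a significant leap forward in the NextStep series. This version resolves the visualization failures seen in NextStep-1 and substantially improves image quality through extended training and a flow-based reinforcement learning (RL) post-training paradigm.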

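What's New in 1.1

NextStep-1.1 is not just a fine-tune; it is a re-engineered version focused on stability and high-fidelity output. Key improvements include:

RL-enhanced visual fidelity: RL substantially improves image texture quality and reduces visual artifacts, yielding cleaner, more professional output.
Technical stability: resolves the numerical instability inherent in autoregressive flow-based models.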

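Ascend NPU Adaptation

This repository has been adapted for Huawei Ascend NPUs. The model can be loaded directly onto the NPU with device_map="npu:0", and the pipeline supports NPU inference out of the box.

Adaptation Details

models directory: copied from the original weights repository, with Ascend NPU adaptations added (for example, torch.npu random seed handling).
Inference script: inference.py loads the model onto the NPU and generates images.
Benchmark script: benchmark.py measures inference time at different batch sizes.
Accuracy verification script: accuracy.py provides a framework for comparing NPU and CPU accuracy.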
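
The torch.npu seed handling mentioned above is not shown in this README. As a minimal sketch of what device-aware seeding could look like (the helper name seed_everything is hypothetical, and the sketch assumes torch_npu exposes torch.npu.manual_seed_all once it has been imported):

```python
import random


def seed_everything(seed: int) -> list:
    """Seed each available RNG and return the names of the backends seeded."""
    seeded = ["python"]
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return seeded  # torch is absent; only the stdlib RNG was seeded
    torch.manual_seed(seed)
    seeded.append("torch")
    # torch.npu only exists after torch_npu has been imported (assumption:
    # it mirrors torch.cuda's seeding API, including manual_seed_all).
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        npu.manual_seed_all(seed)
        seeded.append("npu")
    return seeded
```

Returning the list of seeded backends makes it easy to log whether the NPU generator was actually reached on a given machine.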

Weight Download

Original model weights: https://huggingface.co/stepfun-ai/NextStep-1.1-Pretrain-256px

Download the model weights with ModelScope:

modelscope download --model stepfun-ai/NextStep-1.1-Pretrain-256px --local_dir /opt/atomgit/weight/NextStep-1.1-Pretrain-256px
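
Before moving on, it can help to confirm that the download actually landed where later steps expect it. The check below is only a sketch: missing_weight_files is a hypothetical helper, requirements.txt is the file this README installs from, and config.json is an assumption based on the usual Hugging Face checkpoint layout.

```python
from pathlib import Path

# requirements.txt is referenced by this README; config.json is an assumption
# based on the usual Hugging Face checkpoint layout.
EXPECTED_FILES = ("config.json", "requirements.txt")


def missing_weight_files(weights_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    target = "/opt/atomgit/weight/NextStep-1.1-Pretrain-256px"
    missing = missing_weight_files(target)
    print("weights look complete" if not missing else f"missing: {missing}")
```

A directory that does not exist is reported as missing every expected file, which also catches a mistyped path.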

Environment Setup

After downloading the weights, install the dependencies listed in the weights directory:

pip install -r /opt/atomgit/weight/NextStep-1.1-Pretrain-256px/requirements.txt

NPU Environment

Make sure the Huawei Ascend CANN toolkit and torch_npu are installed:

# Example for CANN 8.5
source /usr/local/Ascend/ascend-toolkit/set_env.sh
pip install torch_npu

If you are using torch==2.9.0+cpu, make sure accelerate is installed as well:

pip install torch==2.9.0+cpu accelerate
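
A quick way to confirm the environment before loading the model is to check that the required packages resolve and that an NPU is visible. This is a sketch under assumptions: both helper names are hypothetical, and it relies on torch_npu registering the torch.npu namespace when imported.

```python
import importlib.util


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def npu_available() -> bool:
    """True only if torch and torch_npu import cleanly and an NPU is visible."""
    try:
        import torch
        import torch_npu  # noqa: F401  (registers the torch.npu namespace)
    except Exception:
        # Covers a missing package as well as a CANN environment that
        # has not been sourced.
        return False
    return bool(torch.npu.is_available())


if __name__ == "__main__":
    print("missing:", missing_packages(["torch", "torch_npu", "accelerate"]))
    print("NPU available:", npu_available())
```

accelerate is included in the check because loading with device_map depends on it.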

Usage

NPU Inference

from pathlib import Path
from models.gen_pipeline import NextStepPipeline
from transformers import AutoModel, AutoTokenizer

HF_HUB = "/opt/atomgit/weight/NextStep-1.1-Pretrain-256px"  # same directory as the --local_dir used for the download

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(HF_HUB, local_files_only=True, trust_remote_code=True)
model = AutoModel.from_pretrained(
    HF_HUB, device_map="npu:0", local_files_only=True, trust_remote_code=True
)
pipeline = NextStepPipeline(
    model_name_or_path=HF_HUB, tokenizer=tokenizer, model=model, device="npu:0"
)

# Set the prompts
positive_prompt = ""
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry."
example_prompt = 'A REALISTIC PHOTOGRAPH OF A WALL WITH "TOWARD AUTOREGRESSIVE IMAGE GENERATION WITH CONTINUOUS TOKENS AT SCALE" PROMINENTLY DISPLAYED'

# Text-to-image generation
IMG_SIZE = 256
image = pipeline.generate_image(
    example_prompt,
    hw=(IMG_SIZE, IMG_SIZE),
    num_images_per_caption=1,
    positive_prompt=positive_prompt,
    negative_prompt=negative_prompt,
    cfg=7.5,
    cfg_img=1.0,
    cfg_schedule="constant",
    use_norm=False,
    num_sampling_steps=28,
    timesteps_shift=1.0,
    seed=3407,
)[0]

Path("./output").mkdir(parents=True, exist_ok=True)
image.save("./output/output.jpg")
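
The result of generate_image is indexed with [0] above, which suggests it returns a list of images. When num_images_per_caption is greater than 1, each image needs its own file name. The helper below is a hypothetical sketch that assumes every returned element has a PIL-style save(path) method, as used in the script above.

```python
from pathlib import Path


def save_images(images, out_dir="./output", stem="output", ext="jpg"):
    """Save each image as <stem>_<index>.<ext> and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, image in enumerate(images):
        path = out / f"{stem}_{index}.{ext}"
        image.save(str(path))  # assumes a PIL-style save(path) method
        paths.append(path)
    return paths
```

To use it with the pipeline above, drop the trailing [0], raise num_images_per_caption, and pass the returned list to save_images.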

Inference Output

The following image was generated on an Ascend NPU with the prompt above:

[Image: NPU inference output]

Performance Benchmark

Run benchmark.py to measure inference performance on the NPU:

python3 benchmark.py
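
The contents of benchmark.py are not reproduced in this README. As a rough illustration of how such a measurement is commonly structured, the sketch below times a callable with warm-up runs and an optional device synchronization hook. The name time_call and its parameters are hypothetical and are not taken from the script.

```python
import time


def time_call(fn, warmup=1, repeats=3, sync=None):
    """Return the mean wall-clock seconds of fn() over `repeats` timed runs."""
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time compilation and caching costs
    durations = []
    for _ in range(repeats):
        if sync is not None:
            sync()  # drain previously queued device work before starting
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the device to finish before stopping the clock
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)
```

On an NPU, sync would typically be torch.npu.synchronize, and fn a lambda that calls pipeline.generate_image with the batch size under test.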

Ascend NPU Benchmark Results

Batch Size	Total Time (s)	Time per Image (s)
1	42.63	42.63
2	43.51	21.75
4	45.06	11.26
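
The per-image column is the total time divided by the batch size. The short sketch below recomputes it from the totals in the table and adds throughput in images per second, a derived figure that does not appear in the original results.

```python
# Total seconds per batch, copied from the table above.
TOTAL_SECONDS = {1: 42.63, 2: 43.51, 4: 45.06}


def per_image_seconds(total, batch_size):
    """Average seconds spent on each image in a batch."""
    return total / batch_size


def throughput(total, batch_size):
    """Images generated per second."""
    return batch_size / total


for batch_size, total in TOTAL_SECONDS.items():
    print(
        f"batch={batch_size}  per-image={per_image_seconds(total, batch_size):.2f}s  "
        f"throughput={throughput(total, batch_size):.3f} img/s"
    )
```

Total time rises by less than 6% between batch size 1 and batch size 4 (42.63 s to 45.06 s), so the per-image cost drops almost in proportion to the batch size.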

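Accuracy Verification

Because the model is large, it could not be loaded in the CPU environment, so a pixel-level NPU vs. CPU accuracy comparison was not performed.

NPU inference correctness was verified with inference.py: the same prompt consistently produces the expected image on the Ascend NPU; see the Inference Output section above.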
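
For readers who do have a reference output to compare against, a pixel-level comparison usually reduces to a few standard metrics. The sketch below is not taken from accuracy.py; pixel_metrics is a hypothetical helper that computes the maximum absolute difference, the mean absolute difference, and PSNR over two flat sequences of pixel values.

```python
import math


def pixel_metrics(reference, candidate, max_value=255):
    """Compare two equal-length flat pixel sequences.

    Returns (max_abs_diff, mean_abs_diff, psnr_db); psnr_db is math.inf
    when the two sequences are identical.
    """
    if len(reference) != len(candidate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    diffs = [abs(a - b) for a, b in zip(reference, candidate)]
    mse = sum(d * d for d in diffs) / len(diffs)
    psnr = math.inf if mse == 0 else 10 * math.log10(max_value**2 / mse)
    return max(diffs), sum(diffs) / len(diffs), psnr
```

With PIL images, one way to obtain the flat sequences is list(image.convert("RGB").tobytes()), provided both images have the same size.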

Citation

If NextStep is helpful for your research and applications, please consider starring this repository and citing:

@article{nextstepteam2025nextstep1,
  title={NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale},
  author={NextStep Team and Chunrui Han and Guopeng Li and Jingwei Wu and Quan Sun and Yan Cai and Yuang Peng and Zheng Ge and Deyu Zhou and Haomiao Tang and Hongyu Zhou and Kenkun Liu and Ailin Huang and Bin Wang and Changxin Miao and Deshan Sun and En Yu and Fukun Yin and Gang Yu and Hao Nie and Haoran Lv and Hanpeng Hu and Jia Wang and Jian Zhou and Jianjian Sun and Kaijun Tan and Kang An and Kangheng Lin and Liang Zhao and Mei Chen and Peng Xing and Rui Wang and Shiyu Liu and Shutao Xia and Tianhao You and Wei Ji and Xianfang Zeng and Xin Han and Xuelin Zhang and Yana Wei and Yanming Xu and Yimin Jiang and Yingming Wang and Yu Zhou and Yucheng Han and Ziyang Meng and Binxing Jiao and Daxin Jiang and Xiangyu Zhang and Yibo Zhu},
  journal={arXiv preprint arXiv:2508.10711},
  year={2025}
}