JoyAI-Image-Edit
Awakening Spatial Intelligence in Unified Multimodal Understanding and Generation

🐶 JoyAI-Image-Edit

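JoyAI-Image-Edit is a multimodal foundation model specialized in instruction-guided image editing. Its strong spatial understanding, which covers scene parsing, relational grounding, and instruction decomposition, enables precise and controllable edits that apply complex modifications accurately to the specified regions.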

🚀 Quick Start

1. Environment Setup

Requirements: Python >= 3.10 and a CUDA-capable GPU.

Create a virtual environment and install:

git clone https://github.com/jd-opensource/JoyAI-Image
cd JoyAI-Image
conda create -n joyai python=3.10 -y
conda activate joyai

pip install -e .

Note on Flash Attention: flash-attn >= 2.8.0 is listed as a dependency for best performance.

Core Dependencies

Package	Version	Purpose
`torch`	>= 2.8	PyTorch
`transformers`	>= 4.57.0, < 4.58.0	Text encoder
`diffusers`	>= 0.34.0	Pipeline utilities
`flash-attn`	>= 2.8.0	Fast attention kernels
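
Before running inference, it can help to confirm that the installed packages fall inside the version ranges listed above. The following is a minimal sketch, not part of the repository: the helper names (`parse_version`, `satisfies`, `check_environment`) are hypothetical, and the version parser is deliberately simple (it ignores pre-release tags and local suffixes such as `+cu128`).

```python
from importlib import metadata

# Version bounds copied from the dependency table:
# (package, minimum version, exclusive maximum version or None).
REQUIREMENTS = [
    ("torch", "2.8", None),
    ("transformers", "4.57.0", "4.58.0"),
    ("diffusers", "0.34.0", None),
    ("flash-attn", "2.8.0", None),
]


def parse_version(text):
    """Turn '2.8.0+cu128' or '4.57' into a comparable 3-tuple of ints."""
    core = text.split("+")[0]
    parts = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # stop at the first non-numeric component (e.g. 'dev1')
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '2.8' compares equal to '2.8.0'
    return tuple(parts[:3])


def satisfies(installed, minimum, maximum=None):
    """True if minimum <= installed < maximum (maximum is optional)."""
    version = parse_version(installed)
    if version < parse_version(minimum):
        return False
    if maximum is not None and version >= parse_version(maximum):
        return False
    return True


def check_environment(installed_versions=None):
    """Return {package: (installed version or None, ok flag)}.

    Pass a dict to check hypothetical versions; by default the versions
    are read from the current Python environment.
    """
    report = {}
    for name, minimum, maximum in REQUIREMENTS:
        if installed_versions is not None:
            installed = installed_versions.get(name)
        else:
            try:
                installed = metadata.version(name)
            except metadata.PackageNotFoundError:
                installed = None
        ok = installed is not None and satisfies(installed, minimum, maximum)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "OK" if ok else "check version"
        print(f"{name}: {installed or 'not installed'} -> {status}")
```

Running the script prints one line per package; anything marked "check version" is either missing or outside the range in the table.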

2. Inference

Image Editing

python inference.py \
  --ckpt-root /path/to/ckpts_infer \
  --prompt "Turn the plate blue" \
  --image test_images/test_1.jpg \
  --output outputs/result.png \
  --seed 123 \
  --steps 50 \
  --guidance-scale 4.0 \
  --basesize 1024
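
To apply edits to several images, the same command can be assembled and launched from Python. This is only a sketch under stated assumptions: `build_edit_command` and `run_batch` are hypothetical helpers, the `<stem>_edited.png` output naming is invented for illustration, and only flags shown in the command above are used. Each call starts a fresh process, so the checkpoint is presumably reloaded for every image.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_edit_command(ckpt_root, prompt, image, output, seed=123, steps=50,
                       guidance_scale=4.0, basesize=1024):
    """Assemble the inference.py invocation as an argument list."""
    return [
        sys.executable, "inference.py",
        "--ckpt-root", str(ckpt_root),
        "--prompt", prompt,
        "--image", str(image),
        "--output", str(output),
        "--seed", str(seed),
        "--steps", str(steps),
        "--guidance-scale", str(guidance_scale),
        "--basesize", str(basesize),
    ]


def run_batch(ckpt_root, jobs, output_dir="outputs", dry_run=True):
    """Run (or just print) one edit per (image, prompt) pair.

    The output file name '<stem>_edited.png' is an assumption made
    for this sketch, not a convention of the repository.
    """
    output_dir = Path(output_dir)
    commands = []
    for image, prompt in jobs:
        output = output_dir / f"{Path(image).stem}_edited.png"
        cmd = build_edit_command(ckpt_root, prompt, image, output)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))  # show the command without running it
        else:
            output_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises if inference.py fails
    return commands


if __name__ == "__main__":
    run_batch(
        "/path/to/ckpts_infer",
        [("test_images/test_1.jpg", "Turn the plate blue")],
        dry_run=True,
    )
```

With `dry_run=True` the sketch only prints the commands, which makes it easy to review them before launching GPU jobs.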

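CLI Reference (`inference.py`)

Argument	Type	Default	Description
`--ckpt-root`	str	required	Checkpoint root directory
`--prompt`	str	required	Edit instruction or text-to-image prompt
`--image`	str	None	Input image path (required for editing; omit for text-to-image)
`--output`	str	`example.png`	Output image path
`--steps`	int	50	Number of denoising steps
`--guidance-scale`	float	4.0	Classifier-free guidance scale
`--seed`	int	42	Random seed for reproducibility
`--neg-prompt`	str	`""`	Negative prompt
`--basesize`	int	1024	Base size used to resize the input image (256/512/768/1024)
`--config`	str	auto	Config file path; defaults to `<ckpt-root>/infer_config.py`
`--rewrite-prompt`	flag	off	Enable LLM-based prompt rewriting
`--hsdp-shard-dim`	int	1	FSDP shard dimension for multi-GPU runs (set to the number of GPUs)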
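
The table can be read as an `argparse` specification. The sketch below mirrors it to make the defaults and the `--config` fallback concrete; it is not the actual `inference.py` source. Restricting `--basesize` to the four listed values and the `mode` attribute are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Parser mirroring the CLI table (illustrative, not the real script)."""
    parser = argparse.ArgumentParser(description="Sketch of the inference.py options")
    parser.add_argument("--ckpt-root", required=True, help="Checkpoint root directory")
    parser.add_argument("--prompt", required=True, help="Edit instruction or text-to-image prompt")
    parser.add_argument("--image", default=None, help="Input image path; omit for text-to-image")
    parser.add_argument("--output", default="example.png", help="Output image path")
    parser.add_argument("--steps", type=int, default=50, help="Number of denoising steps")
    parser.add_argument("--guidance-scale", type=float, default=4.0,
                        help="Classifier-free guidance scale")
    parser.add_argument("--seed", type=int, default=42, help="Random seed")
    parser.add_argument("--neg-prompt", default="", help="Negative prompt")
    # Assumption: the table lists 256/512/768/1024, so other values are rejected here.
    parser.add_argument("--basesize", type=int, default=1024,
                        choices=[256, 512, 768, 1024], help="Base resize size")
    parser.add_argument("--config", default=None, help="Config file path")
    parser.add_argument("--rewrite-prompt", action="store_true",
                        help="Enable LLM-based prompt rewriting")
    parser.add_argument("--hsdp-shard-dim", type=int, default=1,
                        help="FSDP shard dimension (number of GPUs)")
    return parser


def resolve(args):
    """Fill in the 'auto' config default and record the implied mode."""
    if args.config is None:
        args.config = f"{args.ckpt_root}/infer_config.py"
    # 'mode' is a hypothetical attribute: editing if an image is given.
    args.mode = "edit" if args.image else "text-to-image"
    return args


if __name__ == "__main__":
    example = ["--ckpt-root", "/path/to/ckpts_infer",
               "--prompt", "Turn the plate blue",
               "--image", "test_images/test_1.jpg"]
    print(vars(resolve(build_parser().parse_args(example))))
```

Note that argparse exposes `--ckpt-root` as `args.ckpt_root`, `--guidance-scale` as `args.guidance_scale`, and so on.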

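Spatial Editing Reference

JoyAI-Image supports three spatial editing prompt patterns: Object Move, Object Rotation, and Camera Control. For the most stable results, we recommend following the prompt templates below as closely as possible.

1. Object Move

Use this pattern when you want to move a target object into a specified region.

Prompt template: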

Move the <object> into the red box and finally remove the red box.

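Rules:

- Replace <object> with a clear description of the target object to move.
- The red box marks the target location in the image.
- The phrase "finally remove the red box" means the guide box should not appear in the final edited result.

Example: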

Move the apple into the red box and finally remove the red box.
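
When prompts are generated programmatically, filling the template from a single function keeps the wording identical to the recommended form. A minimal sketch follows; `build_move_prompt` is a hypothetical helper, and drawing the red box on the input image is assumed to happen separately.

```python
# Template copied verbatim from the Object Move section.
MOVE_TEMPLATE = "Move the {object} into the red box and finally remove the red box."


def build_move_prompt(obj):
    """Fill the Object Move template with a description of the target object."""
    description = " ".join(obj.split())  # collapse stray whitespace
    if not description:
        raise ValueError("the object description must not be empty")
    return MOVE_TEMPLATE.format(object=description)


if __name__ == "__main__":
    print(build_move_prompt("apple"))
```

The resulting string can be passed as the `--prompt` value together with an input image that already contains the red guide box.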

2. Object Rotation

Use this pattern when you want to rotate an object to a specific canonical view.

Prompt template:

Rotate the <object> to show the <view> side view.

Supported <view> values:

front
right
left
rear
front right
front left
rear right
rear left

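Rules:

- Replace <object> with a clear description of the object to rotate.
- Replace <view> with one of the supported directions listed above.
- This instruction is intended to change the object's orientation while keeping the object itself and the surrounding scene as consistent as possible.

Examples: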

Rotate the chair to show the front side view.
Rotate the car to show the rear left side view.
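
Because only eight view names are supported, it can be useful to validate the view before building the prompt. The helper below is a hypothetical sketch, not part of the repository; lowercasing the view and collapsing whitespace are conveniences assumed here.

```python
# The eight view names listed in the Object Rotation section.
SUPPORTED_VIEWS = (
    "front", "right", "left", "rear",
    "front right", "front left", "rear right", "rear left",
)


def build_rotation_prompt(obj, view):
    """Fill the Object Rotation template after validating the view name."""
    normalized = " ".join(view.lower().split())
    if normalized not in SUPPORTED_VIEWS:
        raise ValueError(
            f"unsupported view {view!r}; expected one of {', '.join(SUPPORTED_VIEWS)}"
        )
    description = " ".join(obj.split())
    if not description:
        raise ValueError("the object description must not be empty")
    return f"Rotate the {description} to show the {normalized} side view."


if __name__ == "__main__":
    print(build_rotation_prompt("chair", "front"))
    print(build_rotation_prompt("car", "rear left"))
```

Rejecting unknown views early avoids sending the model a direction outside the supported list.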

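3. Camera Control

Use this pattern when you want to change only the camera viewpoint while keeping the 3D scene itself unchanged.

Prompt template: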

Move the camera.
- Camera rotation: Yaw {y_rotation}°, Pitch {p_rotation}°.
- Camera zoom: in/out/unchanged.
- Keep the 3D scene static; only change the viewpoint.

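Rules:

- {y_rotation} specifies the yaw rotation angle in degrees.
- {p_rotation} specifies the pitch rotation angle in degrees.
- "Camera zoom" must be one of:
  - in
  - out
  - unchanged
- The last line is important: it explicitly tells the model to preserve the 3D scene content and geometry and to adjust only the camera viewpoint.

Examples: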

Move the camera.
- Camera rotation: Yaw 45°, Pitch 0°.
- Camera zoom: in.
- Keep the 3D scene static; only change the viewpoint.

Move the camera.
- Camera rotation: Yaw -90°, Pitch 20°.
- Camera zoom: unchanged.
- Keep the 3D scene static; only change the viewpoint.
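
The camera prompt has more moving parts than the other two patterns, so a small builder helps keep the four lines intact. This is a sketch under assumptions: `build_camera_prompt` is a hypothetical helper, and the lines are assumed to be joined with newlines exactly as in the examples above.

```python
# Zoom values allowed by the Camera Control rules.
ZOOM_OPTIONS = ("in", "out", "unchanged")


def build_camera_prompt(yaw, pitch, zoom="unchanged"):
    """Build the four-line Camera Control prompt.

    yaw and pitch are angles in degrees; zoom must be one of ZOOM_OPTIONS.
    """
    if zoom not in ZOOM_OPTIONS:
        raise ValueError(f"zoom must be one of {ZOOM_OPTIONS}, got {zoom!r}")
    return "\n".join([
        "Move the camera.",
        # ':g' prints 45 as '45' and 22.5 as '22.5' (no trailing zeros).
        f"- Camera rotation: Yaw {yaw:g}°, Pitch {pitch:g}°.",
        f"- Camera zoom: {zoom}.",
        # The closing line is always included, as the rules require.
        "- Keep the 3D scene static; only change the viewpoint.",
    ])


if __name__ == "__main__":
    print(build_camera_prompt(45, 0, "in"))
    print()
    print(build_camera_prompt(-90, 20))
```

When the prompt is passed on a shell command line, the multi-line string needs to be quoted so that the newlines survive.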

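License

JoyAI-Image is licensed under the Apache 2.0 License.

☎️ We're Hiring!

We are actively hiring research scientists, engineers, and interns to work with us on building next-generation generative foundation models and bringing them into real-world applications. If you are interested, please send your resume to: huanghaoyang.ocean@jd.com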
