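WebWorld is a family of large-scale open-web world models designed for training and evaluating web agents. Using a scalable hierarchical data pipeline, the models are trained on more than one million real-world web interaction trajectories.

Agents trained on WebWorld-synthesized trajectories improve by 9.9 percentage points on MiniWob++ and 10.9 percentage points on WebArena (Qwen3-8B). When used as the world model for inference-time lookahead search, WebWorld outperforms GPT-5.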
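
The lookahead search mentioned above is not specified further here, so the following is only a minimal sketch of the general idea under stated assumptions: a greedy one-step lookahead in which each candidate action is simulated by a world model and the resulting state is scored. The names `lookahead_select`, `toy_world_model`, and `toy_scorer` are hypothetical and exist only for illustration; in practice `predict_next_state` would wrap a call to a WebWorld model.

```python
from typing import Callable, Sequence, Tuple, Optional


def lookahead_select(
    state: str,
    candidate_actions: Sequence[str],
    predict_next_state: Callable[[str, str], str],
    score_state: Callable[[str], float],
) -> Tuple[Optional[str], float]:
    """Greedy one-step lookahead: simulate each action, keep the best-scoring one."""
    best_action, best_score = None, float("-inf")
    for action in candidate_actions:
        next_state = predict_next_state(state, action)  # world-model call
        score = score_state(next_state)                 # task-specific value estimate
        if score > best_score:
            best_action, best_score = action, score
    return best_action, best_score


# Toy stand-ins (hypothetical) so the sketch runs without a model.
def toy_world_model(state: str, action: str) -> str:
    return f"{state} -> {action}"


def toy_scorer(state: str) -> float:
    return 1.0 if "click([32])" in state else 0.0


best_action, best_score = lookahead_select(
    "home page",
    ["click([31])", "click([32])", "click([33])"],
    toy_world_model,
    toy_scorer,
)
```

Deeper searches would repeat the same simulate-then-score step over several actions per branch; the scoring function is the part most likely to need task-specific design.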
| Model | Base model | HuggingFace link | ModelScope link |
|---|---|---|---|
| WebWorld-8B | Qwen3-8B | 🤗 HuggingFace | 🤖 ModelScope |
| WebWorld-14B | Qwen3-14B | 🤗 HuggingFace | 🤖 ModelScope |
| WebWorld-32B | Qwen3-32B | 🤗 HuggingFace | 🤖 ModelScope |
Dataset (WebWorldData): HuggingFace: Qwen/WebWorldData; ModelScope: Qwen/WebWorldData
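💡 Recommendation: the 8B model suits fast simulation and data synthesis; the 14B and 32B models suit higher-fidelity simulation and are more robust over long horizons. For the best results in a specific environment, fine-tune on in-domain trajectories for the target task.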
Requirements: `transformers` (latest version recommended), `torch`, `accelerate`, and `vllm` (for efficient deployment).

Key instructions:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Qwen/WebWorld-8B"  # or WebWorld-14B, WebWorld-32B

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).eval()

system_prompt = (
    "You are a web world model. I will provide you with an initial page state "
    "and a sequence of actions. For each action, predict the resulting page state.\n"
    "Strictly maintain the original format. Output only the full page state "
    "without explanations, code, or truncation."
)

# Initial page state; the bracketed numbers are the element IDs that actions refer to.
current_state = """RootWebArea 'Global Start - Your Daily Portal', focused
\t[1] banner 'Top Header', visible
\t\t[2] link 'Set as Homepage', clickable, visible
\t\t[3] link 'Feedback', clickable, visible
\t\t[5] region 'Weather Widget', visible
\t\t\tStaticText 'New York, USA'
\t\t\t[6] image 'Sunny', visible
\t\t\tStaticText '24°C'
\t\t[8] link 'Sign In', clickable, visible
\t[10] region 'Search Area', visible
\t\t[11] image 'Global Start Logo', visible
\t\tStaticText 'Search the entire web'
\t\t[12] tablist 'Search Engine Selector', orientation='horizontal'
\t\t\t[13] tab 'Google', selected=True, clickable
\t\t\t[14] tab 'Bing', selected=False, clickable
\t\t\t[15] tab 'DuckDuckGo', selected=False, clickable
\t\t[18] combobox 'Web Search', clickable, visible, autocomplete='both', expanded=False
\t\t\t[19] textbox 'Type keywords or URL...', clickable, visible, editable, value=''
\t\t[20] button 'Search', clickable, visible
\t[30] navigation 'Category Bar', visible
\t\t[31] link 'Home', clickable, selected=True
\t\t[32] link 'News', clickable
\t\t[33] link 'Video', clickable
\t\t[34] link 'Shopping', clickable
\t\t[35] link 'Social', clickable
\t[50] main 'Site Directory', visible
\t\t[51] region 'Top Recommended', visible
\t\t\t[52] heading 'Most Popular', visible
\t\t\t[53] list 'Top Sites Grid', visible
\t\t\t\t[54] link 'Facebook', clickable
\t\t\t\t[56] link 'YouTube', clickable
\t\t\t\t[58] link 'Amazon', clickable
\t\t\t\t[60] link 'Twitter / X', clickable
\t\t\t\t[62] link 'Instagram', clickable
\t\t\t\t[64] link 'Wikipedia', clickable
\t\t\t\t[66] link 'Netflix', clickable
\t\t\t\t[68] link 'LinkedIn', clickable
\t\t[80] region 'News & Media', visible
\t\t\t[81] heading 'Latest News', visible
\t\t\t[82] link 'CNN', clickable
\t\t\t[83] link 'BBC', clickable
\t\t\t[84] link 'The Verge', clickable
\t\t[90] region 'Shopping', visible
\t\t\t[91] heading 'E-Commerce', visible
\t\t\t[92] link 'eBay', clickable
\t\t\t[93] link 'Walmart', clickable
\t\t\t[94] link 'Best Buy', clickable
\t[200] complementary 'Ads', visible
\t\t[201] image 'Ad: Travel to Japan'
\t\t[202] link 'Book Now', clickable
\t[300] contentinfo 'Footer', visible
\t\tStaticText '© 2026 Global Start Inc.'"""

# First turn: initial state plus the first action (here, clicking element [32], the 'News' link).
user_message = (
    f"Initial Page State:\n{current_state}\n\n"
    "First Action: 'click([32])'\n\n"
    "Next Page State:"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_message},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Greedy decoding keeps the predicted page state deterministic.
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=4096,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```

The first turn provides the initial state and the first action. Every subsequent turn uses a fixed continuation prompt:
```python
CONTINUE_PROMPT = (
    "Continue the trajectory. Given the previous state, "
    "predict the next page state after this action.\n\n"
    "Action: '{action}'\n\nNext Page State:"
)

# state_0 and action_0, action_1, ... are supplied by the caller;
# generate() is your own function that returns the model's reply for a message list.

# Turn 1
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"Initial Page State:\n{state_0}\n\nFirst Action: '{action_0}'\n\nNext Page State:"},
]
state_1 = generate(messages)  # your generate function

# Turn 2
messages.append({"role": "assistant", "content": state_1})
messages.append({"role": "user", "content": CONTINUE_PROMPT.format(action=action_1)})
state_2 = generate(messages)

# Turn 3, 4, ... up to 30+ turns: repeat the same pattern
messages.append({"role": "assistant", "content": state_2})
messages.append({"role": "user", "content": CONTINUE_PROMPT.format(action=action_2)})
state_3 = generate(messages)
```

WebWorld supports a unified action space, expressed as Python-style function calls:
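| Category | Action | Description |
|---|---|---|
| Element actions | `click(bid, button, modifiers)` | Click an element by its DOM element ID |
| | `fill(bid, text, press_enter)` | Type text into an input field |
| | `select_option(bid, options)` | Select an option from a dropdown or combobox |
| | `hover(bid)` | Hover over an element |
| Mouse actions | `mouse_move(x, y)` | Move the cursor to the given coordinates |
| | `mouse_click(x, y, button)` | Click at the given coordinates |
| | `mouse_down(x, y)` / `mouse_up(x, y)` | Press or release the mouse button (for drag and drop) |
| Keyboard actions | `keyboard_press(key)` | Press a single key (for example, Enter or Tab) |
| | `keyboard_type(text)` | Type a string character by character |
| Browser actions | `scroll(dx, dy)` | Scroll the viewport |
| | `goto(url)` | Navigate to the given URL |
| | `go_back()` / `go_forward()` | Navigate the browser history |
| | `tab_new()` / `tab_close()` / `tab_focus(index)` | Manage browser tabs |
| Meta actions | `send_msg_to_user(text)` | Send a message to the user |
| | `noop(wait_ms)` | Wait for the given duration |
| | `infeasible(reason)` | Declare that the task cannot be completed |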
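
Because actions are Python-style calls and every turn after the first uses the same continuation prompt, a trajectory can be driven by a small loop. The sketch below is one possible way to do that, not part of WebWorld itself: `parse_action`, `rollout`, and `fake_generate` are hypothetical helpers, and `generate` stands for whatever function returns the model's reply for a list of chat messages. Action strings are checked with the standard-library `ast` module before being sent.

```python
import ast

# Action names copied from the table above.
KNOWN_ACTIONS = {
    "click", "fill", "select_option", "hover",
    "mouse_move", "mouse_click", "mouse_down", "mouse_up",
    "keyboard_press", "keyboard_type",
    "scroll", "goto", "go_back", "go_forward",
    "tab_new", "tab_close", "tab_focus",
    "send_msg_to_user", "noop", "infeasible",
}

CONTINUE_PROMPT = (
    "Continue the trajectory. Given the previous state, "
    "predict the next page state after this action.\n\n"
    "Action: '{action}'\n\nNext Page State:"
)


def parse_action(action):
    """Parse an action string into (name, args, kwargs); raise ValueError if invalid."""
    node = ast.parse(action.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a function call: {action!r}")
    name = node.func.id
    if name not in KNOWN_ACTIONS:
        raise ValueError(f"unknown action: {name!r}")
    args = [ast.literal_eval(arg) for arg in node.args]  # literals only
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, args, kwargs


def rollout(system_prompt, initial_state, actions, generate):
    """Simulate a trajectory; return the predicted states and the full message list."""
    messages = [{"role": "system", "content": system_prompt}]
    states = []
    for i, action in enumerate(actions):
        parse_action(action)  # fail early on malformed actions
        if i == 0:
            prompt = (
                f"Initial Page State:\n{initial_state}\n\n"
                f"First Action: '{action}'\n\nNext Page State:"
            )
        else:
            prompt = CONTINUE_PROMPT.format(action=action)
        messages.append({"role": "user", "content": prompt})
        state = generate(messages)
        messages.append({"role": "assistant", "content": state})
        states.append(state)
    return states, messages


# Stand-in for a real model call (hypothetical), so the sketch runs offline.
def fake_generate(messages):
    return f"state after {len(messages) // 2} action(s)"


states, messages = rollout("system", "page", ["click([32])", "go_back()"], fake_generate)
```

Keeping the full message history in every call matches the multi-turn pattern shown earlier; for long trajectories the context length of the chosen model becomes the practical limit.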
WebWorld-Bench evaluates models across nine dimensions using two metrics: the Factuality Score (functional correctness) and the Web Turing Score (perceptual realism):

| Model | Avg. Factuality Score | Avg. Web Turing Score |
|---|---|---|
| GPT-4o | 59.5 | 35.4 |
| Claude-Opus-4.1 | 71.3 | 47.4 |
| Gemini-3-Pro | 70.3 | 43.2 |
| Qwen3-8B (base) | 26.9 | 17.4 |
| WebWorld-8B | 70.1 | 42.2 |
| WebWorld-14B | 70.7 | 44.7 |
| WebWorld-32B | 71.0 | 45.6 |
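
Success rates of agents trained on WebWorld-synthesized trajectories (gains in percentage points over the base model):

| Model | MiniWob++ success rate | WebArena success rate |
|---|---|---|
| GPT-4o | 64.3% | 26.6% |
| Qwen3-8B (base) | 49.4% | 9.8% |
| Qwen3-8B + WebWorld | 59.3% (+9.9) | 20.7% (+10.9) |
| Qwen3-14B (base) | 54.9% | 15.1% |
| Qwen3-14B + WebWorld | 63.2% (+8.3) | 24.3% (+9.2) |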
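
The gains in the table above are absolute differences in success rate (percentage points), not relative improvements. The short sketch below recomputes them from the table values and also derives the relative gain for comparison; the `gains` helper and the `results` dictionary are illustrative names, and the numbers are copied from the table.

```python
# (base, with WebWorld) success rates in percent, copied from the table above.
results = {
    "Qwen3-8B": {"MiniWob++": (49.4, 59.3), "WebArena": (9.8, 20.7)},
    "Qwen3-14B": {"MiniWob++": (54.9, 63.2), "WebArena": (15.1, 24.3)},
}


def gains(base, trained):
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = round(trained - base, 1)
    relative = round(100 * (trained - base) / base, 1)
    return absolute, relative


for model, benchmarks in results.items():
    for benchmark, (base, trained) in benchmarks.items():
        absolute, relative = gains(base, trained)
        print(f"{model} on {benchmark}: +{absolute} points ({relative}% relative)")
```

The relative figures are much larger on WebArena because the base success rates there are low, which is why the distinction matters when quoting these results.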

Scores by environment for Qwen3-8B and WebWorld-8B:

| Environment | Qwen3-8B | WebWorld-8B | Gain |
|---|---|---|---|
| API services | 0.088 | 0.299 | +0.211 |
| Code | 0.147 | 0.396 | +0.249 |
| Games | 0.253 | 0.473 | +0.220 |
| GUI desktop | 0.322 | 0.705 | +0.383 |

```bibtex
@misc{xiao2026webworldlargescaleworldmodel,
  title={WebWorld: A Large-Scale World Model for Web Agent Training},
  author={Zikai Xiao and Jianhong Tu and Chuhang Zou and Yuxin Zuo and Zhi Li and Peng Wang and Bowen Yu and Fei Huang and Junyang Lin and Zuozhu Liu},
  year={2026},
  eprint={2602.14721},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.14721},
}
```