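ChartVerse-Coder is a complexity-aware chart code generator that autonomously synthesizes diverse, high-complexity chart code from scratch. It was developed as part of the opendatalab/ChartVerse project. For more details on our method, datasets, and the full model family, please visit our project page.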
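Unlike prior template-based or seed-conditioned approaches, ChartVerse-Coder generates chart code through high-temperature sampling. This lets it explore the long tail of the chart distribution broadly and produce diverse, realistic charts with high structural complexity.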
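"High-temperature sampling" here means ordinary stochastic decoding with a flattened next-token distribution; the usage example in this card sets `temperature=1.0`, `top_p=0.95`, `top_k=20`. The sketch below is a generic, pure-Python illustration of how those three knobs reshape the distribution for one decoding step. It is not ChartVerse-Coder's code, it only approximates the temperature, then top-k, then top-p order used by Hugging Face `generate`, and `sample_next_token` is a hypothetical helper.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=20, top_p=0.95, rng=random):
    """Pick one token id from raw logits (hypothetical helper, assumes temperature > 0)."""
    # 1. Temperature: dividing by T > 1 flattens the distribution, T < 1 sharpens it.
    scaled = [x / temperature for x in logits]

    # 2. Top-k: keep only the k highest-scoring token ids.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the surviving ids (subtract the max for numerical stability).
    m = scaled[order[0]]
    exps = [math.exp(scaled[i] - m) for i in order]
    z = sum(exps)
    probs = [e / z for e in exps]

    # 3. Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for i, p in zip(order, probs):
        kept.append((i, p))
        cum += p
        if cum >= top_p:
            break

    # Sample from the renormalized nucleus.
    total = sum(p for _, p in kept)
    r = rng.random() * total
    acc = 0.0
    for i, p in kept:
        acc += p
        if r < acc:
            return i
    return kept[-1][0]
```

With a very peaked distribution the nucleus collapses to a single token, whereas flatter (higher-temperature) distributions keep many candidates in play, which is what allows sampling to reach rarer chart types.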
We propose Rollout Posterior Entropy (RPE), which quantifies a chart's intrinsic complexity through generation stability.
Key insight: simple charts yield consistent reconstructions (low RPE), while complex charts lead to divergent ones (high RPE). We retain only samples with RPE ≥ 0.4.
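This card does not give the exact RPE formula, so the following is only a minimal stand-in that captures the stated intuition: assume you already have k reconstructions of the same chart (for example, code strings regenerated from its rendered image), group identical ones, and measure how spread out they are. The proxy uses exact matching after whitespace normalization and normalized Shannon entropy; `rollout_entropy` and `keep_sample` are hypothetical names, and the paper's actual definition may differ substantially.

```python
import math
from collections import Counter


def rollout_entropy(rollouts: list[str]) -> float:
    """Hypothetical RPE proxy: normalized entropy over distinct reconstructions.

    Reconstructions that are identical after collapsing whitespace count as the
    same outcome. Returns 0.0 when all agree and 1.0 when all differ.
    """
    k = len(rollouts)
    if k <= 1:
        return 0.0
    counts = Counter(" ".join(r.split()) for r in rollouts)
    h = -sum((c / k) * math.log(c / k) for c in counts.values())
    return h / math.log(k)  # divide by the maximum possible entropy, log k


def keep_sample(rollouts: list[str], threshold: float = 0.4) -> bool:
    """Keep a chart only if its reconstructions diverge enough (proxy for high RPE)."""
    return rollout_entropy(rollouts) >= threshold


print(rollout_entropy(["a", "a", "b", "c"]))  # counts 2/1/1 -> 0.75
```

Dividing by log k keeps the score in [0, 1] regardless of how many rollouts are drawn, so a fixed cutoff such as 0.4 stays comparable across rollout counts.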
Stage 1: Difficulty-filtered cold start
Stage 2: Iterative self-enhancement
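The card only names the two training stages. As a rough illustration of how they could fit together, the sketch below filters seed samples by difficulty, fine-tunes, then repeatedly generates, filters, and retrains. Everything here is an assumption: `train_coder`, `generate`, `score`, and `finetune` are hypothetical placeholders (a sampler, a difficulty scorer such as RPE, and a fine-tuning step), the 0.4 cutoff is borrowed from the RPE filter described above, and retraining on the whole accumulated pool each round is a guess rather than the documented procedure.

```python
from typing import Callable, List


def train_coder(
    seed_samples: List[str],
    generate: Callable[[int], List[str]],   # hypothetical: draws n samples from the current model
    score: Callable[[str], float],          # hypothetical: difficulty score, e.g. an RPE estimate
    finetune: Callable[[List[str]], None],  # hypothetical: updates the model on the given data
    rounds: int = 2,
    samples_per_round: int = 1000,
    threshold: float = 0.4,
) -> List[str]:
    # Stage 1 (cold start): keep only sufficiently hard seed samples, then fine-tune.
    pool = [s for s in seed_samples if score(s) >= threshold]
    finetune(pool)

    # Stage 2 (self-enhancement): sample from the improved model, keep the hard
    # samples, and retrain on the growing pool (no deduplication in this sketch).
    for _ in range(rounds):
        candidates = generate(samples_per_round)
        pool.extend(c for c in candidates if score(c) >= threshold)
        finetune(pool)
    return pool
```

Passing the model-specific steps in as callables keeps the loop structure visible and testable without committing to any particular training framework.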
Final output: 1M high-complexity chart code samples, used for downstream QA synthesis.
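Charts synthesized by ChartVerse-Coder substantially surpass all existing datasets in both complexity and diversity.
Our synthesized charts exhibit exceptional diversity.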
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model_path = "opendatalab/ChartVerse-Coder"
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Generation prompt (sent as the user message)
prompt = """You are a Python visualization expert. Generate a random Python visualization code focusing on charts, tables, or diagrams.
Requirements:
- Choose any visualization type (chart, table, flowchart, diagram, etc.)
- Create sample data
- Use Python visualization library (matplotlib, graphviz, etc.)
- Make it visually appealing with proper labels, titles, and colors
- Include sufficient visual elements
- Carefully design the layout to avoid any overlapping text or elements
- Adjust figure size, margins, and spacing for optimal clarity
- Make it visually appealing with proper labels, titles, and colors
Output format: Only output the Python visualization code wrapped in ```python```
"""

# Generate chart code
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Send inputs to the device the model was placed on by device_map="auto"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# High-temperature sampling for diversity
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,
    temperature=1.0,
    top_p=0.95,
    top_k=20,
    do_sample=True
)
# Decode only the newly generated tokens, not the echoed prompt
generated_code = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(generated_code)
```

```python
import re

import matplotlib
matplotlib.use("Agg")  # headless backend: render to files without opening a window
import matplotlib.pyplot as plt

# Extract code from the response
code_match = re.search(r'```python\n(.*?)```', generated_code, re.DOTALL)
if code_match:
    code = code_match.group(1)
    # Model-generated code is untrusted: execute it only in a sandboxed environment.
    exec(code)  # This will save the figure as 'image.png'
```

```bibtex
@misc{liu2026chartversescalingchartreasoning,
      title={ChartVerse: Scaling Chart Reasoning via Reliable Programmatic Synthesis from Scratch},
      author={Zheng Liu and Honglin Lin and Chonghan Qin and Xiaoyang Wang and Xin Gao and Yu Li and Mengzhang Cai and Yun Zhu and Zhanping Zhong and Qizhi Pei and Zhuoshi Pan and Xiaoran Shang and Bin Cui and Conghui He and Wentao Zhang and Lijun Wu},
      year={2026},
      eprint={2601.13606},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.13606},
}
```

This model is released under the Apache 2.0 license.