tencent_hunyuan/POINTS-Seeker


🌟 Model Overview

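POINTS-Seeker-8B is an advanced multimodal agentic search model, built from scratch to overcome the limits of static parametric knowledge in large multimodal models (LMMs). Rather than simply attaching search tools to an existing LMM, POINTS-Seeker is trained natively with Agentic Seeding, a dedicated training phase that instills the foundational prerequisites for agentic behavior. It is also equipped with V-Fold, an adaptive history-aware compression mechanism that addresses the performance bottleneck in long-horizon interactions. POINTS-Seeker-8B delivers strong performance on long-horizon, knowledge-intensive visual reasoning tasks.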

Quick Start

Run with Transformers

First install WePOINTS with the following commands:

git clone https://github.com/WePOINTS/WePOINTS.git
cd ./WePOINTS
pip install -e .

Then load the model and run inference:

from transformers import AutoModelForCausalLM, AutoTokenizer, Qwen2VLImageProcessor
import torch

user_prompt = 'explain the image'  # replace with your instruction
image_path = 'your image path'
model_path = 'tencent/POINTS-Seeker'

# trust_remote_code=True is required because the model ships its own
# modeling code, including the chat() method used below.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map='cuda',
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
image_processor = Qwen2VLImageProcessor.from_pretrained(model_path)

# A single user turn containing one image followed by the text instruction.
content = [
    dict(type='image', image=image_path),
    dict(type='text', text=user_prompt),
]
messages = [
    {
        'role': 'user',
        'content': content,
    }
]

# Greedy decoding, capped at 2048 newly generated tokens.
generation_config = {
    'max_new_tokens': 2048,
    'do_sample': False,
}

response = model.chat(
    messages,
    tokenizer,
    image_processor,
    generation_config,
)
print(response)
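
The message structure passed to `model.chat` above can be wrapped in a small helper so that prompts and images are assembled consistently. This is a minimal sketch only: `build_messages` is a hypothetical helper name and is not part of WePOINTS or Transformers, and whether `model.chat` accepts several image entries in one user turn is an assumption that should be checked against the WePOINTS repository.

```python
from typing import Dict, List, Sequence, Union


def build_messages(image_paths: Union[str, Sequence[str]], prompt: str) -> List[Dict]:
    """Build a single-turn user message in the same dict format as the example above.

    Hypothetical helper for illustration; not part of WePOINTS.
    """
    # Accept either one path or a sequence of paths.
    if isinstance(image_paths, str):
        image_paths = [image_paths]
    # Image entries come first, followed by the text instruction.
    content = [dict(type='image', image=path) for path in image_paths]
    content.append(dict(type='text', text=prompt))
    return [{'role': 'user', 'content': content}]


messages = build_messages('your image path', 'explain the image')
```

The resulting `messages` list can be passed to `model.chat` in place of the hand-written one.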

Multimodal Agentic Search

Please refer to our GitHub repository.