HuggingFace镜像/Eagle2-2B
模型介绍文件和版本分析
下载使用量0

Eagle-2

[

📂GitHub📂 GitHub📂GitHub

](https://github.com/NVlabs/EAGLE) [

📜Eagle2技术报告📜 Eagle2 技术报告📜Eagle2技术报告

](http://arxiv.org/abs/2501.14818) [

🤗HF演示🤗 HF 演示🤗HF演示

](https://huggingface.co/spaces/nvidia/Eagle2-Demo)

最新动态:

  • 我们已将模型架构更新为 eagle_2_5_vl,以支持 generate 功能。

简介

我们非常高兴发布最新的 Eagle2 系列视觉语言模型。开源视觉语言模型(VLM)在缩小与专有模型差距方面取得了显著进展。然而,关于数据策略和实现的关键细节往往缺失,限制了可复现性和创新。在本项目中,我们从数据中心的角度专注于 VLM 后训练,分享从零开始构建有效数据策略的见解。通过将这些策略与稳健的训练方法和模型设计相结合,我们推出了 Eagle2 这一系列高性能 VLM。我们的工作旨在赋能开源社区,通过透明流程开发具有竞争力的 VLM。

在本仓库中,我们开源了 Eagle2-2B,这是一款轻量级模型,在保持稳定性能的同时实现了卓越的效率和速度。

模型库

我们提供以下模型:

模型名称语言模型视觉模型最大长度HF 链接
Eagle2-1BQwen2.5-0.5B-InstructSiglip16K🤗 链接
Eagle2-2BQwen2.5-1.5B-InstructSiglip16K🤗 链接
Eagle2-9BQwen2.5-7B-InstructSiglip+ConvNext16K🤗 链接

基准测试结果

基准测试InternVL2-2BInternVL2.5-2BInternVL2-4BQwen2-VL-2BEagle2-2B
DocVQAtest86.988.789.290.188.0
ChartQAtest76.279.281.573.082.0
InfoVQAtest58.960.967.065.565.8
TextVQAval73.474.374.479.779.1
OCRBench784804788809818
MMEsum1876.82138.22059.81872.02109.8
RealWorldQA57.360.160.762.663.1
AI2Dtest74.174.974.778.979.3
MMMUval36.343.647.941.143.1
MMVetGPT-4-Turbo39.560.851.049.553.8
HallBenchavg37.942.641.941.745.8
MathVistatestmini46.351.358.643.054.7
MMstar50.153.754.348.056.4

快速开始

我们提供了一个推理脚本,帮助您快速开始使用模型。我们支持多种输入类型:

  • 纯文本输入
  • 单张图像输入
  • 多张图像输入
  • 视频输入

安装依赖项

pip install transformers
pip install flash-attn

单张图像

from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch
model = AutoModel.from_pretrained("nvidia/Eagle2-1B",trust_remote_code=True, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = "left"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://www.ilankelman.org/stopsigns/australia.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

text_list = [processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)]
image_inputs, video_inputs = processor.process_vision_info(messages)
inputs = processor(text = text_list, images=image_inputs, videos=video_inputs, return_tensors="pt", padding=True)
inputs = inputs.to("cuda")
model = model.to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=1024)
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

流式生成

from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel, AutoTokenizer
import torch

from transformers import TextIteratorStreamer
import threading


model = AutoModel.from_pretrained("nvidia/Eagle2-1B",trust_remote_code=True, attn_implementation='flash_attention_2', torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = "left"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://www.ilankelman.org/stopsigns/australia.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

text_list = [processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)]
image_inputs, video_inputs = processor.process_vision_info(messages)
inputs = processor(text = text_list, images=image_inputs, videos=video_inputs, return_tensors="pt", padding=True)
inputs = inputs.to("cuda")
model = model.to("cuda")

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

generation_kwargs = dict(
    **inputs,
    streamer=streamer,
    max_new_tokens=1024,
    do_sample=True,
    top_p=0.95,
    temperature=0.8
)
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()


for new_text in streamer:
    print(new_text, end="", flush=True)

多图像

from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch
model = AutoModel.from_pretrained("nvidia/Eagle2-1B",trust_remote_code=True, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = "left"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://www.ilankelman.org/stopsigns/australia.jpg",
            },
            {
                "type": "image",
                "image": "https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png",
            },
            {"type": "text", "text": "Describe these two images."},
        ],
    }
]

text_list = [processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)]
image_inputs, video_inputs = processor.process_vision_info(messages)
inputs = processor(text = text_list, images=image_inputs, videos=video_inputs, return_tensors="pt", padding=True)
inputs = inputs.to("cuda")
model = model.to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=1024)
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

单个视频


from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch
model = AutoModel.from_pretrained("nvidia/Eagle2-1B",trust_remote_code=True, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = "left"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "../Eagle2-8B/space_woaudio.mp4",
            },
            {"type": "text", "text": "Describe this video."},
        ],
    }
]

text_list = [processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)]
image_inputs, video_inputs, video_kwargs = processor.process_vision_info(messages, return_video_kwargs=True)

inputs = processor(text = text_list, images=image_inputs, videos=video_inputs, return_tensors="pt", padding=True, videos_kwargs=video_kwargs)
inputs = inputs.to("cuda")
model = model.to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=1024)
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

多个视频

from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch
model = AutoModel.from_pretrained("nvidia/Eagle2-1B",trust_remote_code=True, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = "left"

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video",
                "video": "../Eagle2-8B/space_woaudio.mp4",
                "nframes": 10,
            },
            {
                "type": "video",
                "video": "../Eagle2-8B/video_ocr.mp4",
                "nframes": 10,
            },
            {"type": "text", "text": "Describe these two videos respectively."},
        ],
    }
]

text_list = [processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)]
image_inputs, video_inputs, video_kwargs = processor.process_vision_info(messages, return_video_kwargs=True)
inputs = processor(text = text_list, images=image_inputs, videos=video_inputs, return_tensors="pt", padding=True, videos_kwargs=video_kwargs)
inputs = inputs.to("cuda")
model = model.to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=1024)
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

批量推理

from PIL import Image
import requests
from transformers import AutoProcessor, AutoModel
import torch
model = AutoModel.from_pretrained("nvidia/Eagle2-1B",trust_remote_code=True, torch_dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("nvidia/Eagle2-1B", trust_remote_code=True, use_fast=True)
processor.tokenizer.padding_side = "left"

messages1 = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://www.ilankelman.org/stopsigns/australia.jpg",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

messages2 = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://www.nvidia.com/content/dam/en-zz/Solutions/about-nvidia/logo-and-brand/01-nvidia-logo-vert-500x200-2c50-d@2x.png",
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

text_list = [processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
) for messages in [messages1, messages2]]
image_inputs, video_inputs = processor.process_vision_info([messages1, messages2])
inputs = processor(text = text_list, images=image_inputs, videos=video_inputs, return_tensors="pt", padding=True)
inputs = inputs.to("cuda")
model = model.to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=1024)
output_text = processor.batch_decode(
    generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)

待办事项

  • 支持 vLLM 推理
  • 提供 AWQ 量化权重
  • 提供微调脚本

许可协议/使用条款

  • 代码根据 Apache 2.0 许可协议发布,详见 LICENSE 文件。
  • 预训练模型权重根据 知识共享署名-非商业性使用 4.0 国际许可协议 发布
  • 本服务为研究预览版,仅供非商业用途,受以下许可协议和条款约束:
    • Qwen2.5-1.5B-Instruct 模型许可协议:Apache-2.0
    • PaliGemma 模型许可协议:Gemma 许可协议

引用

伦理考量

NVIDIA 认为可信 AI 是一项共同责任,我们已制定相关政策和实践,以支持广泛 AI 应用的开发。当开发者按照我们的服务条款下载或使用本模型时,应与内部模型团队合作,确保该模型满足相关行业和用例的要求,并应对未预见的产品误用问题。

请通过 此处 报告安全漏洞或 NVIDIA AI 相关问题。