OpenBMB Open-Source Community / MiniCPM-V-4-GPTQ

A GPT-4V-level multimodal LLM that runs on your phone: single-image, multi-image, and video understanding

GitHub | Demo

MiniCPM-V 4.0

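MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. It is built on SigLIP2-400M and MiniCPM4-3B, with 4.1B parameters in total. It inherits the strong single-image, multi-image, and video understanding of MiniCPM-V 2.6 while running far more efficiently. Notable features of MiniCPM-V 4.0 include:

  • 🔥 Leading visual capability. With only 4.1B parameters, MiniCPM-V 4.0 averages 69.0 on OpenCompass, a comprehensive evaluation over 8 popular benchmarks. This surpasses GPT-4.1-mini-20250414, MiniCPM-V 2.6 (8.1B parameters, OpenCompass 65.2), and Qwen2.5-VL-3B-Instruct (3.8B parameters, OpenCompass 64.5). It also performs well on multi-image and video understanding tasks.

  • 🚀 Superior efficiency. MiniCPM-V 4.0 is designed for on-device deployment and runs smoothly on end devices. On an iPhone 16 Pro Max, for example, it reaches a first-token latency under 2 s and a decoding speed above 17 tokens/s, with no heating problems. Its throughput under concurrent requests is also strong.

  • 💫 Easy usage. MiniCPM-V 4.0 can be used in many ways, including llama.cpp, Ollama, vLLM, SGLang, LLaMA-Factory, and a local web demo. We also open-source an iOS app that runs on iPhone and iPad. A well-structured Cookbook with detailed instructions and practical examples helps you get started.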
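To put the efficiency figures above in perspective, the following minimal sketch turns them into a rough end-to-end latency estimate. It assumes a simple model in which total time is the first-token latency plus the remaining tokens divided by the decoding speed, and it uses the quoted iPhone 16 Pro Max bounds (2 s, 17 tokens/s) as defaults. The function name and the 200-token reply length are illustrative choices, not part of MiniCPM-V.

```python
def estimate_response_seconds(num_output_tokens: int,
                              first_token_latency_s: float = 2.0,
                              decode_tokens_per_s: float = 17.0) -> float:
    """Rough upper bound on the time needed to generate one reply.

    The defaults are the bounds quoted in the text (first token in under 2 s,
    decoding faster than 17 tokens/s), so the estimate is pessimistic.
    """
    if num_output_tokens < 1:
        raise ValueError("num_output_tokens must be at least 1")
    if decode_tokens_per_s <= 0:
        raise ValueError("decode_tokens_per_s must be positive")
    # The first token arrives after the first-token latency;
    # every remaining token is produced at the decoding speed.
    remaining_tokens = num_output_tokens - 1
    return first_token_latency_s + remaining_tokens / decode_tokens_per_s


if __name__ == "__main__":
    # Illustrative reply length; not a figure from the model card.
    print(f"{estimate_response_seconds(200):.1f} s")
```

Under these assumptions a 200-token reply takes at most about 13.7 s on that device; real latency also depends on prompt length and image count, which this sketch ignores.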

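Evaluation

OpenCompass single-image evaluation results

| Model | Size | OpenCompass | OCRBench | MathVista | HallusionBench | MMMU | MMVet | MMBench V1.1 | MMStar | AI2D |
|---|---|---|---|---|---|---|---|---|---|---|
| Proprietary models | | | | | | | | | | |
| GPT-4v-20240409 | - | 63.5 | 656 | 55.2 | 43.9 | 61.7 | 67.5 | 79.8 | 56.0 | 78.6 |
| Gemini-1.5-Pro | - | 64.5 | 754 | 58.3 | 45.6 | 60.6 | 64.0 | 73.9 | 59.1 | 79.1 |
| GPT-4.1-mini-20250414 | - | 68.9 | 840 | 70.9 | 49.3 | 55.0 | 74.3 | 80.9 | 60.9 | 76.0 |
| Claude 3.5 Sonnet-20241022 | - | 70.6 | 798 | 65.3 | 55.5 | 66.4 | 70.1 | 81.7 | 65.1 | 81.2 |
| Open-source models | | | | | | | | | | |
| Qwen2.5-VL-3B-Instruct | 3.8B | 64.5 | 828 | 61.2 | 46.6 | 51.2 | 60.0 | 76.8 | 56.3 | 81.4 |
| InternVL2.5-4B | 3.7B | 65.1 | 820 | 60.8 | 46.6 | 51.8 | 61.5 | 78.2 | 58.7 | 81.4 |
| Qwen2.5-VL-7B-Instruct | 8.3B | 70.9 | 888 | 68.1 | 51.9 | 58.0 | 69.7 | 82.2 | 64.1 | 84.3 |
| InternVL2.5-8B | 8.1B | 68.1 | 821 | 64.5 | 49.0 | 56.2 | 62.8 | 82.5 | 63.2 | 84.6 |
| MiniCPM-V-2.6 | 8.1B | 65.2 | 852 | 60.8 | 48.1 | 49.8 | 60.0 | 78.0 | 57.5 | 82.1 |
| MiniCPM-o-2.6 | 8.7B | 70.2 | 889 | 73.3 | 51.1 | 50.9 | 67.2 | 80.6 | 63.3 | 86.1 |
| MiniCPM-V-4.0 | 4.1B | 69.0 | 894 | 66.9 | 50.8 | 51.2 | 68.0 | 79.7 | 62.8 | 82.9 |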
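The size-versus-score comparison in the OpenCompass results can be checked with a few lines of Python. This is only an illustrative sketch: the sizes and OpenCompass averages are copied from the open-source rows of the table, while the helper names `ranked` and `best_under` and the 5B size cutoff are assumptions introduced here.

```python
# (model, parameters in billions, OpenCompass average), copied from the table.
OPEN_MODELS = [
    ("Qwen2.5-VL-3B-Instruct", 3.8, 64.5),
    ("InternVL2.5-4B", 3.7, 65.1),
    ("Qwen2.5-VL-7B-Instruct", 8.3, 70.9),
    ("InternVL2.5-8B", 8.1, 68.1),
    ("MiniCPM-V-2.6", 8.1, 65.2),
    ("MiniCPM-o-2.6", 8.7, 70.2),
    ("MiniCPM-V-4.0", 4.1, 69.0),
]


def ranked(models):
    """Return the models sorted by OpenCompass average, best first."""
    return sorted(models, key=lambda m: m[2], reverse=True)


def best_under(models, max_params_b):
    """Return the best-scoring model with at most max_params_b billion
    parameters, or None if no model is that small."""
    eligible = [m for m in models if m[1] <= max_params_b]
    return max(eligible, key=lambda m: m[2]) if eligible else None


if __name__ == "__main__":
    for name, size_b, score in ranked(OPEN_MODELS):
        print(f"{name:<24} {size_b:>4.1f}B  {score:.1f}")
    # 5.0 is an arbitrary cutoff chosen to separate the ~4B and ~8B groups.
    print("Best under 5B:", best_under(OPEN_MODELS, 5.0)[0])
```

With these numbers, MiniCPM-V-4.0 ranks third among the listed open-source models and first among those under 5B parameters.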
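Single-image evaluation results on ChartQA, MME, RealWorldQA, TextVQA, DocVQA, MathVision, DynaMath, WeMath, Object HalBench, and MM HalBench

| Model | Size | ChartQA | MME | RealWorldQA | TextVQA | DocVQA | MathVision | DynaMath | WeMath | Obj Hal CHAIRs↓ | Obj Hal CHAIRi↓ | MM Hal score avg@3↑ | MM Hal hall rate avg@3↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Proprietary models | | | | | | | | | | | | | |
| GPT-4v-20240409 | - | 78.5 | 1927 | 61.4 | 78.0 | 88.4 | - | - | - | - | - | - | - |
| Gemini-1.5-Pro | - | 87.2 | - | 67.5 | 78.8 | 93.1 | 41.0 | 31.5 | 50.5 | - | - | - | - |
| GPT-4.1-mini-20250414 | - | - | - | - | - | - | 45.3 | 47.7 | - | - | - | - | - |
| Claude 3.5 Sonnet-20241022 | - | 90.8 | - | 60.1 | 74.1 | 95.2 | 35.6 | 35.7 | 44.0 | - | - | - | - |
| Open-source models | | | | | | | | | | | | | |
| Qwen2.5-VL-3B-Instruct | 3.8B | 84.0 | 2157 | 65.4 | 79.3 | 93.9 | 21.9 | 13.2 | 22.9 | 18.3 | 10.8 | 3.9 | 33.3 |
| InternVL2.5-4B | 3.7B | 84.0 | 2338 | 64.3 | 76.8 | 91.6 | 18.4 | 15.2 | 21.2 | 13.7 | 8.7 | 3.2 | 46.5 |
| Qwen2.5-VL-7B-Instruct | 8.3B | 87.3 | 2347 | 68.5 | 84.9 | 95.7 | 25.4 | 21.8 | 36.2 | 13.3 | 7.9 | 4.1 | 31.6 |
| InternVL2.5-8B | 8.1B | 84.8 | 2344 | 70.1 | 79.1 | 93.0 | 17.0 | 9.4 | 23.5 | 18.3 | 11.6 | 3.6 | 37.2 |
| MiniCPM-V-2.6 | 8.1B | 79.4 | 2348 | 65.0 | 80.1 | 90.8 | 17.5 | 9.0 | 20.4 | 7.3 | 4.7 | 4.0 | 29.9 |
| MiniCPM-o-2.6 | 8.7B | 86.9 | 2372 | 68.1 | 82.0 | 93.5 | 21.7 | 10.4 | 25.2 | 6.3 | 3.4 | 4.1 | 31.3 |
| MiniCPM-V-4.0 | 4.1B | 84.4 | 2298 | 68.5 | 80.8 | 92.9 | 20.7 | 14.2 | 32.7 | 6.3 | 3.5 | 4.1 | 29.2 |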
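Multi-image and video understanding results on Mantis, Blink, and Video-MME

| Model | Size | Mantis | Blink | Video-MME (w/o subs) | Video-MME (w/ subs) |
|---|---|---|---|---|---|
| Proprietary models | | | | | |
| GPT-4v-20240409 | - | 62.7 | 54.6 | 59.9 | 63.3 |
| Gemini-1.5-Pro | - | - | 59.1 | 75.0 | 81.3 |
| GPT-4o-20240513 | - | - | 68.0 | 71.9 | 77.2 |
| Open-source models | | | | | |
| Qwen2.5-VL-3B-Instruct | 3.8B | - | 47.6 | 61.5 | 67.6 |
| InternVL2.5-4B | 3.7B | 62.7 | 50.8 | 62.3 | 63.6 |
| Qwen2.5-VL-7B-Instruct | 8.3B | - | 56.4 | 65.1 | 71.6 |
| InternVL2.5-8B | 8.1B | 67.7 | 54.8 | 64.2 | 66.9 |
| MiniCPM-V-2.6 | 8.1B | 69.1 | 53.0 | 60.9 | 63.6 |
| MiniCPM-o-2.6 | 8.7B | 71.9 | 56.7 | 63.9 | 69.6 |
| MiniCPM-V-4.0 | 4.1B | 71.4 | 54.0 | 61.2 | 65.8 |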

Examples

[Image: math example]

Runs locally on iPhone 16 Pro Max with the iOS demo.

Usage

from PIL import Image
import torch
from transformers import AutoModel, AutoTokenizer

model_path = 'openbmb/MiniCPM-V-4'
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  # sdpa or flash_attention_2, no eager
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True)



image = Image.open('./assets/single.png').convert('RGB')

# First round chat 
question = "What is the landform in the picture?"
msgs = [{'role': 'user', 'content': [image, question]}]

answer = model.chat(
    msgs=msgs,
    image=image,
    tokenizer=tokenizer
)
print(answer)


# Second round chat, pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": [answer]})
msgs.append({"role": "user", "content": [
            "What should I pay attention to when traveling here?"]})

answer = model.chat(
    msgs=msgs,
    image=None,
    tokenizer=tokenizer
)
print(answer)
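The message format used above (a list of role/content dictionaries, with images and text mixed inside `content`) can be wrapped in small helpers. The sketch below is a hedged illustration: `user_turn`, `assistant_turn`, and `extend_history` are hypothetical names, not part of the MiniCPM-V API. Passing several images in one `content` list for multi-image input is an assumption based on the single-image pattern shown above. Placeholder strings stand in for `PIL.Image` objects so the snippet runs without a model.

```python
from typing import Any, Dict, List, Sequence

Message = Dict[str, Any]


def user_turn(question: str, images: Sequence[Any] = ()) -> Message:
    """Build a user message: images first, then the question text."""
    return {"role": "user", "content": [*images, question]}


def assistant_turn(answer: str) -> Message:
    """Wrap a model answer so it can be replayed as conversation history."""
    return {"role": "assistant", "content": [answer]}


def extend_history(msgs: List[Message], answer: str, follow_up: str) -> List[Message]:
    """Return a new history with the model's answer and the next user question
    appended; the input list is left unmodified."""
    return [*msgs, assistant_turn(answer), user_turn(follow_up)]


if __name__ == "__main__":
    # Placeholder strings stand in for PIL images loaded with Image.open(...).
    msgs = [user_turn("Compare the two images.", ["<image1>", "<image2>"])]
    # With a loaded model, the call would mirror the second-round example above
    # (assumption for multi-image input):
    #   answer = model.chat(msgs=msgs, image=None, tokenizer=tokenizer)
    answer = "<model answer>"
    msgs = extend_history(msgs, answer, "Which one is brighter?")
    for m in msgs:
        print(m["role"], m["content"])
```

Returning a new list from `extend_history` rather than appending in place keeps earlier rounds reusable, for example when branching a conversation into different follow-up questions.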

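License

Model License

  • The MiniCPM-o/V model weights and code are open-sourced under the Apache-2.0 license.
  • To help us better understand and support our users, we would be grateful if you could fill out a brief registration "questionnaire".

Statement

  • As a large multimodal model (LMM), MiniCPM-V 4.0 generates content by learning from a large amount of multimodal corpora, but it cannot comprehend, express personal opinions, or make value judgments. Nothing generated by MiniCPM-V 4.0 represents the views or positions of the model developers.
  • We will not be liable for any problems arising from the use of the MiniCPM-V models, including but not limited to data security issues, public opinion risks, or any risks and problems arising from the misdirection, misuse, dissemination, or abuse of the model.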

Key Techniques and Other Multimodal Projects

👏 Welcome to explore the key techniques of MiniCPM-V 2.6 and other multimodal projects from our team:

VisCPM | RLHF-V | LLaVA-UHD | RLAIF-V

Citation

If you find our work helpful, please consider citing our paper 📝 and liking this project ❤️!

@article{yao2024minicpm,
  title={MiniCPM-V: A GPT-4V Level MLLM on Your Phone},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Wang, Chongyi and Cui, Junbo and Zhu, Hongji and Cai, Tianchi and Li, Haoyu and Zhao, Weilin and He, Zhihui and others},
  journal={Nature Communications},
  volume={16},
  pages={5509},
  year={2025}
}