Vintern-1B-v2-ViTable-docvqa is a fine-tuned version of the 5CD-AI/Vintern-1B-v2 multimodal model for Vietnamese DocVQA on table data.
| Model | ANLS | Semantic similarity | MLLM-as-judge (Gemini) |
|---|---|---|---|
| Gemini 1.5 Flash | 0.35 | 0.56 | 0.40 |
| Vintern-1B-v2 | 0.04 | 0.45 | 0.50 |
| Vintern-1B-v2-ViTable-docvqa | 0.50 | 0.71 | 0.59 |
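
For readers unfamiliar with the ANLS column, the following is a minimal sketch of how Average Normalized Levenshtein Similarity is commonly computed for DocVQA-style benchmarks. It is not the evaluation script used to produce the numbers above: the function names `levenshtein` and `anls`, the lowercasing and whitespace stripping, and the threshold of 0.5 are assumptions based on the usual definition of the metric.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions: Sequence[str],
         references: Sequence[List[str]],
         threshold: float = 0.5) -> float:
    """Hypothetical helper: mean over questions of the best per-answer score.

    Each score is 1 - NL when the normalized distance NL is below
    `threshold` (0.5 is an assumed default), and 0 otherwise.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)


if __name__ == "__main__":
    # One exact match and one complete miss.
    print(anls(["Hà Nội", "abc"], [["Hà Nội"], ["xyz"]]))
```

Because any answer whose normalized distance reaches the threshold scores zero, a model that answers in a different format from the ground truth can score very low on ANLS while still doing reasonably on the semantic similarity and judge-based columns.
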
Check out this 🤗 HF demo, or open it in Colab:
Citation:
@misc{doan2024vintern1befficientmultimodallarge,
title={Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese},
author={Khang T. Doan and Bao G. Huynh and Dung T. Hoang and Thuc D. Pham and Nhat H. Pham and Quan T. M. Nguyen and Bang Q. Vo and Suong N. Hoang},
year={2024},
eprint={2408.12480},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2408.12480},
}