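HunYuan-OCR Project Guide

Overview

This project targets a lightweight trial ("small-window experience") use case: it deploys Tencent's HunYuan-OCR large model with vLLM to provide image-to-text recognition. The model runs entirely on an Ascend 910B NPU, and docker together with docker-compose provides one-command deployment so the OCR inference service can be tried quickly.

The project directory layout is: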

.
├── app        # Web application features and services
├── t-unit     # Unit test tools (HTTP and base64 test samples)
├── deploy     # Deployment code, covering vllm-ascend and hunyuanocr
└── model      # Model and weight files (place them here after download)

Environment

Hardware: Ascend 910B NPU
Operating system: Linux
Recommended Python version: 3.10 or later

One-Click Deployment

1. Get the code

Clone the project source from AtomGit:

git clone git@atomgit.com:wuyw/models_inference.git
cd models_inference/hunyuan-ocr

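2. Prepare resources

Download the vllm-ascend sources:
- Reference: https://github.com/vllm-project/vllm-ascend
Download the HunYuan-OCR model:
- Download it from AtomGit.
- Once the download finishes, place the model weights directory under models_inference/hunyuan-ocr/model/.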
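
Before building the images, it can help to confirm that the weights landed where the later steps expect them. The following is a minimal sketch rather than part of the project: the helper name `missing_model_files` is hypothetical, and the default assumption that the weights directory contains a `config.json` is only illustrative, since the actual file layout depends on the download.

```python
from pathlib import Path


def missing_model_files(model_dir, required=("config.json",)):
    """Return the names in `required` that are absent from `model_dir`.

    `required` is an assumption: adjust it to the files your download contains.
    """
    root = Path(model_dir)
    # A missing directory simply reports every required file as missing.
    return [name for name in required if not (root / name).is_file()]
```

For example, `missing_model_files("model/<weights-dir>")`, with `<weights-dir>` replaced by the directory you downloaded, returns an empty list when every expected file is present.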

3. Build the vllm-ascend image

git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git pull
git fetch origin --tags
git checkout v0.12.0rc0
docker build -t vllm-ascend-cann8.3.rc2-910b-ubuntu22.04-py3.11:latest .

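Note: the image build takes a long time (about 2 hours). If it fails because of network problems, rerun the docker build command until it succeeds.

4. Build the project image

Return to the project root directory, then run: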

cd models_inference/hunyuan-ocr
docker build -t models_inference_hunyuanocr_2b:latest .

5. Start the service

Start everything with docker-compose:

docker-compose up -d
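
Because `docker-compose up -d` returns before the model has finished loading, a client may want to wait until the service answers. The sketch below polls a readiness check with a timeout. The function names are hypothetical, and the URL and port to poll are assumptions: take the host port from your docker-compose.yml.

```python
import time
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and connection errors are all OSError subclasses.
        return False


def wait_until_ready(check, timeout_s=600.0, interval_s=5.0, sleep=time.sleep):
    """Call `check()` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

For example, `wait_until_ready(lambda: http_ok("http://localhost:18015/v1/models"))` polls the model listing endpoint of the OpenAI-compatible API, assuming the service is published on that port.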

Quick Test

The test cases are in the t-unit directory and cover two input modes: an image URL and a Base64-encoded image.

Image URL test

curl -X POST http://localhost:18015/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunyuan-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "请提取图片中信息"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://example.com/test-image.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 15000
  }'
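
The same request can be issued from Python. The sketch below mirrors the curl body above using only the standard library. The function names are hypothetical, and the response parsing assumes the usual OpenAI-style chat completion shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_ocr_payload(image_url, prompt="请提取图片中信息",
                      model="hunyuan-ocr", max_tokens=15000):
    """Build the chat-completions body used by the curl example."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
    }


def build_request(endpoint, payload):
    """Wrap the payload in a POST request with a JSON content type."""
    data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response_body["choices"][0]["message"]["content"]
```

To send it, pass the result of `build_request("http://localhost:18015/v1/chat/completions", payload)` to `urllib.request.urlopen`, decode the response with `json.load`, and hand the result to `extract_text`.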

Base64 image test

# 1. Encode the image as base64 and strip newlines
base64 example.jpg | tr -d '\n' > img.b64

# 2. Send the inference request
curl -X POST http://localhost:18005/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hunyuan-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "请提取图片中信息"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "data:image/jpeg;base64,'"$(cat img.b64)"'"
            }
          }
        ]
      }
    ],
    "max_tokens": 15000
  }'
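
When the image is embedded inline, OpenAI-style `image_url` fields are commonly given a `data:` URL, which carries the MIME type alongside the base64 payload. The sketch below builds such a URL from raw bytes. The helper name `to_data_url` is hypothetical, and whether this service requires the `data:` prefix is an assumption to verify against the samples in t-unit.

```python
import base64
import mimetypes


def to_data_url(image_bytes, filename="image.jpg"):
    """Encode raw image bytes as a data URL with a MIME type guessed from `filename`."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None:
        # Fall back to a generic type when the extension is unknown.
        mime = "application/octet-stream"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Reading the file with `open("example.jpg", "rb").read()` and passing the bytes to `to_data_url` yields a string that can be placed in the `url` field of the request body.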