DeepSeek-OCR:可用于图片、PDF的OCR识别及文档转换。项目适配DeepSeek-OCR模型，支持昇腾NPU，提供离线和在线推理方式，能高效处理多模态OCR任务。【此简介由AI生成】

注意

本模型仓代码，是针对此开源链接进行的适配 https://github.com/deepseek-ai/DeepSeek-OCR

一、准备运行环境

表 1 版本配套表

配套	版本	环境准备指导
Python	3.11/3.12	-
torch	2.7.1	-
vllm	commit: 83f478bb19489b41e9d208b47b4bb5a95ac171ac	-
vllm_ascend	main	-

1.1 获取CANN安装包&环境准备

设备支持 Atlas 800I A2(8 x 64G); Atlas 800I A3(8 x 128G); 只支持单卡
获取CANN包
环境准备指导

1.2 CANN安装

方式一：软件包安装

# 增加软件包可执行权限，{version}表示软件版本号，{arch}表示CPU架构，{soc}表示昇腾AI处理器的版本。
chmod +x ./Ascend-cann-toolkit_{version}_linux-{arch}.run
chmod +x ./Ascend-cann-kernels-{soc}_{version}_linux.run
# 校验软件包安装文件的一致性和完整性
./Ascend-cann-toolkit_{version}_linux-{arch}.run --check
./Ascend-cann-kernels-{soc}_{version}_linux.run --check
# 安装
./Ascend-cann-toolkit_{version}_linux-{arch}.run --install
./Ascend-cann-kernels-{soc}_{version}_linux.run --install

# 设置环境变量
source /usr/local/Ascend/ascend-toolkit/set_env.sh

1.3 vllm环境安装

# 先卸载环境中的vllm和vllm_ascend
# 安装vllm
git clone --branch main https://github.com/vllm-project/vllm.git
cd vllm
git checkout 2918c1b49c88c29783c86f78d2c4221cb9622379
VLLM_TARGET_DEVICE=empty  pip install -v -e . 

# 安装vllm_ascend
git clone --branch main https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git checkout c272747d13c955126a59f7538a2054a24d31a285
pip install -v -e .

二、下载权重

权重链接

# huggingface路径
https://huggingface.co/deepseek-ai/DeepSeek-OCR
# 魔乐路径
https://modelers.cn/models/deepseek-ai/DeepSeek-OCR

三、DeepSeek-OCR使用

3.1 下载到本地

git clone https://modelers.cn/vLLM_Ascend/DeepSeek-OCR.git
cd DeepSeek-OCR

3.2 安装依赖

pip install -r requirements.txt

3.3 执行推理

3.3.1 vllm离线推理方式

设置环境变量

export VLLM_USE_V1=1
export VLLM_ASCEND_ENABLE_NZ=0
export TOKENIZERS_PARALLELISM=false
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2

1、图像

model_base='deepseek-ai/DeepSeek-OCR'
python run_image.py --model_path ${model_base} --input_path test.jpg --save_dir ./output

2、pdf

python run_pdf.py --model_path ${model_base} --input_path test.pdf --save_dir ./output

3、基准测试

python run_benchmark.py --model_path ${model_base} --input_path ./OmniDocBench --save_dir ./output

参数说明：

--model_path：DeepSeek-OCR的权重路径
--input_path：输入图片/pdf/数据集的路径
--save_dir：输出结果的保存路径

3.3.2 vllm在线推理方式

由于当前vllm_ascend暂未合入对OCR的适配修改代码，所以需要手动修改下代码 1、将patch中的文件放到vllm_ascend/patch/worker路径下 2、修改vllm_ascend/patch/worker/init.py，增加以下两行代码

import vllm_ascend.patch.worker.patch_deepseekmoe
import vllm_ascend.patch.worker.patch_sam

3、启动服务化：

export VLLM_USE_V1=1
export VLLM_ASCEND_ENABLE_NZ=0
export TOKENIZERS_PARALLELISM=false
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2

vllm serve /deepseek-ai/DeepSeek-OCR \
    --served-model-name deepseekocr \
    --trust-remote-code \
    -tp 1  \
    --port 1020 \
    --max_model_len 8192 \
    --disable-mm-preprocessor-cache \
    --no-enable-prefix-caching \
    --gpu-memory-utilization 0.9 \
    --allowed-local-media-path ./dataset

3.4 PROMPT的例子

# document: <image>\n<|grounding|>Convert the document to markdown.
# other image: <image>\n<|grounding|>OCR this image.
# without layouts: <image>\nFree OCR.
# figures in document: <image>\nParse the figure.
# general: <image>\nDescribe this image in detail.
# rec: <image>\nLocate <|ref|>xxxx<|/ref|> in the image.
# '先天下之忧而忧'

注意

本模型仓代码，是针对此开源链接进行的适配 https://github.com/deepseek-ai/DeepSeek-OCR

一、准备运行环境

表 1 版本配套表

配套	版本	环境准备指导
Python	3.11/3.12	-
torch	2.7.1	-
vllm	commit: 83f478bb19489b41e9d208b47b4bb5a95ac171ac	-
vllm_ascend	main	-

1.1 获取CANN安装包&环境准备

设备支持 Atlas 800I A2(8 x 64G); Atlas 800I A3(8 x 128G); 只支持单卡
获取CANN包
环境准备指导

1.2 CANN安装

方式一：软件包安装

# 增加软件包可执行权限，{version}表示软件版本号，{arch}表示CPU架构，{soc}表示昇腾AI处理器的版本。
chmod +x ./Ascend-cann-toolkit_{version}_linux-{arch}.run
chmod +x ./Ascend-cann-kernels-{soc}_{version}_linux.run
# 校验软件包安装文件的一致性和完整性
./Ascend-cann-toolkit_{version}_linux-{arch}.run --check
./Ascend-cann-kernels-{soc}_{version}_linux.run --check
# 安装
./Ascend-cann-toolkit_{version}_linux-{arch}.run --install
./Ascend-cann-kernels-{soc}_{version}_linux.run --install

# 设置环境变量
source /usr/local/Ascend/ascend-toolkit/set_env.sh

1.3 vllm环境安装

# 先卸载环境中的vllm和vllm_ascend
# 安装vllm
git clone --branch main https://github.com/vllm-project/vllm.git
cd vllm
git checkout 2918c1b49c88c29783c86f78d2c4221cb9622379
VLLM_TARGET_DEVICE=empty  pip install -v -e . 

# 安装vllm_ascend
git clone --branch main https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
git checkout c272747d13c955126a59f7538a2054a24d31a285
pip install -v -e .

二、下载权重

权重链接

# huggingface路径
https://huggingface.co/deepseek-ai/DeepSeek-OCR
# 魔乐路径
https://modelers.cn/models/deepseek-ai/DeepSeek-OCR

三、DeepSeek-OCR使用

3.1 下载到本地

git clone https://modelers.cn/vLLM_Ascend/DeepSeek-OCR.git
cd DeepSeek-OCR

3.2 安装依赖

pip install -r requirements.txt

3.3 执行推理

3.3.1 vllm离线推理方式

设置环境变量

export VLLM_USE_V1=1
export VLLM_ASCEND_ENABLE_NZ=0
export TOKENIZERS_PARALLELISM=false
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2

1、图像

model_base='deepseek-ai/DeepSeek-OCR'
python run_image.py --model_path ${model_base} --input_path test.jpg --save_dir ./output

2、pdf

python run_pdf.py --model_path ${model_base} --input_path test.pdf --save_dir ./output

3、基准测试

python run_benchmark.py --model_path ${model_base} --input_path ./OmniDocBench --save_dir ./output

参数说明：

--model_path：DeepSeek-OCR的权重路径
--input_path：输入图片/pdf/数据集的路径
--save_dir：输出结果的保存路径

3.3.2 vllm在线推理方式

import vllm_ascend.patch.worker.patch_deepseekmoe
import vllm_ascend.patch.worker.patch_sam

3、启动服务化：

export VLLM_USE_V1=1
export VLLM_ASCEND_ENABLE_NZ=0
export TOKENIZERS_PARALLELISM=false
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export TASK_QUEUE_ENABLE=2

vllm serve /deepseek-ai/DeepSeek-OCR \
    --served-model-name deepseekocr \
    --trust-remote-code \
    -tp 1  \
    --port 1020 \
    --max_model_len 8192 \
    --disable-mm-preprocessor-cache \
    --no-enable-prefix-caching \
    --gpu-memory-utilization 0.9 \
    --allowed-local-media-path ./dataset

3.4 PROMPT的例子

# document: <image>\n<|grounding|>Convert the document to markdown.
# other image: <image>\n<|grounding|>OCR this image.
# without layouts: <image>\nFree OCR.
# figures in document: <image>\nParse the figure.
# general: <image>\nDescribe this image in detail.
# rec: <image>\nLocate <|ref|>xxxx<|/ref|> in the image.
# '先天下之忧而忧'