Restore with git checkout

git checkout -b exp/rapidocr_{optimization}_$(date +%s)
git add -A && git commit -m "exp(...): ...
- metric_primary: XX → XX
- metric_cer_diff: XX
- status: PASS/FAIL
"

{
  "experiment_id": "exp_001_rapidocr_env",
  "timestamp": "2026-05-14T15:00:00Z",
  "hypothesis": "TASK_QUEUE_ENABLE=2 + CPU_AFFINITY_CONF=2 can improve inference throughput",
  "changes": [],
  "metrics": {
    "throughput_images_per_sec": 22.0,
    "latency_p50_ms": 44.0,
    "latency_p99_ms": 58.0,
    "cer_diff": 0.001,
    "memory_mb": 2800
  },
  "status": "PASS",
  "commit_hash": "a1b2c3d"
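}

if current.throughput > best.throughput * 1.05 and current.cer_diff < 0.01:
    git_merge_to_main()
    best = current
else:
    git_checkout_main()

| Time window | Phase | Budget | Notes |
|---|---|---|---|
| 0:00-0:30 | Startup preparation | 0.5 h | Clone code, configure environment, establish baseline, initialize git |
| 0:30-1:30 | Path A: quick optimization | 1 h | torch_npu affinity operators + environment variable configuration |
| 1:30-2:30 | Path B: model conversion | 1 h | ONNX → ATC → OM |
| 2:30-3:30 | Profiling loop | 1 h | L1 profiling → bottleneck localization |
| 3:30-4:30 | Solution selection and validation | 1 h | Compare both paths, accept on accuracy |
| 4:30-5:00 | Delivery and submission | 0.5 h | Write SKILL.md, push to AtomGit |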
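
The schedule above can be turned into a small lookup so that a driver script knows which phase it should be in. The following is a minimal sketch, not part of the original tooling: the names `PHASES`, `BUDGET_MINUTES`, and `current_phase` are hypothetical, and the phase labels are English renderings of the table rows.

```python
from bisect import bisect_right

# (start minute, phase label) transcribed from the schedule table.
# The labels are illustrative English names, not identifiers used by any script.
PHASES = [
    (0, "Startup preparation"),
    (30, "Path A: quick optimization"),
    (90, "Path B: model conversion"),
    (150, "Profiling loop"),
    (210, "Solution selection and validation"),
    (270, "Delivery and submission"),
]
BUDGET_MINUTES = 300  # 5-hour total budget


def current_phase(elapsed_minutes: float) -> str:
    """Return the phase that covers the given elapsed time in minutes."""
    if elapsed_minutes < 0:
        raise ValueError("elapsed_minutes must be non-negative")
    if elapsed_minutes >= BUDGET_MINUTES:
        return "Budget exhausted"
    starts = [start for start, _ in PHASES]
    # bisect_right finds the first phase starting after elapsed_minutes;
    # the phase just before it is the active one.
    return PHASES[bisect_right(starts, elapsed_minutes) - 1][1]
```

Each boundary belongs to the phase that starts there, so minute 30 already counts as Path A.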
goal: "RapidOCR inference on Ascend 910 meets the performance target with accuracy error < 1%"
termination:
  success: throughput >= baseline_target AND cer_diff < 0.01
  failure: time_budget_exceeded (5h) OR consecutive_rollbacks >= 5
metrics:
  primary: throughput
  secondary: latency_p50_ms, latency_p99_ms
  guardrail: cer_diff, memory_mb
loop:
  interval: "evaluate immediately after each experiment completes"
  max_iterations: 50
actions:
  on_pass: git merge → update best → generate_next_hypothesis
  on_fail: git rollback → log_failure → try_alternative_path
  on_timeout: package_current_best → submit_deliverables

npu-smi info
python -c "import torch; import torch_npu; print(torch_npu.__version__)"

cd /workspace
git clone https://ai.gitcode.com/hf_mirrors/pitapo/rapidocr rapidocr-ascend
cd rapidocr-ascend
pip install -r requirements.txt # torch, torchvision, onnxruntime, opencv-python
git init
git add -A && git commit -m "init: clone rapidocr upstream"

python eval/baseline_run.py --device npu --batch 1,4,8,16

git branch main-best

git checkout -b exp/env_optimization
git checkout main-best -- .
export TASK_QUEUE_ENABLE=2
export CPU_AFFINITY_CONF=2
python eval/perf_eval.py --device npu

git checkout -b exp/det_backbone_optimization
# BN + ReLU in the ResNet/MobileNet backbone
# Convolution and sigmoid in the DB head
python eval/perf_eval.py

git checkout -b exp/rec_model_optimization
# CRNN: CNN backbone + BiLSTM + CTC
# SVTR/Transformer: Attention → npu_fusion_attention (bf16)
python eval/perf_eval.py --dtype bf16

git checkout -b exp/preprocess_optimization
# Optimize resize, normalize, and padding
# Try torchvision.transforms or NPU-side preprocessing
python eval/perf_eval.py

git checkout -b exp/contiguous_layout
python eval/perf_eval.py

git checkout -b exp/om_conversion
git checkout main-best -- .
# Export the detection and recognition models separately
python om_model/export_det_onnx.py --model ch_PP-OCRv4_det
python om_model/export_rec_onnx.py --model ch_PP-OCRv4_rec
python eval/compare_onnx_torch.py

bash om_model/atc_convert_det.sh
bash om_model/atc_convert_rec.sh

python inference_acl.py --det_model om_model/det.om --rec_model om_model/rec.om --device npu
python eval/perf_eval.py --backend acl

import torch_npu
prof = torch_npu.profiler.profile(...)

| Experiment | Environment variable | Trigger condition |
|---|---|---|
| exp/task_queue | TASK_QUEUE_ENABLE=2 | Idle time > 20% |
| exp/cpu_affinity | CPU_AFFINITY_CONF=2 | High CPU scheduling overhead |
| exp/tcmalloc | LD_PRELOAD=libtcmalloc.so | malloc hotspot |
| exp/alloc_conf | PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:512 | Device memory fragmentation |
| exp/fp16 | torch.float16 AMP | Model supports it and accuracy allows |
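
One way to apply the trigger table is to reduce a profiling run to a handful of signals and map them to experiment branches. The sketch below is only an illustration under stated assumptions: `ProfileSummary` and `pick_experiments` are hypothetical names, and apart from the 20% idle-time threshold, the table gives only qualitative triggers, so they are modelled here as booleans that the operator sets by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ProfileSummary:
    """Hypothetical digest of one profiling run."""
    idle_ratio: float           # fraction of time the device sits idle, 0..1
    cpu_scheduling_heavy: bool  # CPU scheduling overhead judged high
    malloc_hotspot: bool        # malloc shows up as a hotspot
    memory_fragmented: bool     # device memory fragmentation observed
    fp16_allowed: bool          # model supports fp16 and accuracy allows it


def pick_experiments(p: ProfileSummary) -> List[Tuple[str, str]]:
    """Return (branch, setting) pairs whose trigger condition holds."""
    picks = []
    if p.idle_ratio > 0.20:  # the only numeric threshold given in the table
        picks.append(("exp/task_queue", "TASK_QUEUE_ENABLE=2"))
    if p.cpu_scheduling_heavy:
        picks.append(("exp/cpu_affinity", "CPU_AFFINITY_CONF=2"))
    if p.malloc_hotspot:
        picks.append(("exp/tcmalloc", "LD_PRELOAD=libtcmalloc.so"))
    if p.memory_fragmented:
        picks.append(("exp/alloc_conf", "PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:512"))
    if p.fp16_allowed:
        picks.append(("exp/fp16", "torch.float16 AMP"))
    return picks
```

The returned order follows the table, so the cheapest environment-variable experiments are tried first.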
python eval/final_compare.py

cer_diff < 0.01

git tag v1.0-optimized
git checkout v1.0-optimized

mkdir -p deliverables
cp -r inference.py readme.md eval/ logs/ SKILL.md deliverables/
REPO_URL="https://oauth2:${ATOMGIT_USER_TOKEN}@gitcode.com/yourname/rapidocr-ascend.git"
git remote add origin "$REPO_URL"
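git push -u origin main

| Script | Purpose |
|---|---|
| eval/baseline_run.py | Run the baseline |
| eval/perf_eval.py | Performance evaluation |
| eval/accuracy_eval.py | Accuracy evaluation (CER/WER) |
| eval/compare_gpu_cpu.py | Three-platform comparison |
| eval/parse_profiling.py | Parse profiling output |
| eval/final_compare.py | Final comparison of the two paths |
| om_model/export_det_onnx.py | ONNX export of the detection model |
| om_model/export_rec_onnx.py | ONNX export of the recognition model |
| om_model/atc_convert_det.sh | ATC conversion of the detection model |
| om_model/atc_convert_rec.sh | ATC conversion of the recognition model |
| inference_acl.py | ACL inference entry point |
| inference.py | Final delivered inference script |
| readme.md | Deployment documentation |
| SKILL.md | Optimization log and reproduction guide |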
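
if experiment.cer_diff > 0.01:
    rollback("accuracy out of tolerance")
elif experiment.throughput < best.throughput * 0.95:
    rollback("performance regression > 5%")
elif experiment.memory_mb > best.memory_mb * 1.3:
    rollback("memory usage up > 30%")
elif experiment.latency_p99_ms > best.latency_p99_ms * 1.2:
    rollback("tail latency worse by > 20%")
else:
    merge_to_main_best()

| Time mark | Circuit-breaker action |
|---|---|
| 1h | No successful Path A experiment → commit fully to Path B |
| 2h | Path B OM conversion fails → abandon Path B; fall back to Path A + profiling |
| 3h | Still no solution meets the target → enable aggressive optimizations |
| 4h | Target still not met → freeze the current best and start producing deliverables |
| 4.5h | Stop all experiments and focus entirely on delivery |
| 5h | Forced termination; submit the current best result |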
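
The rollback rule that precedes the circuit-breaker table calls helpers (`rollback`, `merge_to_main_best`) that are not defined in this document. As a hedged, self-contained sketch, the same thresholds can be expressed as a pure function that returns the rollback reason, or `None` when the experiment may be merged. The `Metrics` class and `rollback_reason` function are hypothetical names introduced here for illustration; the git side effects are left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metrics:
    """Subset of the experiment record used by the guardrails."""
    throughput: float       # images per second
    cer_diff: float         # CER difference against the baseline
    memory_mb: float
    latency_p99_ms: float


def rollback_reason(exp: Metrics, best: Metrics) -> Optional[str]:
    """Apply the guardrails in the documented order; None means merge."""
    if exp.cer_diff > 0.01:
        return "accuracy out of tolerance"
    if exp.throughput < best.throughput * 0.95:
        return "performance regression > 5%"
    if exp.memory_mb > best.memory_mb * 1.3:
        return "memory usage up > 30%"
    if exp.latency_p99_ms > best.latency_p99_ms * 1.2:
        return "tail latency worse by > 20%"
    return None


if __name__ == "__main__":
    best = Metrics(throughput=20.0, cer_diff=0.0, memory_mb=2800, latency_p99_ms=58.0)
    candidate = Metrics(throughput=22.0, cer_diff=0.001, memory_mb=2800, latency_p99_ms=58.0)
    reason = rollback_reason(candidate, best)
    print("merge" if reason is None else f"rollback: {reason}")
```

Keeping the decision free of side effects makes it easy to unit-test the thresholds before wiring them to `git merge` or `git checkout`.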