hcomm

Public

HCOMM (Huawei Communication) is the communication base library for HCCL. It provides management of communication domains and communication resources.

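Co-authored-by: p_ch<pengchenghao1@huawei.com>

# message auto-generated for no-merge-commit merge:

!1515 merge roce-write-slicing into master

RoCE Write/Read slicing for large RDMA transfers

Created-by: p_ch
Commit-by: p_ch
Merged-by: cann-robot

Description:

## Description

For the Host RoCE scenario, this change adds support for splitting GB-scale data into chunks before performing Write / Read operations.

### CpuRoceEndpoint

Adds the member function GetCapabilities, which returns the capabilities of this Endpoint. For now it reports only the maximum size of data that a single ibv post send can transfer. Because no hccp interface is available for this, the value is currently fixed at 1G. Verified on a real environment.

### HostCpuRoceChannel supports data slicing

- ParseInputParam now obtains the maximum transfer size of the local Endpoint and stores it in a member variable.
- Updated Write / Read logic: the data to transfer is split by the maximum transfer size, and each chunk is then sent through ibv post send.
- Updated WriteWithNotify logic: the data to transfer is split by the maximum transfer size; the first N - 1 chunks are sent as Write operations and the last chunk is sent as a WriteWithImm operation.

## Related Issue

2026033010876

## Testing

On-environment test: a one-sided communication Write of 2G of data succeeded.

Added 5 UT test cases:

| Test case | New branch covered |
| --- | --- |
| Ut_Init_When_EndpointIsNotCpuRoce | In ParseInputParam, dynamic_cast<CpuRoceEndpoint*> returns nullptr → HCCL_E_INTERNAL |
| Ut_WriteAndReadAndWriteWithNotify_When_MaxMsgSizeIsZero | The maxMsgSize_ == 0 guard in Write/Read/WriteWithNotify → HCCL_E_INTERNAL |
| Ut_Write_When_LenExceedsMaxMsgSize | The chunking loop in Write (len=250, maxMsgSize_=100 → 3 chunks) |
| Ut_Read_When_LenExceedsMaxMsgSize | The chunking loop in Read (same parameters) |
| Ut_WriteWithNotify_When_LenExceedsMaxMsgSize | Chunking in WriteWithNotify: first 2 chunks via PostRdmaOp(RDMA_WRITE) + tail chunk via RDMA_WRITE_WITH_IMM |

In addition, a MockNonRoceEndpoint class was added after the fixture. It simulates an Endpoint that is not a CpuRoceEndpoint so that the dynamic_cast fails.

## Documentation updates

## Type labels

- [x] Bug fix
- [ ] New feature
- [ ] Performance optimization
- [ ] Documentation update
- [ ] Other, please describe:

See merge request: cann/hcomm!1515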
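The slicing rule described in this merge request can be illustrated with a small standalone sketch. This is not the code from HostCpuRoceChannel: the names `RdmaOp`, `Chunk`, and `SplitTransfer` are hypothetical, the sketch only computes the chunk plan rather than posting work requests, and it throws on a zero maximum size where the merge request describes returning HCCL_E_INTERNAL. How the real code treats a zero-length transfer is not stated; here it simply yields no chunks.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical opcodes for this sketch, mirroring RDMA_WRITE and
// RDMA_WRITE_WITH_IMM as named in the merge request.
enum class RdmaOp { Write, WriteWithImm };

// One planned transfer: byte offset into the buffer, length, and opcode.
struct Chunk {
    std::uint64_t offset;
    std::uint64_t len;
    RdmaOp op;
};

// Split a transfer of `len` bytes into chunks of at most `maxMsgSize` bytes.
// When `notify` is true, the final chunk carries the immediate-data opcode,
// so the receiver is notified only after all payload has been posted.
std::vector<Chunk> SplitTransfer(std::uint64_t len, std::uint64_t maxMsgSize, bool notify)
{
    if (maxMsgSize == 0) {
        // Assumption: the real code reports an error code instead of throwing.
        throw std::invalid_argument("maxMsgSize must be non-zero");
    }
    std::vector<Chunk> chunks;
    std::uint64_t offset = 0;
    while (offset < len) {
        const std::uint64_t remaining = len - offset;
        const std::uint64_t chunkLen = remaining < maxMsgSize ? remaining : maxMsgSize;
        const bool isLast = (chunkLen == remaining);
        const RdmaOp op = (notify && isLast) ? RdmaOp::WriteWithImm : RdmaOp::Write;
        chunks.push_back(Chunk{offset, chunkLen, op});
        offset += chunkLen;
    }
    return chunks;
}
```

With the parameters used in the unit tests (len=250, maximum size 100), `SplitTransfer(250, 100, true)` plans three chunks of 100, 100, and 50 bytes, and only the last one uses `RdmaOp::WriteWithImm`.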