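Qwen-7B-Chat-Int4 Deployment on Ascend NPU

Overview

Qwen-7B-Chat-Int4 is a 7-billion-parameter chat model quantized to 4 bits. This repository packages it for efficient deployment on Huawei Ascend NPU infrastructure.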

Key Features

Model size: 7 billion parameters
Quantization: INT4 (group size: 128)
Architecture: QWenLMHeadModel
Context length: 32,768 tokens
Target hardware: Ascend 910B series NPUs
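
To give a feel for what INT4 with group size 128 means for memory, here is a back-of-the-envelope estimate. It is only a sketch under stated assumptions: all 7 billion parameters are treated as quantized, and each group of 128 weights is assumed to carry one 16-bit scale and one 4-bit zero point. Real checkpoints typically keep some layers (such as embeddings) in higher precision, so the actual footprint will be larger. The function name `int4_weight_bytes` is made up for illustration.

```python
def int4_weight_bytes(num_params: int, group_size: int = 128,
                      scale_bits: int = 16, zero_bits: int = 4) -> float:
    """Approximate bytes needed to store group-quantized 4-bit weights.

    Assumption: every group of `group_size` weights stores one scale and
    one zero point in addition to the 4-bit codes themselves.
    """
    num_groups = num_params / group_size
    weight_bits = num_params * 4
    metadata_bits = num_groups * (scale_bits + zero_bits)
    return (weight_bits + metadata_bits) / 8


if __name__ == "__main__":
    params = 7_000_000_000  # nominal size taken from the feature list
    fp16_gib = params * 2 / 2**30
    int4_gib = int4_weight_bytes(params) / 2**30
    print(f"FP16 weights: ~{fp16_gib:.1f} GiB")
    print(f"INT4 weights: ~{int4_gib:.1f} GiB")
```

Under these assumptions the per-group metadata adds well under one bit per weight, so the quantized weights take roughly a quarter of the FP16 size.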

Model Information

Base model: Qwen/Qwen-7B-Chat
Quantization method: GPTQ
Quantization bits: 4
Quantization group size: 128
Library: Transformers
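
The "4 bits, group size 128" settings describe the storage format: weights are split into groups of 128, and each group gets its own scale and zero point. The sketch below illustrates that format with plain round-to-nearest quantization. It is not GPTQ itself, which additionally adjusts the remaining weights to compensate for rounding error. All function names here are hypothetical.

```python
import random
from typing import List, Tuple

Group = Tuple[List[int], float, int]  # (4-bit codes, scale, zero point)


def quantize_group(values: List[float]) -> Group:
    """Asymmetric 4-bit round-to-nearest quantization of one group."""
    lo = min(min(values), 0.0)  # make sure 0.0 is inside the range
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 15      # 4 bits give 16 levels: 0..15
    if scale == 0.0:
        scale = 1.0             # all-zero group: any scale works
    zero = round(-lo / scale)   # integer code that represents 0.0
    codes = [max(0, min(15, round(v / scale) + zero)) for v in values]
    return codes, scale, zero


def quantize_groups(weights: List[float], group_size: int = 128) -> List[Group]:
    """Split weights into groups and quantize each group independently."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


def dequantize_groups(groups: List[Group]) -> List[float]:
    """Reconstruct approximate float weights from the stored codes."""
    out: List[float] = []
    for codes, scale, zero in groups:
        out.extend((q - zero) * scale for q in codes)
    return out


if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 0.02) for _ in range(512)]
    restored = dequantize_groups(quantize_groups(weights))
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"max reconstruction error: {worst:.6f}")
```

Smaller groups track local weight ranges more closely at the cost of more scale and zero-point metadata; 128 is a common middle ground.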

Deployment Guide

For full deployment instructions, see Qwen-7B-Chat-Int4-detailed.md.

Quick Start

# Clone this repository
git clone https://gitcode.com/weixin_72661020/Qwen-7B-Chat-Int4.git

# Navigate to the project directory  
cd Qwen-7B-Chat-Int4

# Check out the detailed documentation
cat Qwen-7B-Chat-Int4-detailed.md
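
The interface of inference.py is not described in this README, so the following is only a rough, unofficial sketch of what an NPU-aware loader could look like. It assumes PyTorch, the `torch_npu` adapter, and Transformers are installed, and that `model_dir` (a hypothetical argument) points to a directory containing the model weights. The helpers `pick_device` and `load_model` are made up for illustration, and whether the GPTQ kernels run on a given Ascend software stack is not confirmed here.

```python
import importlib.util


def pick_device() -> str:
    """Return "npu", "cuda", or "cpu" depending on what is available."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch not installed; nothing else to probe
    import torch

    if importlib.util.find_spec("torch_npu") is not None:
        try:
            import torch_npu  # noqa: F401  (registers the "npu" backend)
            if torch.npu.is_available():
                return "npu"
        except Exception:
            pass  # adapter installed but runtime not usable; fall through
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


def load_model(model_dir: str, device: str):
    """Load tokenizer and model; Qwen ships custom code, hence trust_remote_code."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, trust_remote_code=True, device_map=device
    ).eval()
    return tokenizer, model


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Once loaded, Qwen chat models are usually queried through the `chat` method provided by their remote code, for example `response, history = model.chat(tokenizer, "Hello", history=None)`.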

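Included Files

README.md - this file (project overview)
Qwen-7B-Chat-Int4-detailed.md - full deployment guide
SKILL.md - Claude Code skill specification
inference.py - inference script with NPU support
evaluation.md - test results and compatibility report
model_info.json - model specifications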
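
As a quick sanity check after cloning, one could verify that the files listed above are present. This is a minimal sketch: `missing_files` is a hypothetical helper, and it assumes the files sit at the top level of the checkout.

```python
from pathlib import Path
from typing import List

# File names copied from the list above.
EXPECTED_FILES = [
    "README.md",
    "Qwen-7B-Chat-Int4-detailed.md",
    "SKILL.md",
    "inference.py",
    "evaluation.md",
    "model_info.json",
]


def missing_files(repo_dir: str) -> List[str]:
    """Return the expected files that are not found directly under repo_dir."""
    root = Path(repo_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    print("All files present" if not missing else f"Missing: {missing}")
```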

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.
