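The Camelidae and Qwen2idae models are trained with Parameter-Efficient Sparsity Crafting (PESC).
We propose Parameter-Efficient Sparsity Crafting to help dense models learn knowledge from different domains, including code and math. The approach performs instruction tuning while making efficient use of the Mixture-of-Experts (MoE) structure.
Specifically, PESC uses parameter-efficient techniques, including QLoRA and adapters, to perform efficient sparse upcycling, i.e. converting a dense model into an MoE model.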
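To make the upcycling idea concrete, here is a minimal pure-Python sketch of one sparsely upcycled FFN layer. It assumes, as a simplification of the PESC design, that every expert reuses the frozen dense FFN weights and differs only by a small trainable residual bottleneck adapter, and that a softmax router selects the top-k experts per token. QLoRA quantization, training, and real tensor libraries are omitted, and all names (`SparseUpcycledFFN`, `param_counts`, and so on) are hypothetical, not the released implementation.

```python
import math
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SparseUpcycledFFN:
    """Hypothetical PESC-style layer: experts share one frozen dense FFN
    and differ only by a small trainable residual adapter each."""

    def __init__(self, d_model, d_ff, d_adapter, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        # Dense FFN weights copied from the original model (frozen, shared).
        self.w_in = rand_matrix(d_ff, d_model, rng)
        self.w_out = rand_matrix(d_model, d_ff, rng)
        # One bottleneck adapter (down, up) per expert (trainable).
        self.adapters = [
            (rand_matrix(d_adapter, d_model, rng), rand_matrix(d_model, d_adapter, rng))
            for _ in range(num_experts)
        ]
        # Router producing one logit per expert (trainable).
        self.router = rand_matrix(num_experts, d_model, rng)
        self.top_k = top_k

    def dense_ffn(self, x):
        return matvec(self.w_out, relu(matvec(self.w_in, x)))

    def forward(self, x):
        # The shared FFN runs once per token, whichever experts are chosen.
        h = self.dense_ffn(x)
        probs = softmax(matvec(self.router, x))
        chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[: self.top_k]
        norm = sum(probs[i] for i in chosen)
        out = [0.0] * len(h)
        for i in chosen:
            down, up = self.adapters[i]
            # Expert output = shared FFN output + this expert's adapter delta.
            expert_out = [a + b for a, b in zip(h, matvec(up, relu(matvec(down, h))))]
            gate = probs[i] / norm  # renormalized top-k gate weight
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen


def param_counts(d_model, d_ff, d_adapter, num_experts, top_k):
    """Return (total, activated) weight counts for one layer, biases omitted."""
    ffn = 2 * d_model * d_ff
    adapter = 2 * d_model * d_adapter
    router = num_experts * d_model
    total = ffn + num_experts * adapter + router
    activated = ffn + top_k * adapter + router
    return total, activated


if __name__ == "__main__":
    layer = SparseUpcycledFFN(d_model=4, d_ff=16, d_adapter=2, num_experts=8, top_k=2)
    y, experts = layer.forward([1.0, -0.5, 0.25, 0.0])
    print("chosen experts:", experts)
    print("total/activated weights:", param_counts(4, 16, 2, 8, 2))
```

Under this assumption, adding experts grows the parameter count only by the adapters and the router, so the activated parameters stay close to the dense model's size. That appears consistent with the "Activated Params" column below (for example 35B for Camelidae-8x34B), although the actual adapter sizes are not stated here.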

| Camelidae Series | Download |
|---|---|
| Camelidae-8x7B | 🤗 HuggingFace |
| Camelidae-8x13B | 🤗 HuggingFace |
| Camelidae-8x34B | 🤗 HuggingFace |
| Camelidae-8x34B-pro | 🤗 Coming soon |

| Qwen2idae Series | Download |
|---|---|
| Qwen2idae-16x14B-v1.0 | 🤗 HuggingFace |
| Qwen2idae-16x7B-v1.0 | 🤗 Coming soon |
| Qwen2idae-16x1.8B-v1.0 | 🤗 Coming soon |

| Model | Activated Params | MMLU (5-shot) | GSM8k (5-shot) | MATH (4-shot) | HumanEval (0-shot) | MBPP (4-shot) | HellaSwag (10-shot) |
|---|---|---|---|---|---|---|---|
| GPT3.5 | - | 70.0% | 57.1% | **34.1%** | **48.1%** | - | **85.5%** |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | **75.7%** | **79.4%** | **24.0%** | **48.8%** | **43.2%** | 85.2% |
| Camelidae-8x34B | 35B | **75.6%** | **78.3%** | 22.6% | 43.9% | **41.4%** | **85.3%** |
| SUSChat-34B | 34B | **76.4%** | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | **77.8%** | **29.9%** | **62.8%** | **48.6%** | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | **86.5%** |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

We bold the top-3 scores on each benchmark, ranked across all models.
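The bolding rule above can be checked mechanically. The sketch below copies the scores from the table, treats the "-" entries as missing, and returns the three highest-scoring models for each benchmark. The names `SCORES`, `BENCHMARKS`, and `top3_per_benchmark` are hypothetical helpers for illustration only.

```python
# Scores copied from the table above; None marks a missing entry ("-").
BENCHMARKS = ["MMLU", "GSM8k", "MATH", "HumanEval", "MBPP", "HellaSwag"]
SCORES = {
    "GPT3.5": [70.0, 57.1, 34.1, 48.1, None, 85.5],
    "LLaMA2-70B-chat": [63.8, 59.3, 10.4, 32.3, 35.6, 84.8],
    "Camelidae-8x34B-pro": [75.7, 79.4, 24.0, 48.8, 43.2, 85.2],
    "Camelidae-8x34B": [75.6, 78.3, 22.6, 43.9, 41.4, 85.3],
    "SUSChat-34B": [76.4, 72.3, 22.0, 11.6, 40.2, 83.9],
    "Yi-34B-chat": [74.8, 67.6, 17.3, 20.1, 41.0, 83.9],
    "Qwen2idae-16x14B-v1.0": [66.7, 77.8, 29.9, 62.8, 48.6, 82.3],
    "Mixtral-8x7B-instruct": [68.7, 71.7, 22.1, 25.6, 40.6, 86.5],
    "Camelidae-8x13B": [54.4, 52.6, 9.8, 30.6, 30.4, 82.5],
    "LLaMA2-13B-chat": [53.9, 37.1, 5.2, 18.9, 27.2, 81.9],
    "Camelidae-8x7B": [48.3, 44.0, 5.8, 18.3, 23.4, 79.2],
    "LLaMA2-7B-chat": [47.2, 26.3, 3.9, 12.2, 17.6, 78.6],
}


def top3_per_benchmark(scores, benchmarks):
    """Map each benchmark to its three best models, skipping missing scores."""
    result = {}
    for col, name in enumerate(benchmarks):
        entries = [(row[col], model) for model, row in scores.items() if row[col] is not None]
        # Stable sort: ties (if any) keep the table's row order.
        entries.sort(key=lambda e: e[0], reverse=True)
        result[name] = [model for _, model in entries[:3]]
    return result


if __name__ == "__main__":
    for bench, models in top3_per_benchmark(SCORES, BENCHMARKS).items():
        print(f"{bench}: {', '.join(models)}")
```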

Example inference script (runs on an Ascend NPU if one is available, otherwise on the CPU):

```python
import argparse
import time

from openmind import AutoModelForCausalLM, AutoTokenizer, is_torch_npu_available


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="zhouhui/Camelidae-8x13B",
    )
    return parser.parse_args()


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Prefer the first Ascend NPU; fall back to the CPU.
    device = "npu:0" if is_torch_npu_available() else "cpu"

    # trust_remote_code=True allows loading custom modeling code from the model repo.
    model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to(device)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model.eval()

    prompt = "Hello, who are you?"
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    # Time generation only, not model loading.
    start_time = time.time()
    # max_new_tokens caps generated tokens; max_length would also count the prompt.
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        top_p=0.3,
        top_k=0,  # 0 disables top-k filtering, so only top-p sampling applies
    )
    end_time = time.time()

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(response)
    print(f"Device: {device}, inference time: {end_time - start_time:.2f} s")


if __name__ == "__main__":
    main()
```

Citation:

```bibtex
@article{wu2024parameter,
  title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
  author={Wu, Haoyuan and Zheng, Haisheng and Yu, Bei},
  journal={arXiv preprint arXiv:2401.02731},
  year={2024}
}
```

The source code in this repository is licensed under the Apache 2.0 License. The Camelidae models are intended for academic research and free commercial use; all usage must comply with the licenses from facebookresearch and 01-ai.