Qwen2-7B

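Introduction

Qwen2 is the new generation of the Qwen series of large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5B to 72B parameters, including a Mixture-of-Experts model. This repository contains the 7B Qwen2 base language model.

Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 generally surpasses most open-source models and is competitive with proprietary models across a series of benchmarks covering language understanding, language generation, multilingual capability, coding, mathematics, and reasoning.

For more details, please refer to our blog, GitHub, and documentation.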

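Model Details

Qwen2 is a language model series that includes decoder language models of different sizes. For each size, we release a base language model and an aligned chat model. The models are based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, and other techniques. In addition, we have improved the tokenizer so that it adapts better to multiple natural languages and code.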
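
Group query attention lets several query heads share a single key/value head, which reduces the size of the key/value cache. The sketch below shows only how query heads could be assigned to key/value heads in contiguous groups. The function name and the head counts (8 query heads, 2 key/value heads) are illustrative assumptions, not values taken from the Qwen2 configuration.

```python
def kv_head_for_query_head(query_head: int, num_query_heads: int, num_kv_heads: int) -> int:
    """Return the index of the key/value head shared by the given query head."""
    if num_query_heads % num_kv_heads != 0:
        raise ValueError("num_query_heads must be a multiple of num_kv_heads")
    if not 0 <= query_head < num_query_heads:
        raise IndexError("query_head out of range")
    group_size = num_query_heads // num_kv_heads  # query heads per key/value head
    return query_head // group_size


# Illustrative sizes only (assumed, not read from the model config).
mapping = [kv_head_for_query_head(q, 8, 2) for q in range(8)]
print(mapping)
```

With as many key/value heads as query heads the mapping is the identity (standard multi-head attention), and with a single key/value head every query head maps to index 0 (multi-query attention).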

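Requirements

The Qwen2 code is included in recent versions of the Hugging Face Transformers library. We advise you to install transformers>=4.37.0; with an older version you may encounter the following error: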

KeyError: 'qwen2'
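
A quick way to catch this before loading the model is to compare the installed version against the stated minimum. The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `meets_minimum` are hypothetical, and the parser looks only at the leading numeric components, so a pre-release such as `4.37.0.dev0` is treated as `4.37.0`.

```python
import re
from importlib import metadata

MIN_VERSION = "4.37.0"  # minimum version stated in this README


def version_tuple(version: str) -> tuple:
    """Parse the leading 'major.minor.patch' digits of a version string."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Missing components (e.g. "4.37") are treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed: str, minimum: str = MIN_VERSION) -> bool:
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = metadata.version("transformers")
    status = "OK" if meets_minimum(installed) else f"too old, need >= {MIN_VERSION}"
    print(f"transformers {installed}: {status}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```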

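Usage

We do not advise using the base language model directly for text generation. Instead, you can apply post-training to this model, for example SFT, RLHF, or continued pretraining.

For a quick text-completion check, you can use the following code: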


import argparse
import torch
from openmind_hub import snapshot_download
from openmind import AutoModelForCausalLM, AutoTokenizer


def parse_args():
    parser = argparse.ArgumentParser(description="Eval the LLM model")
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default=None,
    )

    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    if args.model_name_or_path:
        model_path = args.model_name_or_path
    else:
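        # NOTE: this default points at a Qwen1.5-7B-Chat repository; pass
        # --model_name_or_path to load a Qwen2-7B checkpoint instead.
        model_path = snapshot_download(
            "JiangSuAscend/Qwen1.5-7B-Chat",
            revision="main",
            ignore_patterns=["*.h5", "*.ot", "*.msgpack"],
        )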

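    # Load the tokenizer and the model in half precision; device_map="auto"
    # lets the loader place the weights on the available device(s).
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16, device_map="auto"
    )

    # A base model continues the prompt, so phrase the input as text to complete.
    prompt = "Q: What is the biggest animal?\nA:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Passing the full encoding also forwards the attention mask to generate().
    generation_output = model.generate(**inputs, max_new_tokens=32)

    # Decode the full sequence: the prompt plus up to 32 newly generated tokens.
    print(tokenizer.decode(generation_output[0], skip_special_tokens=True))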


if __name__ == "__main__":
    main()

Performance

The evaluation of base models mainly focuses on model performance in natural language understanding, general question answering, coding, mathematics, scientific knowledge, reasoning, and multilingual capability.

The datasets used for evaluation include:

English tasks: MMLU (5-shot), MMLU-Pro (5-shot), GPQA (5-shot), Theorem QA (5-shot), BBH (3-shot), HellaSwag (10-shot), Winogrande (5-shot), TruthfulQA (0-shot), ARC-C (25-shot)

Coding tasks: EvalPlus (0-shot) (HumanEval, MBPP, HumanEval+, MBPP+), MultiPL-E (0-shot) (Python, C++, Java, PHP, TypeScript, C#, Bash, JavaScript)

Math tasks: GSM8K (4-shot), MATH (4-shot)

Chinese tasks: C-Eval (5-shot), CMMLU (5-shot)

Multilingual tasks: Multi-Exam (M3Exam 5-shot, IndoMMLU 3-shot, ruMMLU 5-shot, mMMLU 5-shot), Multi-Understanding (BELEBELE 5-shot, XCOPA 5-shot, XWinograd 5-shot, XStoryCloze 0-shot, PAWS-X 5-shot), Multi-Mathematics (MGSM 8-shot), Multi-Translation (Flores-101 5-shot)
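
The "k-shot" labels above give the number of solved examples placed in the prompt before the question being evaluated. As a rough illustration of that idea, the sketch below assembles such a prompt in a Q:/A: layout. It is not the evaluation harness used for these results; the function name, the template, and the demonstration pairs are all hypothetical.

```python
def build_few_shot_prompt(examples, question, k):
    """Concatenate the first k solved examples, then the unanswered question."""
    if k > len(examples):
        raise ValueError("not enough examples for the requested number of shots")
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples[:k]]
    blocks.append(f"Q: {question}\nA:")  # the model is expected to continue here
    return "\n\n".join(blocks)


# Placeholder demonstrations (hypothetical, not drawn from any benchmark).
demos = [("What is 2 + 3?", "5"), ("What is 7 - 4?", "3")]
prompt = build_few_shot_prompt(demos, "What is 6 + 1?", k=2)
print(prompt)
```

With `k=0` the function returns only the question, which corresponds to the 0-shot setting.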

Qwen2-7B performance

Datasets	Mistral-7B	Gemma-7B	Llama-3-8B	Qwen1.5-7B	Qwen2-7B
Parameters	7.2B	8.5B	8.0B	7.7B	7.6B
Non-embedding parameters	7.0B	7.8B	7.0B	6.5B	6.5B
English tasks
MMLU	64.2	64.6	66.6	61.0	70.3
MMLU-Pro	30.9	33.7	35.4	29.9	40.0
GPQA	24.7	25.7	25.8	26.7	31.8
Theorem QA	19.2	21.5	22.1	14.2	31.1
BBH	56.1	55.1	57.7	40.2	62.6
HellaSwag	83.2	82.2	82.1	78.5	80.7
Winogrande	78.4	79.0	77.4	71.3	77.0
ARC-C	60.0	61.1	59.3	54.2	60.6
TruthfulQA	42.2	44.8	44.0	51.1	54.2
Coding tasks
HumanEval	29.3	37.2	33.5	36.0	51.2
MBPP	51.1	50.6	53.9	51.6	65.9
EvalPlus	36.4	39.6	40.3	40.0	54.2
MultiPL-E	29.4	29.7	22.6	28.1	46.3
Math tasks
GSM8K	52.2	46.4	56.0	62.5	79.9
MATH	13.1	24.3	20.5	20.3	44.2
Chinese tasks
C-Eval	47.4	43.6	49.5	74.1	83.2
CMMLU	-	-	50.8	73.1	83.9
Multilingual tasks
Multi-Exam	47.1	42.7	52.3	47.7	59.2
Multi-Understanding	63.3	58.3	68.6	67.6	72.0
Multi-Mathematics	26.3	39.1	36.3	37.3	57.5
Multi-Translation	23.3	31.2	31.9	28.4	31.5
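
To read the table as generation-over-generation change, the short sketch below computes the absolute gain of Qwen2-7B over Qwen1.5-7B on a handful of rows. The scores are copied from the table above; the choice of rows is arbitrary and only meant as an example.

```python
# Scores copied from the table above: (Qwen1.5-7B, Qwen2-7B).
scores = {
    "MMLU": (61.0, 70.3),
    "HumanEval": (36.0, 51.2),
    "GSM8K": (62.5, 79.9),
    "MATH": (20.3, 44.2),
    "C-Eval": (74.1, 83.2),
}

# Absolute gain in points, rounded to one decimal to avoid float noise.
gains = {name: round(new - old, 1) for name, (old, new) in scores.items()}

# Print the largest gains first.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} +{gain:.1f}")
```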

Citation

If you find our work helpful, feel free to cite it.

@article{qwen2,
  title={Qwen2 Technical Report},
  year={2024}
}