Qwen2-0.5B

Introduction

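Qwen2 is the new series of Qwen large language models. For Qwen2, we release a number of base language models and instruction-tuned language models ranging from 0.5B to 72B parameters, including a Mixture-of-Experts model. This repository contains the 0.5B Qwen2 base language model.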

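Compared with state-of-the-art open-source language models, including the previously released Qwen1.5, Qwen2 generally surpasses most open-source models and is competitive with proprietary models across a series of benchmarks covering language understanding, language generation, multilingual capability, coding, mathematics, and reasoning.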

Model Details

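Qwen2 is a language model series made up of decoder-only language models of different sizes. For each size, we release a base language model and an aligned chat model. The models are based on the Transformer architecture and use SwiGLU activation, attention QKV bias, and grouped query attention, among other techniques. We have also improved the tokenizer so that it adapts better to multiple natural languages and to code.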
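To make two of these terms concrete, here is a minimal pure-Python sketch of a SwiGLU gate and of the query-to-KV head mapping used in grouped query attention. The function names and the toy head counts are illustrative assumptions, not the Qwen2 implementation.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate: list[float], up: list[float]) -> list[float]:
    """SwiGLU gating: SiLU(gate) multiplied element-wise with up.

    In a Transformer MLP, `gate` and `up` are two separate linear
    projections of the same hidden state; a third projection maps the
    gated result back to the hidden size (omitted here).
    """
    if len(gate) != len(up):
        raise ValueError("gate and up must have the same length")
    return [silu(g) * u for g, u in zip(gate, up)]


def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Grouped query attention: consecutive query heads share one KV head."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    print(swiglu([0.0, 1.0], [2.0, 3.0]))
    # Toy sizes (hypothetical): 8 query heads sharing 2 KV heads.
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

Because several query heads read the same key/value head, the key and value projections (and the KV cache at inference time) are smaller than in standard multi-head attention.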

Usage

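We do not recommend using the base language model directly for text generation. Instead, you can apply post-training to this model, such as supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), or continued pretraining.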

You can use the following code for text completion:

import argparse
import torch
from openmind import pipeline, is_torch_npu_available
from openmind_hub import snapshot_download


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default=None,
    )
    args = parser.parse_args()
    return args


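def main():
    args = parse_args()
    # Path to the model checkpoint, taken from --model_name_or_path.
    model_path = args.model_name_or_path

    # Use the first Ascend NPU when one is available, otherwise fall back to CPU.
    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"

    # Build a text-generation pipeline on the selected device.
    generator = pipeline('text-generation', model=model_path, device=device)

    # Beam search with 5 beams, returning 5 candidate continuations;
    # max_length caps the total sequence length in tokens.
    output = generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5, num_beams=5)

    print(f">>>output={output}", flush=True)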


if __name__ == "__main__":
    main()
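
The script prints the raw pipeline result. Assuming openmind's `pipeline` returns the same structure as the Hugging Face Transformers text-generation pipeline (a list of dicts with a `generated_text` key, which is an assumption here and worth checking against your installed version), a small helper such as the hypothetical `extract_texts` below pulls out just the strings. The sample data is made up for illustration.

```python
def extract_texts(output):
    """Return the generated strings from a text-generation result.

    Assumes `output` is a list of dicts, each holding a "generated_text"
    entry; entries without that key are skipped.
    """
    return [item["generated_text"] for item in output if "generated_text" in item]


if __name__ == "__main__":
    # Made-up sample shaped like the assumed pipeline output.
    sample = [
        {"generated_text": "Hello, I'm a language model, and"},
        {"generated_text": "Hello, I'm a language model, so"},
    ]
    for i, text in enumerate(extract_texts(sample), start=1):
        print(f"[{i}] {text}")
```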

Performance

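The evaluation of the base models focuses on natural language understanding, general question answering, coding, mathematics, scientific knowledge, reasoning, and multilingual capability.

The datasets used for evaluation include: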

English tasks: MMLU (5-shot), MMLU-Pro (5-shot), GPQA (5-shot), Theorem QA (5-shot), BBH (3-shot), HellaSwag (10-shot), Winogrande (5-shot), TruthfulQA (0-shot), ARC-C (25-shot)

Coding tasks: EvalPlus (0-shot) (HumanEval, MBPP, HumanEval+, MBPP+), MultiPL-E (0-shot) (Python, C++, Java, PHP, TypeScript, C#, Bash, JavaScript)

Math tasks: GSM8K (4-shot), MATH (4-shot)

Chinese tasks: C-Eval (5-shot), CMMLU (5-shot)

Multilingual tasks: Multi-Exam (M3Exam 5-shot, IndoMMLU 3-shot, ruMMLU 5-shot, mMMLU 5-shot), Multi-Understanding (BELEBELE 5-shot, XCOPA 5-shot, XWinograd 5-shot, XStoryCloze 0-shot, PAWS-X 5-shot), Multi-Mathematics (MGSM 8-shot), Multi-Translation (Flores-101 5-shot)
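
The shot counts above refer to the number of solved examples placed in the prompt before each test question. A minimal sketch of how such a k-shot prompt could be assembled follows; the `Question:`/`Answer:` template and the function name are assumptions for illustration, not the format used in the Qwen2 evaluation.

```python
def build_few_shot_prompt(examples, question, k):
    """Build a k-shot prompt: k solved examples followed by the test question.

    `examples` is a list of (question, answer) pairs; only the first k are used.
    With k=0 the prompt contains the test question alone (zero-shot).
    """
    if k < 0 or k > len(examples):
        raise ValueError("k must be between 0 and len(examples)")
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples[:k]]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [("2 + 2 = ?", "4"), ("3 * 5 = ?", "15")]
    print(build_few_shot_prompt(demos, "7 - 4 = ?", k=2))
```

The model is then asked to continue the text after the final `Answer:`, and its continuation is scored against the reference answer.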

Qwen2-0.5B & Qwen2-1.5B Performance

Datasets	Phi-2	Gemma-2B	MiniCPM	Qwen1.5-1.8B	Qwen2-0.5B	Qwen2-1.5B
#Non-Emb Params	2.5B	2.0B	2.4B	1.3B	0.35B	1.3B
MMLU	52.7	42.3	53.5	46.8	45.4	56.5
MMLU-Pro	-	15.9	-	-	14.7	21.8
Theorem QA	-	-	-	-	8.9	15.0
HumanEval	47.6	22.0	50.0	20.1	22.0	31.1
MBPP	55.0	29.2	47.3	18.0	22.0	37.4
GSM8K	57.2	17.7	53.8	38.4	36.5	58.5
MATH	3.5	11.8	10.2	10.1	10.7	21.7
BBH	43.4	35.2	36.9	24.2	28.4	37.2
HellaSwag	73.1	71.4	68.3	61.4	49.3	66.6
Winogrande	74.4	66.8	-	60.3	56.8	66.2
ARC-C	61.1	48.5	-	37.9	31.5	43.9
TruthfulQA	44.5	33.1	-	39.4	39.7	45.9
C-Eval	23.4	28.0	51.1	59.7	58.2	70.6
CMMLU	24.2	-	51.1	57.8	55.1	70.3
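
The 0.35B figure in the #Non-Emb Params row can be reproduced with a short calculation. The sketch below assumes the configuration values published with the Qwen2-0.5B checkpoint (hidden size 896, 24 layers, 14 query heads, 2 key/value heads, MLP intermediate size 4864, vocabulary size 151936, tied input/output embeddings). None of these numbers appear in this document, so verify them against the model's config.json before relying on the result.

```python
def qwen2_param_counts(hidden, layers, q_heads, kv_heads, intermediate, vocab):
    """Count embedding and non-embedding parameters for a Qwen2-style decoder.

    Assumes biases on the Q/K/V projections only, a three-matrix SwiGLU MLP,
    two RMSNorm weight vectors per layer plus a final one, and an output head
    that shares (ties) its weights with the input embedding.
    """
    head_dim = hidden // q_heads
    kv_dim = kv_heads * head_dim

    q_proj = hidden * hidden + hidden      # weight + bias
    k_proj = hidden * kv_dim + kv_dim      # smaller because of grouped query attention
    v_proj = hidden * kv_dim + kv_dim
    o_proj = hidden * hidden               # no bias
    mlp = 3 * hidden * intermediate        # gate, up and down projections
    norms = 2 * hidden                     # pre-attention and pre-MLP RMSNorm

    per_layer = q_proj + k_proj + v_proj + o_proj + mlp + norms
    non_embedding = layers * per_layer + hidden  # + final RMSNorm
    embedding = vocab * hidden
    return embedding, non_embedding


if __name__ == "__main__":
    emb, non_emb = qwen2_param_counts(
        hidden=896, layers=24, q_heads=14, kv_heads=2,
        intermediate=4864, vocab=151936,
    )
    print(f"embedding:     {emb:,}")
    print(f"non-embedding: {non_emb:,}")
    print(f"total:         {emb + non_emb:,}")
```

Under these assumptions the non-embedding count comes to 357,898,112 (about 0.358B), consistent with the 0.35B in the table if that value is truncated rather than rounded. Adding the 136,134,656 embedding parameters gives roughly 0.49B in total.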