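The Falcon3 family of open foundation models is a set of pretrained and instruction-tuned large language models ranging from 1B to 10B parameters.
This repository contains Falcon3-10B-Instruct. At the time of release, it achieved state-of-the-art results on reasoning, language understanding, instruction following, code, and mathematics tasks. Falcon3-10B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.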
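Because the context length is capped at 32K, the prompt and the generated tokens have to share that budget. The following is a minimal sketch of a budget check, not part of the Falcon3 or `transformers` API: the helper `max_new_tokens_budget` and the constant `ASSUMED_CONTEXT_LIMIT` are hypothetical names, and the sketch assumes "32K" means 32 × 1024 = 32,768 tokens, which should be confirmed against the model configuration.

```python
# Assumption: "32K" is read as 32 * 1024 = 32,768 tokens.
# Check the model configuration for the exact limit before relying on this value.
ASSUMED_CONTEXT_LIMIT = 32 * 1024


def max_new_tokens_budget(prompt_tokens: int, requested_new_tokens: int,
                          context_limit: int = ASSUMED_CONTEXT_LIMIT) -> int:
    """Return how many new tokens fit after the prompt without exceeding the limit."""
    if prompt_tokens < 0 or requested_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Tokens left in the window once the prompt is accounted for (never negative).
    remaining = max(context_limit - prompt_tokens, 0)
    return min(requested_new_tokens, remaining)


# A short prompt leaves the full request available.
print(max_new_tokens_budget(prompt_tokens=40, requested_new_tokens=1024))
# A prompt near the limit forces a smaller generation budget.
print(max_new_tokens_budget(prompt_tokens=32_000, requested_new_tokens=1024))
```

The prompt token count can be taken from the tokenized input, for example the length of the `input_ids` returned by the tokenizer, and the result passed as `max_new_tokens` to `generate`.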
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-10B-Instruct"

# Load the model, letting transformers pick the dtype and device placement.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many hours in one day?"
messages = [
    {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
    {"role": "user", "content": prompt}
]

# Render the conversation with the model's chat template and append the assistant turn marker.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024
)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

The following table reports the results of our internal pipeline benchmarks.
| Category | Benchmark | Yi-1.5-9B-Chat | Mistral-Nemo-Base-2407 (12B) | Falcon3-10B-Instruct |
|---|---|---|---|---|
| General | MMLU (5-shot) | 68.8 | 66.0 | 73.9 |
| | MMLU-PRO (5-shot) | 38.8 | 34.3 | 44 |
| | IFEval | 57.8 | 63.4 | 78 |
| Math | GSM8K (5-shot) | 77.1 | 77.6 | 84.9 |
| | GSM8K (8-shot, COT) | 76 | 80.4 | 84.6 |
| | MATH Lvl-5 (4-shot) | 3.3 | 5.9 | 22.1 |
| Reasoning | Arc Challenge (25-shot) | 58.3 | 63.4 | 66.2 |
| | GPQA (0-shot) | 35.6 | 33.2 | 33.5 |
| | GPQA (0-shot, COT) | 16 | 12.7 | 32.6 |
| | MUSR (0-shot) | 41.9 | 38.1 | 41.1 |
| | BBH (3-shot) | 50.6 | 47.5 | 58.4 |
| Commonsense understanding | PIQA (0-shot) | 76.4 | 78.2 | 78.4 |
| | SciQ (0-shot) | 61.7 | 76.4 | 90.4 |
| | Winogrande (0-shot) | - | - | 71 |
| | OpenbookQA (0-shot) | 43.2 | 47.4 | 48.2 |
| Instruction following | MT-Bench (avg) | 8.3 | 8.6 | 8.2 |
| | Alpaca (WC) | 25.8 | 45.4 | 24.7 |
| Tool use | BFCL AST (avg) | 48.4 | 74.2 | 90.5 |
| Code | EvalPlus (0-shot) (avg) | 69.4 | 58.9 | 74.7 |
| | Multipl-E (0-shot) (avg) | - | 34.5 | 45.8 |
Coming soon....
If the Falcon3 family of models has been helpful to your work, feel free to cite us.
```bibtex
@misc{Falcon3,
    title = {The Falcon 3 family of Open Models},
    author = {TII Team},
    month = {December},
    year = {2024}
}
```

Detailed results can be found here.
| Metric | Value |
|---|---|
| Average | 35.19 |
| IFEval (0-shot) | 78.17 |
| BBH (3-shot) | 44.82 |
| MATH Lvl 5 (4-shot) | 25.91 |
| GPQA (0-shot) | 10.51 |
| MuSR (0-shot) | 13.61 |
| MMLU-PRO (5-shot) | 38.10 |
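
The average in the table above appears consistent with an unweighted mean of the six benchmark scores. The short sketch below checks that arithmetic. It assumes equal weighting, which the table itself does not state, and the variable names are illustrative only.

```python
from statistics import mean

# Scores copied from the table above, one entry per benchmark.
scores = {
    "IFEval (0-shot)": 78.17,
    "BBH (3-shot)": 44.82,
    "MATH Lvl 5 (4-shot)": 25.91,
    "GPQA (0-shot)": 10.51,
    "MuSR (0-shot)": 13.61,
    "MMLU-PRO (5-shot)": 38.10,
}

# Assumption: every benchmark carries the same weight in the average.
average = mean(scores.values())
print(f"{average:.2f}")
```

The six scores sum to 211.12, and dividing by 6 gives 35.19 after rounding to two decimals, matching the reported value.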