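NuminaMath is a series of language models trained with two-stage supervised fine-tuning to solve math problems using chain-of-thought (CoT) and tool-integrated reasoning (TIR):
NuminaMath 7B CoT is the stage-one model. It was fine-tuned on the AI-MO/NuminaMath-CoT dataset, a large-scale collection of more than 860k competition math problem-solution pairs.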
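For illustration, a problem-solution pair can be turned into the chat-message structure that chat templates consume (the same `role`/`content` shape used in the pipeline example later in this card). This is only a sketch under stated assumptions: the record is made up, and the field names `problem` and `solution` and the helper `to_messages` are hypothetical rather than taken from the dataset schema.

```python
# Hypothetical record; the field names "problem" and "solution" are assumptions.
record = {
    "problem": "What is 2 + 3?",
    "solution": "Adding the two numbers gives 2 + 3 = 5. The answer is 5.",
}


def to_messages(rec):
    """Map one problem-solution pair to a two-turn chat (user asks, assistant solves)."""
    return [
        {"role": "user", "content": rec["problem"]},
        {"role": "assistant", "content": rec["solution"]},
    ]


messages = to_messages(record)
for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```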
import argparse
import time

import torch
from openmind import AutoModelForCausalLM, AutoTokenizer, pipeline, is_torch_npu_available


# Mean pooling that uses the attention mask so padding tokens do not skew the average.
# Note: this helper is not called by the text-generation example in main().
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="jeffding/NuminaMath-7B-CoT-openmind",
    )
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Use the first NPU when one is available, otherwise fall back to the CPU
    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"

    # Load the model and tokenizer from the given path or hub repository
    model = AutoModelForCausalLM.from_pretrained(model_path,
                                                 device_map=device,
                                                 trust_remote_code=False,
                                                 revision="main")
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)

    start_time = time.time()
    prompt = "Tell me about AI"
    prompt_template = f"<s>[INST] {prompt} [/INST]\n"

    print("\n\n*** Generate:")
    input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.to(device)
    output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
    print(tokenizer.decode(output[0]))

    # Inference can also be done with the pipeline helper
    print("*** Pipeline:")
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
        top_k=40,
        repetition_penalty=1.1
    )
    print(pipe(prompt_template)[0]['generated_text'])

    # The reported time covers both the generate() call and the pipeline call
    end_time = time.time()
    print(f"Hardware: {device}, inference time: {end_time - start_time} seconds")


if __name__ == "__main__":
    main()
Here is how to run the model using the pipeline() function from 🤗 Transformers:
import torch
from transformers import pipeline
pipe = pipeline("text-generation", model="AI-MO/NuminaMath-7B-CoT", torch_dtype=torch.bfloat16, device_map="auto")
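messages = [
    {"role": "user", "content": "For how many values of the constant $k$ will the polynomial $x^{2}+kx+36$ have two distinct integer roots?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
gen_config = {
    "max_new_tokens": 1024,
    "do_sample": False,
    "tokenizer": pipe.tokenizer,
}
outputs = pipe(prompt, **gen_config)
text = outputs[0]["generated_text"]
print(text)

NuminaMath 7B CoT was created to solve problems in the narrow domain of competition-level mathematics. As a result, the model should not be used for general chat applications. With greedy decoding, we find that the model can solve problems at the level of AMC 12, but it often struggles to generate a valid solution for harder problems at the AIME and Math Olympiad level. The model also struggles with geometry problems, likely because of its limited capacity and its lack of other modalities such as vision.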
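Because greedy decoding is deterministic, a model answer can be compared against an independently computed one. As a minimal sketch (not part of the model or its evaluation code), the sample question from the pipeline example above can be brute-forced: by Vieta's formulas, integer roots r and s of x^2 + kx + 36 satisfy rs = 36 and k = -(r + s). The helper name `distinct_root_ks` is hypothetical.

```python
def distinct_root_ks(c: int = 36) -> set:
    """Return every k for which x^2 + k*x + c has two distinct integer roots.

    Uses Vieta's formulas: the roots r and s satisfy r * s == c and k == -(r + s).
    """
    ks = set()
    for r in range(-abs(c), abs(c) + 1):
        if r == 0 or c % r != 0:
            continue  # r must be a nonzero integer divisor of c
        s = c // r
        if r != s:  # the roots must be distinct
            ks.add(-(r + s))
    return ks


ks = distinct_root_ks(36)
print(len(ks))     # 8
print(sorted(ks))  # [-37, -20, -15, -13, 13, 15, 20, 37]
```

A check like this only validates the final number; it says nothing about whether the reasoning in a generated solution is sound.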
The following hyperparameters were used during training:
If you find NuminaMath 7B CoT useful in your work, please cite it as follows:
@misc{numina_math_7b,
  author = {Edward Beeching and Shengyi Costa Huang and Albert Jiang and Jia Li and Benjamin Lipkin and Zihan Qin and Kashif Rasul and Ziju Shen and Roman Soletskyi and Lewis Tunstall},
  title = {NuminaMath 7B CoT},
  year = {2024},
  publisher = {Numina & Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/AI-MO/NuminaMath-7B-CoT}}
}