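🚨 This model is the non-quantized version of C4AI Command R+. You can find the quantized version of C4AI Command R+ using bitsandbytes [here].
C4AI Command R+ is an open-weights research release of a 104-billion-parameter model with highly advanced capabilities, including Retrieval Augmented Generation (RAG) and tool use to automate sophisticated tasks. Tool use in this model generation supports multi-step tool calls, which let the model combine multiple tools over multiple steps to accomplish difficult tasks. C4AI Command R+ is a multilingual model evaluated for performance in 10 languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. Command R+ is optimized for a variety of use cases, including reasoning, summarization, and question answering.
C4AI Command R+ is part of a family of open-weight releases from Cohere For AI and Cohere. Our smaller companion model is [C4AI Command R].
Developed by: Cohere and Cohere For AI
Try C4AI Command R+
Usage
Please install openmind from the source repository that includes the necessary changes for this model.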
```python
from openmind import AutoTokenizer, AutoModelForCausalLM

model_id = "AI-Research/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```

Quantized model through bitsandbytes, 8-bit precision
```python
from openmind import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig

# Load the weights in 8-bit precision through bitsandbytes
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "AI-Research/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Format message with the command-r-plus chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```

Quantized model through bitsandbytes, 4-bit precision
This model is the non-quantized version of C4AI Command R+. You can find the quantized version of C4AI Command R+ using bitsandbytes [here].
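To see why the quantized variants matter, the weight memory can be estimated from the parameter count alone. The sketch below is back-of-the-envelope arithmetic under stated assumptions: 104 billion parameters as given in this card, the non-quantized weights stored at 16 bits per parameter (an assumption, the card does not state the storage type), and weights only. `weight_memory_gb` is a hypothetical helper name.

```python
# Rough weight-only estimate; ignores activations, the KV cache and the
# metadata that quantization schemes store, so real usage will be higher.
PARAMS = 104e9  # parameter count stated in this model card


def weight_memory_gb(bits_per_param: float, params: float = PARAMS) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


# 16-bit is an assumed storage type for the non-quantized weights.
for label, bits in [("16-bit (assumed)", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(bits):.0f} GB")
```

Under these assumptions the weights alone come to roughly 208 GB, 104 GB and 52 GB respectively.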
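Input: The model accepts text input only.
Output: The model generates text only.
Model Architecture: This is an auto-regressive language model that uses an optimized transformer architecture. After pretraining, the model uses supervised fine-tuning (SFT) and preference training to align model behavior with human preferences for helpfulness and safety.
Languages covered: The model is optimized to perform well in the following languages: English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Simplified Chinese, and Arabic.
Pre-training data additionally included the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, Persian.
Context length: Command R+ supports a context length of 128K.
Command R+ has been submitted to the [Open LLM leaderboard]. We include the results below, along with a direct comparison to the strongest state-of-the-art open-weights models currently available on openmind. We note that these results are only useful for comparison when the evaluations of all models are implemented in a standardized way using publicly available code. They should therefore not be used to compare against models that have not been submitted to the leaderboard, or against self-reported numbers that cannot be replicated in the same way.

| Model | Average | Arc (Challenge) | Hella Swag | MMLU | Truthful QA | Winogrande | GSM8k |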
|---|---|---|---|---|---|---|---|
| CohereForAI/c4ai-command-r-plus | 74.6 | 70.99 | 88.6 | 75.7 | 56.3 | 85.4 | 70.7 |
| [DBRX Instruct] | 74.5 | 68.9 | 89 | 73.7 | 66.9 | 81.8 | 66.9 |
| [Mixtral 8x7B-Instruct] | 72.7 | 70.1 | 87.6 | 71.4 | 65 | 81.1 | 61.1 |
| [Mixtral 8x7B Chat] | 72.6 | 70.2 | 87.6 | 71.2 | 64.6 | 81.4 | 60.7 |
| [CohereForAI/c4ai-command-r-v01] | 68.5 | 65.5 | 87 | 68.2 | 52.3 | 81.5 | 56.6 |
| [Llama 2 70B] | 67.9 | 67.3 | 87.3 | 69.8 | 44.9 | 83.7 | 54.1 |
| [Yi-34B-Chat] | 65.3 | 65.4 | 84.2 | 74.9 | 55.4 | 80.1 | 31.9 |
| [Gemma-7B] | 63.8 | 61.1 | 82.2 | 64.6 | 44.8 | 79 | 50.9 |
| [LLama 2 70B Chat] | 62.4 | 64.6 | 85.9 | 63.9 | 52.8 | 80.5 | 26.7 |
| [Mistral-7B-v0.1] | 61 | 60 | 83.3 | 64.2 | 42.2 | 78.4 | 37.8 |
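
The "Average" column appears to be the unweighted mean of the six benchmark scores. That reading is an assumption inferred from the numbers, not something the leaderboard paragraph states. A quick check on two rows copied from the table:

```python
# Scores copied from the table above, in column order:
# Arc (Challenge), Hella Swag, MMLU, Truthful QA, Winogrande, GSM8k.
rows = {
    "CohereForAI/c4ai-command-r-plus": [70.99, 88.6, 75.7, 56.3, 85.4, 70.7],
    "DBRX Instruct": [68.9, 89, 73.7, 66.9, 81.8, 66.9],
}


def mean_score(scores):
    """Unweighted mean, rounded to one decimal place like the table."""
    return round(sum(scores) / len(scores), 1)


averages = {name: mean_score(scores) for name, scores in rows.items()}
print(averages)
```

Both results agree with the reported averages of 74.6 and 74.5.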
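We include these metrics here because they are frequently requested, but note that they do not capture RAG, multilingual, tool-use, or open-ended generation performance, areas in which we believe Command R+ to be state-of-the-art. For evaluations of RAG, multilingual capability, and tool use, read more here. For open-ended generation, Command R+ is currently being evaluated on the chatbot arena.
Command R+ has been specifically trained with grounded generation capabilities. This means that it can generate responses based on a list of supplied document snippets, and it will include grounding spans (citations) in its response indicating the source of the information. This can be used to enable behaviors such as grounded summarization and the final step of Retrieval Augmented Generation (RAG). This behavior has been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance, but we encourage experimentation.
Command R+'s grounded generation behavior takes a conversation as input (with an optional user-supplied system preamble indicating task, context, and desired output style), along with a list of retrieved document snippets. The document snippets should be chunks rather than long documents, typically around 100-400 words per chunk. Document snippets consist of key-value pairs. The keys should be short descriptive strings; the values can be text or semi-structured data.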
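One way to prepare such snippets is to split long text on word boundaries and wrap each chunk as a key-value record. The sketch below is illustrative only: `chunk_document` and its 300-word default are assumptions, and the `title` and `text` keys simply mirror the example documents used in this section.

```python
def chunk_document(title: str, text: str, max_words: int = 300) -> list[dict]:
    """Split `text` into chunks of at most `max_words` words.

    Each chunk becomes a dict with short descriptive keys, matching the
    key-value snippet shape described above.
    """
    words = text.split()
    chunks = []
    for start in range(0, len(words), max_words):
        piece = " ".join(words[start:start + max_words])
        part = start // max_words + 1
        chunks.append({"title": f"{title} (part {part})", "text": piece})
    return chunks


# A 700-word document becomes three snippets of 300, 300 and 100 words.
long_text = " ".join(f"word{i}" for i in range(700))
documents = chunk_document("Penguin report", long_text)
print([len(d["text"].split()) for d in documents])
```

Splitting on a fixed word count is the simplest option; splitting on paragraph or sentence boundaries would keep each snippet more self-contained.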
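By default, Command R+ generates grounded responses by first predicting which documents are relevant, then predicting which ones it will cite, then generating an answer, and finally inserting grounding spans into the answer. See below for an example. This is referred to as "accurate" grounded generation.
The model is trained with a number of other answering modes, which can be selected by prompt changes. The tokenizer supports a "fast" citation mode, which directly generates an answer with grounding spans in it, without first writing the answer out in full. This sacrifices some grounding accuracy in favor of generating fewer tokens.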
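The grounding spans in a completion can be turned into structured citations with a small parser. This is a minimal sketch, not Cohere's implementation: it assumes the `<co: N>...</co: N>` markup and the `Grounded answer:` prefix that appear in the rendered example in this section, and `parse_grounded_answer` is a hypothetical helper.

```python
import re

# Matches <co: 0>cited text</co: 0>; the back-reference \1 requires the
# closing tag to name the same document index as the opening tag.
CITATION = re.compile(r"<co: (\d+)>(.*?)</co: \1>", re.DOTALL)


def parse_grounded_answer(completion: str):
    """Return (plain_text, citations) for a grounded completion."""
    # Keep only what follows 'Grounded answer:' when that prefix is present;
    # otherwise the whole completion is treated as the grounded answer.
    _, _, grounded = completion.rpartition("Grounded answer:")
    grounded = grounded.strip()
    citations = [
        {"document": int(m.group(1)), "text": m.group(2)}
        for m in CITATION.finditer(grounded)
    ]
    plain_text = CITATION.sub(lambda m: m.group(2), grounded)
    return plain_text, citations


completion = (
    "Grounded answer: The <co: 0>Emperor Penguin</co: 0> is a bird that "
    "<co: 1>lives only in Antarctica</co: 1>."
)
text, cites = parse_grounded_answer(completion)
print(text)
print(cites)
```

Because the prefix is optional, the same helper accepts a completion that consists only of the grounded answer.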
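Comprehensive documentation for working with Command R+'s grounded generation prompt template can be found here.
The code snippet below shows a minimal working example of how to render a prompt.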
```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# define documents to ground on:
documents = [
    {"title": "Tall penguins", "text": "Emperor penguins are the tallest growing up to 122 cm in height."},
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."}
]

# render the grounded generation prompt as a string:
grounded_generation_prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate",  # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)
print(grounded_generation_prompt)
```

Example rendered grounded generation prompt:

```
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.
# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.
# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.
## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|><results>
Document: 0
title: Tall penguins
text: Emperor penguins are the tallest growing up to 122 cm in height.
Document: 1
title: Penguin habitats
text: Emperor penguins only live in Antarctica.
</results><|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Carefully perform the following instructions, in order, starting each with a new line.
Firstly, Decide which of the retrieved documents are relevant to the user's last input by writing 'Relevant Documents:' followed by comma-separated list of document numbers. If none are relevant, you should instead write 'None'.
Secondly, Decide which of the retrieved documents contain facts that should be cited in a good answer to the user's last input by writing 'Cited Documents:' followed a comma-separated list of document numbers. If you dont want to cite any of them, you should instead write 'None'.
Thirdly, Write 'Answer:' followed by a response to the user's last input in high quality natural english. Use the retrieved documents to help you. Do not insert any citations or grounding markup.
Finally, Write 'Grounded answer:' followed by a response to the user's last input in high quality natural english. Use the symbols <co: doc> and </co: doc> to indicate when a fact comes from a document in the search result, e.g <co: 0>my fact</co: 0> for a fact from document 0.<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
```

Example rendered grounded generation completion:

```
Relevant Documents: 0,1
Cited Documents: 0,1
Answer: The Emperor Penguin is the tallest or biggest penguin in the world. It is a bird that lives only in Antarctica and grows to a height of around 122 centimetres.
Grounded answer: The <co: 0>Emperor Penguin</co: 0> is the <co: 0>tallest</co: 0> or biggest penguin in the world. It is a bird that <co: 1>lives only in Antarctica</co: 1> and <co: 0>grows to a height of around 122 centimetres.</co: 0>
```

Single-step tool use (or "function calling") allows Command R+ to interact with external tools such as APIs, databases, or search engines. Single-step tool use consists of two model inference steps: tool selection, in which the model decides which tools to call and with which parameters, and response generation, in which the model produces the final response from the tool results.
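Command R+ has been specifically trained with single-step tool use (or "function calling") capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance. We therefore recommend using the prompt template described below.
Command R+'s single-step tool use functionality takes a conversation as input (with an optional user-system preamble), along with a list of available tools. The model then generates a JSON-formatted list of actions to execute on a subset of those tools. Command R+ may use one of its supplied tools more than once.
The model has been trained to recognize a special directly_answer tool, which it uses to indicate that it does not want to use any of its other tools. The ability to abstain from calling a specific tool can be useful in a range of situations, such as greeting a user or asking clarifying questions. We recommend including the directly_answer tool, but it can be removed or renamed if required.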
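Once the model has produced its action list, the application has to parse it and run the selected tools itself. The sketch below assumes the completion is the word `Action:` followed by a fenced JSON list, as in the example completion in this section. `parse_actions`, `run_actions` and the stub tool body are hypothetical and not part of any library.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so no literal code fence appears in this example


def parse_actions(completion: str) -> list[dict]:
    """Extract the JSON action list that follows 'Action:'."""
    pattern = r"Action:\s*" + FENCE + r"json\s*(.*?)" + FENCE
    match = re.search(pattern, completion, re.DOTALL)
    if match is None:
        raise ValueError("no action list found in completion")
    return json.loads(match.group(1))


def internet_search(query: str) -> list[dict]:
    """Stub standing in for a real search tool."""
    return [{"title": "stub result", "text": f"results for: {query}"}]


TOOLS = {"internet_search": internet_search}


def run_actions(actions: list[dict]) -> list[dict]:
    """Call each requested tool; directly_answer means no tool is needed."""
    outputs = []
    for action in actions:
        name = action["tool_name"]
        if name == "directly_answer":
            continue  # nothing to execute, the model will answer on its own
        outputs.extend(TOOLS[name](**action.get("parameters", {})))
    return outputs


completion = (
    "Action: " + FENCE + "json\n"
    '[{"tool_name": "internet_search", '
    '"parameters": {"query": "biggest penguin in the world"}}]\n'
    + FENCE
)
print(run_actions(parse_actions(completion)))
```

The returned snippets already have the key-value shape used for grounded generation, so they could be passed on as documents for the response step.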
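Comprehensive documentation for working with Command R+'s single-step tool use prompt template can be found here and here.
You can render the single-step tool use prompt template with the function apply_tool_use_template(). The code snippet below shows a minimal working example of how to render this prompt.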
```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]
# Define tools available for the model to use:
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True
            }
        }
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameter_definitions": {}
    }
]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_tool_use_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
```

Tools can also be defined as Python functions and passed to apply_chat_template():

```python
from transformers import AutoTokenizer

model_id = "CohereForAI/c4ai-command-r-plus"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# define conversation input:
conversation = [
    {"role": "user", "content": "Whats the biggest penguin in the world?"}
]

# Define tools available for the model to use
# Type hints and docstrings from Python functions are automatically extracted
def internet_search(query: str):
    """
    Returns a list of relevant document snippets for a textual query retrieved from the internet

    Args:
        query: Query to search the internet with
    """
    pass

def directly_answer():
    """
    Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass

tools = [internet_search, directly_answer]

# render the tool use prompt as a string:
tool_use_prompt = tokenizer.apply_chat_template(
    conversation,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True,
)
print(tool_use_prompt)
```

Example rendered single-step tool use prompt:

````
<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|># Safety Preamble
The instructions in this section override those in the task description and style guide sections. Don't answer questions that are harmful or immoral.
# System Preamble
## Basic Rules
You are a powerful conversational AI trained by Cohere to help people. You are augmented by a number of tools, and your job is to use and consume the output of these tools to best help the user. You will see a conversation history between yourself and a user, ending with an utterance from the user. You will then see a specific instruction instructing you what kind of response to generate. When you answer the user's requests, you cite your sources in your answers, according to those instructions.
# User Preamble
## Task and Context
You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.
## Style Guide
Unless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.
## Available Tools
Here is a list of tools that you have available to you:
```python
def internet_search(query: str) -> List[Dict]:
    """Returns a list of relevant document snippets for a textual query retrieved from the internet
    Args:
        query (str): Query to search the internet with
    """
    pass
```
```python
def directly_answer() -> List[Dict]:
    """Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history
    """
    pass
```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Whats the biggest penguin in the world?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>Write 'Action:' followed by a json-formatted list of actions that you want to perform in order to produce a good response to the user's last input. You can use any of the supplied tools any number of times, but you should aim to execute the minimum number of necessary actions for the input. You should use the `directly-answer` tool if calling the other tools is unnecessary. The list of actions you want to call should be formatted as a list of json objects, for example:
```json
[
    {
        "tool_name": title of the tool in the specification,
        "parameters": a dict of parameters to input into the tool as they are defined in the specs, or {} if it takes no parameters
    }
]```<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
````

Example rendered single-step tool use completion:

````
Action: ```json
[
    {
        "tool_name": "internet_search",
        "parameters": {
            "query": "biggest penguin in the world"
        }
    }
]
```
````

Multi-step tool use is suited to building agents that can plan and execute a sequence of actions using multiple tools. Unlike single-step tool use, the model can perform several inference cycles, iterating through Action → Observation → Reflection until it decides on a final response. For more details, refer to our multi-step tool use documentation.
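Command R+ has been specifically trained with multi-step tool use (or "Agents") capabilities. These have been trained into the model via a mixture of supervised fine-tuning and preference fine-tuning, using a specific prompt template. Deviating from this prompt template may reduce performance. We therefore recommend using the prompt template described below.
The prompt template is not yet available in the openmind tokenizer. However, comprehensive documentation for working with Command R+'s multi-step tool use prompt template can be found here and here.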
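The Action → Observation → Reflection cycle can be pictured as a plain loop around the model. The sketch below only illustrates the control flow, under loud assumptions: `fake_model` is a scripted stand-in for Command R+, the step dictionaries are an invented format rather than the real multi-step prompt template (which this card does not show), and `run_agent` is a hypothetical helper.

```python
from typing import Callable


def calculator(expression: str) -> str:
    """Toy tool: add integers written as 'a+b+c'."""
    return str(sum(int(part) for part in expression.split("+")))


TOOLS: dict[str, Callable[..., str]] = {"calculator": calculator}


def fake_model(history: list[dict]) -> dict:
    """Scripted stand-in for the model: one tool call, then a final answer."""
    observations = [step for step in history if step["role"] == "observation"]
    if not observations:
        return {"action": "calculator", "parameters": {"expression": "2+3"}}
    return {"final_answer": f"The result is {observations[-1]['content']}."}


def run_agent(question: str, model=fake_model, max_steps: int = 5) -> str:
    """Loop until the model returns a final answer or max_steps is reached."""
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = model(history)  # reflection over everything seen so far
        if "final_answer" in step:
            return step["final_answer"]
        result = TOOLS[step["action"]](**step["parameters"])  # action
        history.append({"role": "observation", "content": result})  # observation
    raise RuntimeError("no final answer within max_steps")


print(run_agent("What is 2 + 3?"))
```

The `max_steps` cap is a design choice of this sketch: it stops a model that never settles on a final response from looping indefinitely.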
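Command R+ has been optimized to interact with your code by requesting code snippets, code explanations, or code rewrites. It might not perform well out of the box for pure code completion. For better performance, we also recommend using a low temperature (or even greedy decoding) for code-generation related instructions.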
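In terms of the `generate()` calls shown in the usage section, that recommendation amounts to changing the sampling arguments. The helper below is a hypothetical convenience, not part of openmind or transformers. It only builds keyword arguments, and the 0.1 temperature is an illustrative value rather than a figure from this card.

```python
def generation_kwargs(for_code: bool, greedy: bool = False,
                      max_new_tokens: int = 100) -> dict:
    """Build keyword arguments for model.generate().

    do_sample=False selects greedy decoding; otherwise sampling is used
    with a lower temperature for code-related instructions.
    """
    if greedy:
        return {"max_new_tokens": max_new_tokens, "do_sample": False}
    # 0.3 matches the usage example; 0.1 is an assumed "low" value for code.
    temperature = 0.1 if for_code else 0.3
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
    }


print(generation_kwargs(for_code=True, greedy=True))
print(generation_kwargs(for_code=True))
```

With a loaded model, the result would be unpacked into the call, for example `model.generate(input_ids, **generation_kwargs(for_code=True, greedy=True))`.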
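For errors or additional questions about details in this model card, contact info@for.ai.
We hope that releasing the weights of this highly performant 104-billion-parameter model to researchers all over the world will make community-based research efforts more accessible. This model is governed by a CC-BY-NC License with an acceptable use addendum, and also requires adhering to C4AI's Acceptable Use Policy.
You can try Command R+ chat in the playground here.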