QwQ-32B-Preview

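Introduction

QwQ-32B-Preview is an experimental research model developed by the Qwen Team, focused on advancing AI reasoning capabilities. As a preview release, it demonstrates promising analytical abilities but has several important limitations:

Language Mixing and Code-Switching: The model may mix languages or switch between them unexpectedly, which reduces response clarity.
Recursive Reasoning Loops: The model may enter circular reasoning patterns, producing lengthy responses without a conclusive answer.
Safety and Ethical Considerations: The model requires enhanced safety measures to ensure reliable and secure performance, and users should exercise caution when deploying it.
Performance and Benchmark Limitations: The model excels at math and coding tasks but has room for improvement in other areas, such as common-sense reasoning and nuanced language understanding.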

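Specification:

Type: Causal Language Model
Training Stage: Pretraining and Post-training
Architecture: transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
Number of Parameters: 32.5B
Number of Parameters (Non-Embedding): 31.0B
Number of Layers: 64
Number of Attention Heads (GQA): 40 for Q and 8 for KV
Context Length: Full 32,768 tokens

For more details, please refer to our blog. You can also check the Qwen2.5 GitHub repository and documentation.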
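
As a rough illustration of what the GQA figures above mean for memory, the sketch below estimates the key/value cache size for one sequence at full context. The layer count, head counts, and context length are taken from the specification; the per-head dimension of 128 and the 2-byte (bfloat16) element size are assumptions that this card does not state, so treat the result as an order-of-magnitude estimate only.

```python
# Values taken from the specification above.
NUM_LAYERS = 64
NUM_Q_HEADS = 40
NUM_KV_HEADS = 8
CONTEXT_LENGTH = 32_768

# Assumptions (not stated in this card).
HEAD_DIM = 128          # assumed per-head dimension
BYTES_PER_ELEMENT = 2   # assumed bfloat16/float16 cache entries


def kv_cache_bytes(num_tokens: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence of num_tokens tokens."""
    # The leading 2 accounts for storing both a key and a value per head.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_ELEMENT
    return per_token * num_tokens


queries_per_kv_head = NUM_Q_HEADS // NUM_KV_HEADS
gqa_gib = kv_cache_bytes(CONTEXT_LENGTH) / 2**30
mha_gib = kv_cache_bytes(CONTEXT_LENGTH, num_kv_heads=NUM_Q_HEADS) / 2**30

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache at full context with 8 KV heads: {gqa_gib:.1f} GiB")
print(f"KV cache if each query head had its own KV head: {mha_gib:.1f} GiB")
```

Under these assumptions, sharing each KV head among 5 query heads shrinks the full-context cache by the same factor of 5.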

Requirements

The Qwen2.5 code is included in recent releases of Hugging Face transformers, and we recommend using the latest version.

With transformers<4.37.0, you will encounter the following error:

KeyError: 'qwen2'
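
To catch this before loading the model, a version check along the following lines may help. It is a minimal sketch using only the standard library: the helper names `parse_version`, `supports_qwen2`, and `check_installed` are hypothetical, and the simple parser ignores pre-release suffixes such as `.dev0`.

```python
import re
from importlib import metadata

# Minimum version implied by the note above (older releases raise KeyError: 'qwen2').
MIN_TRANSFORMERS = (4, 37, 0)


def parse_version(version: str) -> tuple:
    """Return the leading numeric components, e.g. '4.37.0.dev0' -> (4, 37, 0)."""
    numbers = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        numbers.append(int(match.group()))
    numbers += [0] * (3 - len(numbers))  # pad short versions like '5' to three parts
    return tuple(numbers[:3])


def supports_qwen2(version: str) -> bool:
    """True if this transformers version is at least 4.37.0."""
    return parse_version(version) >= MIN_TRANSFORMERS


def check_installed() -> str:
    """Describe whether the installed transformers package is new enough."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if supports_qwen2(installed):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is too old; upgrade to 4.37.0 or later"


print(check_installed())
```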

Quickstart

The following code snippet uses apply_chat_template to show how to load the tokenizer and model and how to generate content.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-Preview"

# Use the dtype recorded in the checkpoint config and let transformers
# place the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many r in strawberry."
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]
# Render the conversation into a single prompt string; add_generation_prompt
# appends the marker that opens the assistant's turn.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate at most 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
# The output contains the prompt followed by the completion;
# drop the prompt tokens from each sequence.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

# Decode the completion tokens back to text.
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
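
The slicing step near the end of the snippet can be easier to follow on toy data. The sketch below reproduces it with plain Python lists instead of tensors; the token ids are made up for illustration, and `strip_prompt` is a hypothetical helper name rather than part of transformers.

```python
def strip_prompt(input_ids, output_ids):
    """Drop the leading prompt tokens from each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up token ids: the output repeats the 4 prompt tokens, then adds 3 new ones.
prompts = [[101, 7, 8, 9]]
outputs = [[101, 7, 8, 9, 42, 43, 2]]

completions = strip_prompt(prompts, outputs)
print(completions)  # [[42, 43, 2]]
```

The same slicing works for a batch as long as each output row begins with its full input row, which holds when the inputs are padded to a common length before generation.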

Citation

If you find our work helpful, feel free to cite it.

@misc{qwq-32b-preview,
    title = {QwQ: Reflect Deeply on the Boundaries of the Unknown},
    url = {https://qwenlm.github.io/blog/qwq-32b-preview/},
    author = {Qwen Team},
    month = {November},
    year = {2024}
}

@article{qwen2,
      title={Qwen2 Technical Report}, 
      author={An Yang and Baosong Yang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Zhou and Chengpeng Li and Chengyuan Li and Dayiheng Liu and Fei Huang and Guanting Dong and Haoran Wei and Huan Lin and Jialong Tang and Jialin Wang and Jian Yang and Jianhong Tu and Jianwei Zhang and Jianxin Ma and Jin Xu and Jingren Zhou and Jinze Bai and Jinzheng He and Junyang Lin and Kai Dang and Keming Lu and Keqin Chen and Kexin Yang and Mei Li and Mingfeng Xue and Na Ni and Pei Zhang and Peng Wang and Ru Peng and Rui Men and Ruize Gao and Runji Lin and Shijie Wang and Shuai Bai and Sinan Tan and Tianhang Zhu and Tianhao Li and Tianyu Liu and Wenbin Ge and Xiaodong Deng and Xiaohuan Zhou and Xingzhang Ren and Xinyu Zhang and Xipin Wei and Xuancheng Ren and Yang Fan and Yang Yao and Yichang Zhang and Yu Wan and Yunfei Chu and Yuqiong Liu and Zeyu Cui and Zhenru Zhang and Zhihao Fan},
      journal={arXiv preprint arXiv:2407.10671},
      year={2024}
}
