
Qwen3-Embedding-0.6B

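Highlights

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, designed specifically for text embedding and ranking tasks. Built on the dense foundation models of the Qwen3 series, it provides a comprehensive set of text embedding and reranking models in several sizes (0.6B, 4B, and 8B). The series inherits the strong multilingual capabilities, long-text understanding, and reasoning skills of its foundation model. The Qwen3 Embedding series achieves significant advances across multiple text embedding and ranking tasks, including text retrieval, code retrieval, text classification, text clustering, and bitext mining.

Exceptional versatility: The embedding model achieves state-of-the-art performance across a wide range of downstream application evaluations. The 8B embedding model ranks No. 1 on the MTEB multilingual leaderboard (as of June 5, 2025, with a score of 70.58), while the reranking model performs strongly in a variety of text retrieval scenarios.

Comprehensive flexibility: The Qwen3 Embedding series offers embedding and reranking models across the full range of sizes from 0.6B to 8B, serving use cases that weigh efficiency and effectiveness differently. Developers can seamlessly combine the two modules. In addition, the embedding models let users choose the output vector dimension (for the 0.6B model, anywhere from 32 to 1024), and both the embedding and reranking models support user-defined instructions to improve performance for specific tasks, languages, or scenarios.

Multilingual capability: Thanks to the multilingual capabilities of the Qwen3 models, the Qwen3 Embedding series supports over 100 languages, including various programming languages, and provides robust multilingual, cross-lingual, and code retrieval capabilities.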

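Model Overview

Qwen3-Embedding-0.6B has the following features:

  • Model type: Text embedding
  • Supported languages: 100+ languages
  • Number of parameters: 0.6B
  • Context length: 32k
  • Embedding dimension: up to 1024; supports user-defined output dimensions from 32 to 1024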
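
A common way to get a smaller vector from an MRL-style embedding is to keep its leading components and re-normalize. The sketch below assumes that approach applies to this model; `truncate_embeddings` is an illustrative helper rather than a library function, and the random vectors only stand in for real model output.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row and L2-normalize the result."""
    if not 32 <= dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be between 32 and {embeddings.shape[1]}")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row
    return truncated / np.clip(norms, 1e-12, None)


# Random stand-ins for two 1024-dimensional embeddings (not real model output)
rng = np.random.default_rng(0)
full = rng.normal(size=(2, 1024))
small = truncate_embeddings(full, 256)
print(small.shape)  # (2, 256)
```

Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which may be more convenient in practice; whether the truncated vectors are re-normalized should be checked against its documentation.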

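For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog and GitHub.

Qwen3 Embedding Series Model List

| Model Type | Model | Size | Layers | Sequence Length | Embedding Dimension | MRL Support | Instruction Aware |
|---|---|---|---|---|---|---|---|
| Text Embedding | Qwen3-Embedding-0.6B | 0.6B | 28 | 32K | 1024 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-4B | 4B | 36 | 32K | 2560 | Yes | Yes |
| Text Embedding | Qwen3-Embedding-8B | 8B | 36 | 32K | 4096 | Yes | Yes |
| Text Reranking | Qwen3-Reranker-0.6B | 0.6B | 28 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-4B | 4B | 36 | 32K | - | - | Yes |
| Text Reranking | Qwen3-Reranker-8B | 8B | 36 | 32K | - | - | Yes |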

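Note:

  • MRL Support indicates whether the embedding model supports custom dimensions for the final embedding.
  • Instruction Aware indicates whether the embedding or reranking model supports customizing the input instruction for different tasks.
  • Our evaluation indicates that, for most downstream tasks, using an instruction (instruct) typically improves performance by 1% to 5% compared with not using one. We therefore recommend that developers create tailored instructions for their specific tasks and scenarios. In multilingual settings, we also advise writing instructions in English, since most instructions used during training were originally written in English.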

Usage

With Transformers versions earlier than 4.51.0, you may encounter the following error:

KeyError: 'qwen3'
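
One way to catch this early is to compare the installed version against the minimum before loading the model. The sketch below is a simplified check using only the standard library; `version_tuple` and `meets_minimum` are illustrative helpers, and the parsing deliberately ignores pre-release suffixes rather than handling them fully.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple:
    """Turn '4.51.0' (or '4.52.0.dev0') into a tuple of its leading integers."""
    parts = []
    for piece in v.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, required: str = "4.51.0") -> bool:
    """Return True if `installed` is at least `required`."""
    return version_tuple(installed) >= version_tuple(required)


try:
    installed = version("transformers")
    if not meets_minimum(installed):
        print(f"transformers {installed} is too old; upgrade to >=4.51.0")
except PackageNotFoundError:
    print("transformers is not installed")
```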

Sentence Transformers Usage

# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0

from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "Qwen/Qwen3-Embedding-0.6B",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )

# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]

# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)

# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.7646, 0.1414],
#         [0.1355, 0.6000]])
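
To turn the similarity matrix into a retrieval result, each query's documents can be sorted by score. The following sketch reuses the values printed above (rows are queries, columns are documents) and uses NumPy instead of the tensor returned by `model.similarity`, purely to keep the example self-contained.

```python
import numpy as np

# Similarity matrix copied from the example output above
similarity = np.array([[0.7646, 0.1414],
                       [0.1355, 0.6000]])

# Sort document indices for each query from most to least similar
ranking = np.argsort(-similarity, axis=1)

# Index of the best-matching document for each query
best = ranking[:, 0]
print(best.tolist())  # [0, 1]
```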

Transformers Usage

# Requires transformers>=4.51.0

import torch
import torch.nn.functional as F

from torch import Tensor
from transformers import AutoTokenizer, AutoModel


def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # With left padding, the final position holds a real token in every sequence
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # With right padding, locate the last non-padding token of each sequence
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]


def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen3-Embedding-0.6B', padding_side='left')
model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B')

# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained('Qwen/Qwen3-Embedding-0.6B', attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()

max_length = 8192

# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict.to(model.device)
outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])

# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7645568251609802, 0.14142508804798126], [0.13549736142158508, 0.5999549627304077]]
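
To make the pooling step concrete, here is a toy NumPy re-implementation of the indexing done by `last_token_pool`. The hidden states are made-up numbers with hidden size 1, chosen only to show that the same token is selected under left and right padding; in a real model the hidden states themselves depend on the padding side.

```python
import numpy as np


def last_token_pool_np(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Pick the hidden state of the last non-padding token in each sequence."""
    if mask[:, -1].sum() == mask.shape[0]:
        # Left padding: every sequence ends at the final position
        return hidden[:, -1]
    # Right padding: the last real token sits at (number of real tokens - 1)
    last = mask.sum(axis=1) - 1
    return hidden[np.arange(hidden.shape[0]), last]


# Toy batch: 2 sequences, 3 positions, hidden size 1 (values are made up)
hidden_right = np.array([[[1.0], [2.0], [0.0]],   # real tokens 1, 2, then padding
                         [[3.0], [4.0], [5.0]]])  # real tokens 3, 4, 5
mask_right = np.array([[1, 1, 0],
                       [1, 1, 1]])

hidden_left = np.array([[[0.0], [1.0], [2.0]],    # padding first, then 1, 2
                        [[3.0], [4.0], [5.0]]])
mask_left = np.array([[0, 1, 1],
                      [1, 1, 1]])

print(last_token_pool_np(hidden_right, mask_right).ravel().tolist())  # [2.0, 5.0]
print(last_token_pool_np(hidden_left, mask_left).ravel().tolist())    # [2.0, 5.0]
```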

vLLM Usage

# Requires vllm>=0.8.5
import torch
import vllm
from vllm import LLM

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'

# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'

queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents

model = LLM(model="Qwen/Qwen3-Embedding-0.6B", task="embed")

outputs = model.embed(input_texts)
embeddings = torch.tensor([o.outputs.embedding for o in outputs])
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.7620252966880798, 0.14078938961029053], [0.1358368694782257, 0.6013815999031067]]

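📌 Tip: We recommend that developers customize the instruct for their specific scenarios, tasks, and languages. Our tests show that in most retrieval scenarios, omitting the instruct on the query side can reduce retrieval performance by roughly 1% to 5%.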
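
As a rough illustration of per-task instructions, the sketch below keeps a small table of instruction strings and applies the same template as `get_detailed_instruct`. The `web_search` string is the one used in the examples; the `code_search` string and the helper names are made up for illustration and have not been evaluated.

```python
# Instruction strings keyed by task. Only "web_search" comes from the examples;
# "code_search" is a hypothetical instruction written for this sketch.
EXAMPLE_TASKS = {
    "web_search": "Given a web search query, retrieve relevant passages that answer the query",
    "code_search": "Given a natural-language question, retrieve code snippets that answer it",
}


def format_query(task_key: str, query: str) -> str:
    """Prefix a query with its instruction, using the get_detailed_instruct template."""
    return f"Instruct: {EXAMPLE_TASKS[task_key]}\nQuery:{query}"


def format_document(document: str) -> str:
    """Documents are embedded as-is, without an instruction."""
    return document


print(format_query("code_search", "reverse a linked list in Python"))
```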

Text Embeddings Inference (TEI) Usage

You can run or deploy TEI on NVIDIA GPUs as follows:

docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B --dtype float16

Or on CPU devices, using the CPU image:

docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-0.6B

Then generate the embeddings by sending an HTTP POST request:

curl http://localhost:8080/embed \
    -X POST \
    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
    -H "Content-Type: application/json"
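
The same request can be sent from Python. This is a minimal standard-library sketch that assumes the container is listening on `localhost:8080` as in the commands above and that `/embed` returns a JSON list with one vector per input; `build_embed_request` and `embed` are illustrative helper names.

```python
import json
import urllib.request

TEI_URL = "http://localhost:8080/embed"  # assumes the container above is running


def build_embed_request(texts, url=TEI_URL):
    """Build the same POST request as the curl command."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, url=TEI_URL, timeout=30):
    """Send the request and decode the JSON response."""
    request = build_embed_request(texts, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (requires a running server):
# task = "Given a web search query, retrieve relevant passages that answer the query"
# vectors = embed([f"Instruct: {task}\nQuery: What is the capital of China?"])
```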

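Evaluation

MTEB (Multilingual)

| Model | Size | Mean (Task) | Mean (Type) | Bitext Mining | Classification | Clustering | Instance Retrieval | Multilabel Classification | Pair Classification | Reranking | Retrieval | STS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NV-Embed-v2 | 7B | 56.29 | 49.58 | 57.84 | 57.29 | 40.80 | 1.04 | 18.63 | 78.94 | 63.82 | 56.72 | 71.10 |
| GritLM-7B | 7B | 60.92 | 53.74 | 70.53 | 61.83 | 49.75 | 3.45 | 22.77 | 79.94 | 63.78 | 58.31 | 73.33 |
| BGE-M3 | 0.6B | 59.56 | 52.18 | 79.11 | 60.35 | 40.88 | -3.11 | 20.1 | 80.76 | 62.79 | 54.60 | 74.12 |
| multilingual-e5-large-instruct | 0.6B | 63.22 | 55.08 | 80.13 | 64.94 | 50.75 | -0.40 | 22.91 | 80.86 | 62.61 | 57.12 | 76.81 |
| gte-Qwen2-1.5B-instruct | 1.5B | 59.45 | 52.69 | 62.51 | 58.32 | 52.05 | 0.74 | 24.02 | 81.58 | 62.58 | 60.78 | 71.61 |
| gte-Qwen2-7b-Instruct | 7B | 62.51 | 55.93 | 73.92 | 61.55 | 52.77 | 4.94 | 25.48 | 85.13 | 65.55 | 60.08 | 73.98 |
| text-embedding-3-large | - | 58.93 | 51.41 | 62.17 | 60.27 | 46.89 | -2.68 | 22.03 | 79.17 | 63.89 | 59.27 | 71.68 |
| Cohere-embed-multilingual-v3.0 | - | 61.12 | 53.23 | 70.50 | 62.95 | 46.89 | -1.89 | 22.74 | 79.88 | 64.07 | 59.16 | 74.80 |
| Gemini Embedding | - | 68.37 | 59.59 | 79.28 | 71.82 | 54.59 | 5.18 | 29.16 | 83.63 | 65.58 | 67.71 | 79.40 |
| Qwen3-Embedding-0.6B | 0.6B | 64.33 | 56.00 | 72.22 | 66.83 | 52.33 | 5.09 | 24.59 | 80.83 | 61.41 | 64.64 | 76.17 |
| Qwen3-Embedding-4B | 4B | 69.45 | 60.86 | 79.36 | 72.33 | 57.15 | 11.56 | 26.77 | 85.05 | 65.08 | 69.60 | 80.86 |
| Qwen3-Embedding-8B | 8B | 70.58 | 61.69 | 80.89 | 74.00 | 57.65 | 10.06 | 28.66 | 86.40 | 65.63 | 70.88 | 81.08 |

Note: Scores for the compared models are taken from the MTEB online leaderboard as of May 24, 2025.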

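MTEB (English v2)

| Model | Params | Mean (Task) | Mean (Type) | Classification | Clustering | Pair Classification | Reranking | Retrieval | STS | Summarization |
|---|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 65.53 | 61.21 | 75.54 | 49.89 | 86.24 | 48.74 | 53.47 | 84.72 | 29.89 |
| NV-Embed-v2 | 7.8B | 69.81 | 65.00 | 87.19 | 47.66 | 88.69 | 49.61 | 62.84 | 83.82 | 35.21 |
| GritLM-7B | 7.2B | 67.07 | 63.22 | 81.25 | 50.82 | 87.29 | 49.59 | 54.95 | 83.03 | 35.65 |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.20 | 63.26 | 85.84 | 53.54 | 87.52 | 49.25 | 50.25 | 82.51 | 33.94 |
| stella_en_1.5B_v5 | 1.5B | 69.43 | 65.32 | 89.38 | 57.06 | 88.02 | 50.19 | 52.42 | 83.27 | 36.91 |
| gte-Qwen2-7B-instruct | 7.6B | 70.72 | 65.77 | 88.52 | 58.97 | 85.9 | 50.47 | 58.09 | 82.69 | 35.74 |
| gemini-embedding-exp-03-07 | - | 73.3 | 67.67 | 90.05 | 59.39 | 87.7 | 48.59 | 64.35 | 85.29 | 38.28 |
| Qwen3-Embedding-0.6B | 0.6B | 70.70 | 64.88 | 85.76 | 54.05 | 84.37 | 48.18 | 61.83 | 86.57 | 33.43 |
| Qwen3-Embedding-4B | 4B | 74.60 | 68.10 | 89.84 | 57.51 | 87.01 | 50.76 | 68.46 | 88.72 | 34.39 |
| Qwen3-Embedding-8B | 8B | 75.22 | 68.71 | 90.43 | 58.57 | 87.52 | 51.56 | 69.44 | 88.58 | 34.83 |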

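C-MTEB (MTEB Chinese)

| Model | Params | Mean (Task) | Mean (Type) | Classification | Clustering | Pair Classification | Reranking | Retrieval | STS |
|---|---|---|---|---|---|---|---|---|---|
| multilingual-e5-large-instruct | 0.6B | 58.08 | 58.24 | 69.80 | 48.23 | 64.52 | 57.45 | 63.65 | 45.81 |
| bge-multilingual-gemma2 | 9B | 67.64 | 75.31 | 59.30 | 86.67 | 68.28 | 73.73 | 55.19 | - |
| gte-Qwen2-1.5B-instruct | 1.5B | 67.12 | 67.79 | 72.53 | 54.61 | 79.5 | 68.21 | 71.86 | 60.05 |
| gte-Qwen2-7B-instruct | 7.6B | 71.62 | 72.19 | 75.77 | 66.06 | 81.16 | 69.24 | 75.70 | 65.20 |
| ritrieve_zh_v1 | 0.3B | 72.71 | 73.85 | 76.88 | 66.5 | 85.98 | 72.86 | 76.97 | 63.92 |
| Qwen3-Embedding-0.6B | 0.6B | 66.33 | 67.45 | 71.40 | 68.74 | 76.42 | 62.58 | 71.03 | 54.52 |
| Qwen3-Embedding-4B | 4B | 72.27 | 73.51 | 75.46 | 77.89 | 83.34 | 66.05 | 77.03 | 61.26 |
| Qwen3-Embedding-8B | 8B | 73.84 | 75.00 | 76.97 | 80.08 | 84.23 | 66.99 | 78.21 | 63.53 |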

Citation

If you find our work helpful, feel free to cite it.

@article{qwen3embedding,
  title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
  author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
  journal={arXiv preprint arXiv:2506.05176},
  year={2025}
}