HuggingFace mirror/mxbai-embed-2d-large-v1-openmind

A family of lightweight sentence embedding models from Mixedbread.

🪆mxbai-embed-2d-large-v1🪆

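This is our 2DMSE (2D Matryoshka Sentence Embeddings) model. It supports adaptive transformer layers and adaptive embedding sizes. For more information, see our blog post.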

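TL;DR: 2D-🪆 lets you shrink both the model (its number of transformer layers) and the embedding size. Shrinking only the embeddings gives results competitive with other models such as Nomic's embedding model. Shrinking the model to about 50% retains up to 85% of its performance without further training.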
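On the embedding-size axis, 2D-🪆 amounts to keeping only the first k values of each embedding before comparing vectors. Below is a minimal pure-Python sketch of that arithmetic that assumes nothing about the model itself: the helpers `truncate` and `cosine` and the 4-dimensional toy vectors are hypothetical, and real embeddings from this model are far longer.

```python
import math


def truncate(vec, k):
    """Keep only the first k dimensions (the Matryoshka prefix)."""
    if not 0 < k <= len(vec):
        raise ValueError(f"k must be in 1..{len(vec)}, got {k}")
    return vec[:k]


def cosine(a, b):
    """Cosine similarity; it divides by both norms, so inputs need not be unit length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, not model output).
u = [3.0, 4.0, 0.1, 0.0]
v = [3.0, 4.0, 0.0, 0.1]

full = cosine(u, v)                               # all 4 dimensions
short = cosine(truncate(u, 2), truncate(v, 2))    # only the first 2 dimensions
print(full, short)
```

Because cosine similarity normalizes by both norms, truncated prefixes can be compared directly; if you rank by raw dot product instead, re-normalize the truncated vectors first.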

Quick Start

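This section shows several ways to generate sentence embeddings with adaptive layers and embedding sizes. For this version, we recommend setting the number of adaptive layers between 20 and 24.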
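On the layer axis, "adaptive layers" means running only the first N transformer layers, which the sentence-transformers example below does by slicing `encoder.layer`. This sketch uses a plain Python list as a stand-in for that layer stack; `truncate_layers` and `in_recommended_range` are hypothetical helpers, and the total of 24 layers is an assumption consistent with the recommended 20 to 24 range.

```python
TOTAL_LAYERS = 24  # assumption: depth of the full encoder
RECOMMENDED = range(20, 25)  # recommended adaptive-layer counts: 20 to 24 inclusive


def truncate_layers(layers, n):
    """Keep the first n layers of a stack, mirroring `encoder.layer[:n]`."""
    if not 1 <= n <= len(layers):
        raise ValueError(f"n must be in 1..{len(layers)}, got {n}")
    return layers[:n]


def in_recommended_range(n):
    """True if n lies in the recommended 20 to 24 window."""
    return n in RECOMMENDED


# Stand-in layer stack: strings instead of real transformer blocks.
layers = [f"layer_{i}" for i in range(TOTAL_LAYERS)]

kept = truncate_layers(layers, 22)
fraction = len(kept) / TOTAL_LAYERS  # share of encoder depth still executed
half = truncate_layers(layers, 12)   # about half of the encoder depth
```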

Using Openmind

import argparse
import time

import torch
from openmind import AutoTokenizer, AutoModel, is_torch_npu_available

# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="jeffding/mxbai-embed-2d-large-v1-openmind",
    )
    args = parser.parse_args()
    return args

def main():
    args = parse_args()
    model_path = args.model_name_or_path

    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"

    # Load the tokenizer and model from the openMind Hub (or a local path)
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(device)
    start_time = time.time()
    sentences = ['如何更换花呗绑定银行卡', 'How to replace the Huabei bundled bank card']
    # Tokenize sentences
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to(device)

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)
    # Perform pooling. In this case, mean pooling.
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    print("Sentence embeddings:")
    print(sentence_embeddings)

    end_time = time.time()
    print(f"Device: {device}, inference time: {end_time - start_time:.4f} s")

if __name__ == "__main__":
    main()
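The example above uses mean pooling, whereas the sentence-transformers, angle-emb, and Transformers.js examples in this document use `cls` pooling, i.e. they take the hidden state of the first token. The pure-Python sketch below contrasts the two on a toy batch and then truncates the CLS vector to a smaller embedding size; the helpers `mean_pool` and `cls_pool` and all numbers are hypothetical. In the PyTorch code above, a CLS variant would presumably be `model_output[0][:, 0]` followed by slicing the last dimension (e.g. `[:, :768]`); treat that as an untested adaptation.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only positions where the mask is 1."""
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(tokens[0])
        count = 0
        for vec, m in zip(tokens, mask):
            if m:
                total = [t + x for t, x in zip(total, vec)]
                count += 1
        # Same guard as torch.clamp(..., min=1e-9) in the example above
        pooled.append([t / max(count, 1e-9) for t in total])
    return pooled


def cls_pool(token_embeddings):
    """Take the hidden state of the first ([CLS]) token of each sequence."""
    return [tokens[0] for tokens in token_embeddings]


# One sequence of 3 tokens (the last one is padding), hidden size 4.
batch = [[[1.0, 2.0, 3.0, 4.0],
          [3.0, 2.0, 1.0, 0.0],
          [9.0, 9.0, 9.0, 9.0]]]
mask = [[1, 1, 0]]

mean_vecs = mean_pool(batch, mask)      # padding token is ignored
cls_vecs = cls_pool(batch)
cls_short = [v[:2] for v in cls_vecs]   # 2D step: keep the first 2 dims
```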

sentence-transformers

Currently, the best way to use our model is with the latest version of sentence-transformers.

python -m pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# 1. load model with `cls` pooling
model = SentenceTransformer("mixedbread-ai/mxbai-embed-2d-large-v1")

# 2. set adaptive layer and embedding size.
# it is recommended to set layers from 20 to 24.
new_num_layers = 22  # 1D: set layer size
model[0].auto_model.encoder.layer = model[0].auto_model.encoder.layer[:new_num_layers]

new_embedding_size = 768  # 2D: set embedding size (applied by slicing the embeddings after encoding)

# 3. encode
embeddings = model.encode(
    [
        'Who is german and likes bread?',
        'Everybody in Germany.'
    ]
)

# Cosine similarity between the two sentences, using only the first `new_embedding_size` dimensions
similarities = cos_sim(embeddings[0, :new_embedding_size], embeddings[1, :new_embedding_size])

print('similarities:', similarities)

angle-emb

You can also run inference with the latest version of angle-emb, as follows:

python -m pip install -U angle-emb

from angle_emb import AnglE
from sentence_transformers.util import cos_sim

# 1. load model
model = AnglE.from_pretrained("mixedbread-ai/mxbai-embed-2d-large-v1", pooling_strategy='cls').cuda()

# 2. set adaptive layer and embedding size.
# it is recommended to set layers from 20 to 24.
layer_index = 22  # 1d: layer
embedding_size = 768  # 2d: embedding size

# 3. encode
embeddings = model.encode([
    'Who is german and likes bread?',
    'Everybody in Germany.'
], layer_index=layer_index, embedding_size=embedding_size)

similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)

Transformers.js

If you haven't already, you can install the Transformers.js JavaScript library from NPM with:

npm i @xenova/transformers

You can then use the model to compute embeddings as follows:

import { pipeline, cos_sim } from '@xenova/transformers';
// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'mixedbread-ai/mxbai-embed-2d-large-v1', {
    quantized: false, // (Optional) remove this line to use the 8-bit quantized model
});

// Compute sentence embeddings (with `cls` pooling)
const sentences = ['Who is german and likes bread?', 'Everybody in Germany.' ];
const output = await extractor(sentences, { pooling: 'cls' });

// Set embedding size and truncate embeddings
const new_embedding_size = 768;
const truncated = output.slice(null, [0, new_embedding_size]);

// Compute cosine similarity
console.log(cos_sim(truncated[0].data, truncated[1].data)); // 0.6979532021425204

Using the API

You can use the model through our API as follows:

from mixedbread_ai.client import MixedbreadAI
from sklearn.metrics.pairwise import cosine_similarity
import os

mxbai = MixedbreadAI(api_key=os.environ["MIXEDBREAD_API_KEY"])

english_sentences = [
    'What is the capital of Australia?',
    'Canberra is the capital of Australia.'
]

res = mxbai.embeddings(
    input=english_sentences,
    model="mixedbread-ai/mxbai-embed-2d-large-v1",
    dimensions=512,  # 2D: requested embedding size
)
embeddings = [entry.embedding for entry in res.data]

similarities = cosine_similarity([embeddings[0]], [embeddings[1]])
print(similarities)

The API natively supports INT8 and binary quantization! See the documentation for more information.
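As a rough illustration of what INT8 and binary quantization mean for a single embedding vector, here is a generic pure-Python sketch. It is not the API's implementation: the symmetric per-vector INT8 scaling, the sign threshold at zero, and the helper names are all assumptions chosen for illustration.

```python
def quantize_int8(vec):
    """Symmetric scalar quantization: map [-max|x|, max|x|] onto [-127, 127]."""
    # Fall back to a scale of 1.0 for an all-zero vector to avoid dividing by zero
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return [round(x / scale) for x in vec], scale


def dequantize_int8(codes, scale):
    """Approximate the original floats from the int8 codes."""
    return [c * scale for c in codes]


def quantize_binary(vec):
    """One bit per dimension: 1 if the value is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    """Number of differing bits; smaller means more similar."""
    return sum(x != y for x, y in zip(a, b))


emb = [0.4, -1.0, 0.2, 0.0]  # hypothetical embedding values
codes, scale = quantize_int8(emb)
approx = dequantize_int8(codes, scale)
bits = quantize_binary(emb)
```

INT8 keeps a coarse magnitude per dimension (4x smaller than float32), while binary keeps only the sign (32x smaller) and compares vectors by Hamming distance.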

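Evaluation

For more information, see our blog post.

Community

Join our Discord community to share your feedback and ideas! We're here to help and always happy to chat.

License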

Apache 2.0