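A family of lightweight sentence embedding models from Mixedbread.
This is our 2DMSE sentence embedding model. It supports adaptive transformer layers and embedding sizes. For more information, see our blog post.
TLDR: 2D-🪆 lets you shrink both the model and the embedding size. Shrinking only the embedding size gives results competitive with other models, such as Nomic's embedding model. Shrinking the model to about 50% retains up to 85% of the performance without further training.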
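To make the "shrink the embedding size" idea concrete, here is a minimal sketch in plain Python. It assumes that shrinking means keeping the leading components of a vector and re-normalizing before comparison. The helper names `truncate_and_normalize` and `cosine` are hypothetical, and the toy vectors stand in for real model output.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components, then scale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 4-dimensional "embeddings" (assumption: real ones come from the model).
full_a = [3.0, 4.0, 1.0, 2.0]
full_b = [4.0, 3.0, -2.0, 0.5]

# Shrink both vectors to their first 2 dimensions and compare them.
small_a = truncate_and_normalize(full_a, 2)
small_b = truncate_and_normalize(full_b, 2)
print(cosine(small_a, small_b))
```

Truncated vectors are re-normalized here so that a plain dot product could also be used for comparison. Whether that is needed depends on the similarity function in your own pipeline.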
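Here we provide several ways to produce sentence embeddings with adaptive layers and embedding sizes. For this version, it is recommended to set the number of adaptive layers between 20 and 24.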
from openmind import AutoTokenizer, AutoModel, is_torch_npu_available
from openmind_hub import snapshot_download
import torch.nn.functional as F
from torch import Tensor
import openmind
import torch
import argparse
import time
# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_name_or_path",
        type=str,
        help="Path to model",
        default="jeffding/mxbai-embed-2d-large-v1-openmind",
    )
    args = parser.parse_args()
    return args


def main():
    args = parse_args()
    model_path = args.model_name_or_path

    # Use the NPU when one is available, otherwise fall back to the CPU
    if is_torch_npu_available():
        device = "npu:0"
    else:
        device = "cpu"

    # Load the tokenizer and model from the given path or hub ID
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_path, trust_remote_code=True).to(device)

    start_time = time.time()
    sentences = ['如何更换花呗绑定银行卡', 'How to replace the Huabei bundled bank card']

    # Tokenize sentences
    encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt').to(device)

    # Compute token embeddings
    with torch.no_grad():
        model_output = model(**encoded_input)

    # Perform pooling. In this case, mean pooling.
    sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
    print("Sentence embeddings:")
    print(sentence_embeddings)

    end_time = time.time()
    print(f"Hardware: {device}, inference time: {end_time - start_time} s")


if __name__ == "__main__":
    main()

Currently, the best way to use our model is with the most recent version of sentence-transformers.
python -m pip install -U sentence-transformers

from sentence_transformers import models, SentenceTransformer
from sentence_transformers.util import cos_sim

# 1. load model with `cls` pooling
model = SentenceTransformer("mixedbread-ai/mxbai-embed-2d-large-v1")

# 2. set adaptive layer and embedding size.
# it is recommended to set layers from 20 to 24.
new_num_layers = 22  # 1D: set layer size
model[0].auto_model.encoder.layer = model[0].auto_model.encoder.layer[:new_num_layers]
new_embedding_size = 768  # 2D: set embedding size

# 3. encode
embeddings = model.encode(
    [
        'Who is german and likes bread?',
        'Everybody in Germany.'
    ]
)

# Similarity between the two sentences, using the truncated embeddings
similarities = cos_sim(embeddings[0, :new_embedding_size], embeddings[1, :new_embedding_size])
print('similarities:', similarities)

You can also use the latest angle-emb for inference, as follows:
python -m pip install -U angle-emb

from angle_emb import AnglE
from sentence_transformers.util import cos_sim

# 1. load model
model = AnglE.from_pretrained("mixedbread-ai/mxbai-embed-2d-large-v1", pooling_strategy='cls').cuda()

# 2. set adaptive layer and embedding size.
# it is recommended to set layers from 20 to 24.
layer_index = 22  # 1d: layer
embedding_size = 768  # 2d: embedding size

# 3. encode
embeddings = model.encode([
    'Who is german and likes bread?',
    'Everybody in Germany.'
], layer_index=layer_index, embedding_size=embedding_size)

similarities = cos_sim(embeddings[0], embeddings[1:])
print('similarities:', similarities)

If you haven't already, you can install the Transformers.js JavaScript library from NPM using:
npm i @xenova/transformers

You can then use the model to compute embeddings as follows:

import { pipeline, cos_sim } from '@xenova/transformers';

// Create a feature-extraction pipeline
const extractor = await pipeline('feature-extraction', 'mixedbread-ai/mxbai-embed-2d-large-v1', {
    quantized: false, // (Optional) remove this line to use the 8-bit quantized model
});

// Compute sentence embeddings (with `cls` pooling)
const sentences = ['Who is german and likes bread?', 'Everybody in Germany.'];
const output = await extractor(sentences, { pooling: 'cls' });

// Set embedding size and truncate embeddings
const new_embedding_size = 768;
const truncated = output.slice(null, [0, new_embedding_size]);

// Compute cosine similarity
console.log(cos_sim(truncated[0].data, truncated[1].data)); // 0.6979532021425204

You can use the model via our API as follows:
from mixedbread_ai.client import MixedbreadAI
from sklearn.metrics.pairwise import cosine_similarity
import os

mxbai = MixedbreadAI(api_key="{MIXEDBREAD_API_KEY}")

english_sentences = [
    'What is the capital of Australia?',
    'Canberra is the capital of Australia.'
]

res = mxbai.embeddings(
    input=english_sentences,
    model="mixedbread-ai/mxbai-embed-2d-large-v1",
    dimensions=512,
)

embeddings = [entry.embedding for entry in res.data]

similarities = cosine_similarity([embeddings[0]], [embeddings[1]])
print(similarities)

The API natively supports INT8 and binary quantization! Check out the docs for more information.
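For intuition about what INT8 and binary quantization do to an embedding, here is a small local sketch in plain Python. It is an illustration under stated assumptions, not a description of how the API implements quantization. The helpers `binary_quantize`, `pack_bits`, `int8_quantize`, and `hamming_distance` are hypothetical names, and the calibration range `lo`/`hi` is assumed to be chosen by the caller.

```python
from typing import List, Sequence


def binary_quantize(embedding: Sequence[float]) -> List[int]:
    """One bit per dimension: 1 where the value is positive, else 0."""
    return [1 if x > 0 else 0 for x in embedding]


def pack_bits(bits: Sequence[int]) -> bytes:
    """Pack bits into bytes, most significant bit first, zero-padding the tail."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # pad a short final chunk on the right
        out.append(byte)
    return bytes(out)


def int8_quantize(embedding: Sequence[float], lo: float, hi: float) -> List[int]:
    """Map the range [lo, hi] linearly onto [-128, 127], clipping outliers."""
    scale = 255.0 / (hi - lo)
    quantized = []
    for x in embedding:
        value = round((x - lo) * scale) - 128
        quantized.append(max(-128, min(127, value)))
    return quantized


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Toy 8-dimensional vector (assumption: real embeddings come from the model).
vector = [0.5, -0.25, 0.0, 1.0, -1.0, 0.75, -0.5, 0.25]
packed = pack_bits(binary_quantize(vector))
as_int8 = int8_quantize(vector, lo=-1.0, hi=1.0)
print(len(packed), as_int8)
```

Binary codes are usually compared with Hamming distance instead of cosine similarity, which is why the sketch includes `hamming_distance`. Both schemes trade some accuracy for much smaller storage.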
For more information, see our blog post.
Join our Discord community to share your feedback and thoughts! We are here to help and always happy to chat.
Apache 2.0