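Reka Edge is an efficient 7-billion-parameter multimodal vision-language model that accepts image or video input together with text and generates text output. It is optimized specifically for image understanding, video analysis, object detection, and agentic tool use, where it delivers industry-leading performance.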
Learn more about Reka Edge in our announcement blog post.

| Benchmark | Reka Edge | Cosmos-Reason2 8B | Qwen 3.5 9B | Gemini 3 Pro |
|---|---|---|---|---|
| VQA-V2 (visual question answering) | 88.40 | 79.82 | 83.22 | 89.78 |
| MLVU (video understanding) | 74.30 | 37.85 | 52.39 | 80.68 |
| MMVU (multimodal video understanding) | 71.68 | 51.52 | 68.64 | 78.88 |
| RefCOCO-A (object detection) | 93.13 | 90.98 | 93.62 | 81.46 |
| RefCOCO-B (object detection) | 86.70 | 85.74 | 88.83 | 82.85 |
| VideoHallucer (hallucination detection) | 59.57 | 51.65 | 56.00 | 66.78 |
| Mobile Actions (tool use) | 88.40 | 77.94 | 91.78 | 89.39 |

| Metric | Reka Edge | Cosmos-Reason2 8B | Qwen 3.5 9B | Gemini 3 Pro* |
|---|---|---|---|---|
| Input tokens for a 1024 × 1024 image | 331 | 1063 | 1041 | 1094 |
| End-to-end latency (s) | 4.69 ± 2.48 | 10.56 ± 3.47 | 10.31 ± 1.81 | 16.67 ± 4.47 |
| Time to first token, TTFT (s) | 0.522 ± 0.452 | 0.844 ± 0.923 | 0.60 ± 0.65 | 13.929 ± 3.872 |

*Gemini 3 Pro was measured through API calls; the other models were measured with local inference.
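To put the efficiency table in relative terms, the short calculation below divides each baseline's figure by Reka Edge's. It uses only the mean values from the table and ignores the ± spreads, so the ratios are rough indications rather than significance tests. The dictionary and function names are illustrative.

```python
# Mean values copied from the efficiency table; the ± spreads are ignored.
INPUT_TOKENS = {
    "Reka Edge": 331,
    "Cosmos-Reason2 8B": 1063,
    "Qwen 3.5 9B": 1041,
    "Gemini 3 Pro": 1094,
}
E2E_LATENCY_S = {
    "Reka Edge": 4.69,
    "Cosmos-Reason2 8B": 10.56,
    "Qwen 3.5 9B": 10.31,
    "Gemini 3 Pro": 16.67,
}


def ratios_vs_reka(metric: dict) -> dict:
    """Return each baseline's value divided by Reka Edge's value."""
    base = metric["Reka Edge"]
    return {
        name: round(value / base, 2)
        for name, value in metric.items()
        if name != "Reka Edge"
    }


if __name__ == "__main__":
    print("input tokens:", ratios_vs_reka(INPUT_TOKENS))
    print("end-to-end latency:", ratios_vs_reka(E2E_LATENCY_S))
```

On these means, the baselines consume a little over three times as many input tokens for the same image, and their end-to-end latency is roughly 2.2 to 3.6 times higher. The Gemini 3 Pro latency is measured through an API, so it is not directly comparable to the local runs.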
Get started:

Build the `llama-server` and `llama-quantize` targets:

```bash
cmake -B build
cmake --build build --target llama-server -j
cmake --build build --target llama-quantize -j
```

Convert the weights to GGUF (using `convert_reka_vlm_to_gguf.py`):

```bash
# Export the text decoder
python3 convert_reka_vlm_to_gguf.py /path/to/reka/weights \
  --outfile /path/to/reka-text-f16.gguf \
  --outtype f16

# Export the vision encoder
python3 convert_reka_vlm_to_gguf.py /path/to/reka/weights \
  --mmproj \
  --outfile /path/to/reka-mmproj-f16.gguf \
  --outtype f16
```

Apply simple quantization to the model (using `quantize_reka_...`):

```bash
# Example usage for text decoder quantization
bash inference/hf_release/quantize_reka_q4_last8_q8.sh /path/to/reka-text-f16.gguf /path/to/reka-text-q4_last8_q8.gguf
```

Start the server:

```bash
./build/bin/llama-server -m /path/to/reka-text-f16.gguf \
  --mmproj /path/to/reka-mmproj-f16.gguf \
  -t 8 -c 2048 --host 0.0.0.0 --port 8080 --reasoning off
```

The easiest way to run the model is with the included `example.py` script. It uses PEP 723 inline metadata, so `uv` resolves the dependencies automatically and no manual installation step is needed:

```bash
uv run example.py --image media/hamburger.jpg --prompt "What is in this image?"
```

With quantization, Reka Edge can also run on other devices.
For support with deploying Reka Edge on a custom edge computing platform, contact us.
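Once the `llama-server` instance from the commands above is listening on port 8080, it can be queried over HTTP. The sketch below builds a chat request with a local image embedded as a base64 data URL and posts it using only the standard library. It assumes the server exposes an OpenAI-style `/v1/chat/completions` route that accepts `image_url` parts when started with `--mmproj`; the helper names are illustrative and not part of the Reka release.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path
from typing import Optional


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def build_chat_payload(prompt: str, image_path: Optional[str] = None,
                       max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat payload; the image part is optional."""
    content = []
    if image_path is not None:
        content.append(
            {"type": "image_url", "image_url": {"url": image_to_data_url(image_path)}}
        )
    content.append({"type": "text", "text": prompt})
    return {"messages": [{"role": "user", "content": content}],
            "max_tokens": max_tokens}


def ask(payload: dict, base_url: str = "http://localhost:8080") -> str:
    """POST the payload to the assumed chat route and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",  # assumed route, see the lead-in
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
# print(ask(build_chat_payload("What is in this image?", "media/hamburger.jpg")))
```

Embedding the image as a data URL avoids hosting the file anywhere the server would need to fetch it from, at the cost of a larger request body.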
If you would rather not use the `example.py` script, install the dependencies manually and paste the code below:
```bash
uv pip install "transformers==4.57.3" torch torchvision pillow tiktoken imageio einops av
```

```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "RekaAI/reka-edge-2603"

# Load processor and model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
).eval()

# Move to MPS (Apple Silicon GPU)
device = torch.device("mps")
model = model.to(device)

# Prepare an image + text query
image_path = "media/hamburger.jpg"  # included in the model repo
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]

# Tokenize using the chat template
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)

# Move tensors to device; floating-point tensors are also cast to float16
for key, val in inputs.items():
    if isinstance(val, torch.Tensor):
        if val.is_floating_point():
            inputs[key] = val.to(device=device, dtype=torch.float16)
        else:
            inputs[key] = val.to(device=device)

# Generate
with torch.inference_mode():
    # Stop on <sep> token (end-of-turn) in addition to default EOS
    sep_token_id = processor.tokenizer.convert_tokens_to_ids("<sep>")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        eos_token_id=[processor.tokenizer.eos_token_id, sep_token_id],
    )

# Decode only the generated tokens
input_len = inputs["input_ids"].shape[1]
new_tokens = output_ids[0, input_len:]
output_text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)

# Strip any trailing <sep> turn-boundary marker
output_text = output_text.replace("<sep>", "").strip()
print(output_text)
```

The model also accepts video input. Use `--video` instead of `--image`:
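```bash
uv run example.py --video media/dashcam.mp4 --prompt "Is this person falling asleep?"
```

The equivalent `messages` list uses a `video` entry:

```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "media/dashcam.mp4"},
            {"type": "text", "text": "Is this person falling asleep?"},
        ],
    }
]
```

Given an input image, we use the prompt `Detect: {expression}` to instruct the model to perform object detection, where `{expression}` can describe a single object or multiple objects.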
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Detect: red car, man in the white"},
        ],
    }
]
```

For a text-only query, omit the image entry from the content list:
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"},
        ],
    }
]
```

Then run the same tokenization and generation steps as above.
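The message shapes shown above (image, video, detection, text-only) differ only in the content list, so one small helper can produce all of them. This is a sketch rather than part of the released code: `build_messages` and `detect_prompt` are hypothetical names, and joining several expressions with commas simply follows the `Detect: red car, man in the white` example.

```python
from typing import Optional, Sequence


def detect_prompt(expressions: Sequence[str]) -> str:
    """Format one or more object descriptions as a `Detect: ...` prompt."""
    cleaned = [e.strip() for e in expressions if e.strip()]
    if not cleaned:
        raise ValueError("at least one expression is required")
    return "Detect: " + ", ".join(cleaned)


def build_messages(text: str, image: Optional[str] = None,
                   video: Optional[str] = None) -> list:
    """Build a single-turn `messages` list; the media entry precedes the text."""
    # The examples only show one media entry per message, so mixing image and
    # video is rejected here as a conservative assumption.
    if image is not None and video is not None:
        raise ValueError("pass either an image or a video, not both")
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if video is not None:
        content.append({"type": "video", "video": video})
    content.append({"type": "text", "text": text})
    return [{"role": "user", "content": content}]


if __name__ == "__main__":
    print(build_messages(detect_prompt(["red car", "man in the white"]),
                         image="media/hamburger.jpg"))
    print(build_messages("What is the capital of France?"))
```

The returned list can be passed to `processor.apply_chat_template` exactly as in the full script.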
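- Avoid `bfloat16` on MPS; always use `torch.float16`.
- Do not use `device_map="auto"`: it is incompatible with MPS. Load the model on the CPU first, then call `.to("mps")`.
- The model was exported with `transformers==4.57.3`. Other versions may cause loading errors or unexpected behavior.

For high-throughput serving, you can use the `vllm-reka` plugin, which extends standard vLLM to support Reka's custom architecture and optimized tokenizer.

Follow our vllm-reka installation instructions to install the plugin along with vLLM.

You can start an OpenAI-compatible API server by running the `serve.sh` script in `vllm-reka` with `$MODEL_PATH` set to `RekaAI/reka-edge-2603`:

```bash
bash serve.sh
```

BitsAndBytes quantization is enabled by default here to reduce memory usage. To disable it, remove the `--quantization` flag from `serve.sh`.

Once the server is running, you can send requests in the OpenAI API format: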
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    timeout=3600,
)

# Video query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}},
                {"type": "text", "text": "Describe the video"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Image query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Object detection query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Detect: green banana"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Text-only query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)
```

Loading the model requires **trust_remote_code=True** because it uses custom architecture code (`Yasa2ForConditionalGeneration`), which is bundled in this repository and loaded through the `auto_map` configuration.
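Because the checkpoint is tied to `transformers==4.57.3`, a quick pre-flight check can surface a mismatched install before any weights are loaded. The sketch below uses only the standard library; `PINNED` and `find_pin_problems` are illustrative names, and exact string equality is a deliberately strict assumption.

```python
from importlib import metadata

# Version pin taken from the install command in this document.
PINNED = {"transformers": "4.57.3"}


def find_pin_problems(pins: dict) -> list:
    """Return human-readable problems for missing or mismatched packages."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name} {found} is installed (expected {wanted})")
    return problems


if __name__ == "__main__":
    issues = find_pin_problems(PINNED)
    for line in issues:
        print("warning:", line)
    if not issues:
        print("all pinned packages match")
```

Running this before `from_pretrained` turns a potentially confusing loading error into an explicit message about the installed version.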