
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-4bit-MLX

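Quantized by BeastCode

This is a high-performance 4-bit MLX quantization of Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled. It is optimized for Apple Silicon (M-series chips) and delivers deep, agent-grade reasoning locally.

The original BF16 weights take up 55.6 GB. After conversion the model shrinks to 14 GB, so any Mac with 24 GB or more of unified memory can run it comfortably, with enough headroom left for large context windows.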
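As a back-of-envelope check, the quantized size scales with the number of bits stored per weight: 16 in BF16 versus the 4.501 bits/weight reported in the benchmark table. This sketch assumes the 55.6 GB figure is in decimal gigabytes and that every original weight was stored in 16 bits; the ~0.5 bit above a plain 4 presumably comes from the per-group scales and biases MLX stores next to the quantized weights.

```python
BF16_SIZE_GB = 55.6            # original checkpoint size quoted above (assumed decimal GB)
BF16_BITS_PER_WEIGHT = 16
QUANT_BITS_PER_WEIGHT = 4.501  # average reported for this conversion


def quantized_size_gb(bf16_size_gb: float, bits_per_weight: float) -> float:
    """Scale a BF16 checkpoint size by the ratio of bits per weight."""
    return bf16_size_gb * bits_per_weight / BF16_BITS_PER_WEIGHT


size_gb = quantized_size_gb(BF16_SIZE_GB, QUANT_BITS_PER_WEIGHT)
size_gib = size_gb * 1e9 / 2**30  # convert decimal GB to binary GiB
print(f"{size_gb:.2f} GB = {size_gib:.2f} GiB")  # → 15.64 GB = 14.57 GiB
```

The estimate of roughly 14.6 GiB is in line with the 14 GB on-disk size if that figure is given in binary units; it ignores file metadata.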

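🧠 Why this model?

Most local LLMs are "reactive": they start generating a response before they have fully worked through the logic. This model is deliberative.

It is distilled from state-of-the-art Claude 4.6 Opus reasoning traces using an advanced chain-of-thought (CoT) framework. Before giving its final answer, it enters an internal <think> state in which it:

Deconstructs complex, multi-layered prompts into manageable subtasks
Simulates different solution paths and corrects its own logic errors before you see the result
Reduces redundancy by following Claude's structured thinking pattern rather than the circular reasoning common in base reasoning models

This makes it a first-choice model on Apple hardware for technical planning, complex logic puzzles, and high-stakes decision support.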
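Because the reasoning and the final answer arrive in a single output stream, it can help to separate them, for example to log the reasoning while showing only the answer. A minimal sketch, assuming the reasoning ends at a closing `</think>` tag and that the opening `<think>` may be missing from the output because it was already part of the prompt; `split_reasoning` is a hypothetical helper, not part of mlx-lm.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split raw model output into (reasoning, final_answer)."""
    # The reasoning ends at the first closing tag; the opening <think>
    # may be absent because the prompt already contained it.
    reasoning, sep, answer = text.partition("</think>")
    if not sep:
        # No reasoning block found: treat the whole output as the answer
        return "", text.strip()
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()
```

Usage: `reasoning, answer = split_reasoning(response)`.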

📊 Performance benchmarks

Tested on Apple M4 Pro (64 GB) · mlx-lm 0.30.7 · macOS 15

Metric	Result
Model load time	2.4 s
Prompt processing speed	86.5 tokens/s
Generation speed	15.7 tokens/s
Peak memory usage	15.6 GB
Bits per weight	4.501
Final size	14 GB (3 shards)
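The two throughput figures give a rough latency budget: prompt processing (prefill) time scales with prompt length, and generation time with the number of output tokens, reasoning tokens included. A sketch assuming constant throughput on the tested M4 Pro; real speeds typically drop as the context grows and differ on other chips, and model load time is excluded. `estimate_seconds` is a hypothetical helper.

```python
# Throughput figures from the benchmark table (Apple M4 Pro, 64 GB)
PROMPT_TOKENS_PER_SEC = 86.5
GENERATION_TOKENS_PER_SEC = 15.7


def estimate_seconds(prompt_tokens: int, generated_tokens: int) -> float:
    """Rough wall-clock estimate for one request, excluding model load time."""
    prefill = prompt_tokens / PROMPT_TOKENS_PER_SEC
    decode = generated_tokens / GENERATION_TOKENS_PER_SEC
    return prefill + decode


print(f"{estimate_seconds(500, 1000):.1f} s")  # → 69.5 s
```

In this example decoding accounts for about 64 of the 69.5 seconds, so the length of the reasoning trace matters far more than the prompt length.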

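💻 System requirements

Item	Requirement
Hardware	Apple Silicon Mac (M1, M2, M3, M4 or later)
Minimum memory	24 GB unified memory
Recommended memory	32 GB or more (headroom for long-context reasoning)
Operating system	macOS 13.5 or later
Python	3.10+ (Homebrew Python 3.12 recommended)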
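The table translates directly into a pre-flight check. A sketch with hypothetical names (`check_requirements`, `parse_version`); it takes the values as arguments so it can run anywhere. On a Mac you could pass `platform.mac_ver()[0]`, `platform.machine()`, `sys.version_info[:2]`, and the byte count from `sysctl -n hw.memsize` divided by `1024**3`.

```python
MIN_MEMORY_GB = 24
RECOMMENDED_MEMORY_GB = 32
MIN_MACOS = (13, 5)
MIN_PYTHON = (3, 10)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '14.2.1' into (14, 2, 1)."""
    return tuple(int(part) for part in version.split("."))


def check_requirements(memory_gb: float, macos_version: str, machine: str,
                       python_version: tuple[int, int]) -> list[str]:
    """Return a list of issues; an empty list means every row of the table is met."""
    issues = []
    if machine != "arm64":
        issues.append("Apple Silicon (arm64) is required")
    if memory_gb < MIN_MEMORY_GB:
        issues.append(f"{memory_gb} GB is below the {MIN_MEMORY_GB} GB minimum")
    elif memory_gb < RECOMMENDED_MEMORY_GB:
        issues.append(f"{RECOMMENDED_MEMORY_GB} GB+ is recommended for long contexts")
    if parse_version(macos_version) < MIN_MACOS:
        issues.append("macOS 13.5 or later is required")
    if tuple(python_version[:2]) < MIN_PYTHON:
        issues.append("Python 3.10 or later is required")
    return issues
```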

🚀 Quick start

1. Install the MLX library

```bash
pip install mlx-lm
```

2. Run in the terminal

```bash
python -m mlx_lm.chat \
  --model BeastCode/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit
```

3. Python integration (recommended)

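Use `apply_chat_template` with `enable_thinking=True`. This is the canonical way to trigger reasoning mode, with no need to build the prompt by hand.

```python
from mlx_lm import load, generate

model, tokenizer = load("BeastCode/Qwen3.5-27B-Claude-4.6-Opus-Distilled-MLX-4bit")

messages = [
    {
        "role": "user",
        "content": (
            "A farmer needs to cross a river with a wolf, a goat, and a cabbage. "
            "The boat can only hold the farmer and one other item. "
            "If left alone, the wolf eats the goat, and the goat eats the cabbage. "
            "How can he get everything across safely?"
        ),
    }
]

# enable_thinking=True inserts the <think> tag that opens the reasoning block
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Reasoning traces can be long; raise max_tokens if the answer gets cut off
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096, verbose=True)
```

Alternatively, you can build the ChatML prompt by hand and end it with the `<think>` tag. Given the same messages, both methods produce the same prompt (the manual example below also adds a system message).
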
```python
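prompt = (
    "<|im_start|>system\n"
    "You are a highly analytical assistant."
    "<|im_end|>\n"
    "<|im_start|>user\n"
    "A farmer needs to cross a river with a wolf, a goat, and a cabbage. "
    "The boat can only hold the farmer and one other item. "
    "If left alone, the wolf eats the goat, and the goat eats the cabbage. "
    "How can he get everything across safely?"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
    # Open the reasoning block, as enable_thinking=True does
    "<think>\n"
)

# Reuses the model and tokenizer loaded in the previous example
response = generate(model, tokenizer, prompt=prompt, max_tokens=4096, verbose=True)
```

The `<think>...</think>` block is invaluable for verifying the model's logic, but for a cleaner interface you may want to strip it: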

```python
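import re

def strip_thinking(text: str) -> str:
    """Remove the internal <think>...</think> reasoning block."""
    # If the prompt already ended with <think>, the output holds only the
    # closing tag, so keep just the text after the first </think>.
    _, sep, tail = text.partition("</think>")
    text = tail if sep else text
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()
```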

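The manual prompt above follows a fixed ChatML pattern, so it can also be assembled from the same `messages` list. A sketch only: `build_chatml_prompt` is a hypothetical helper that assumes the standard ChatML layout (`<|im_start|>role`, newline, content, `<|im_end|>`) and always opens a `<think>` block; the model's own chat template may handle details such as tool calls differently, so `apply_chat_template` remains the safer choice.

```python
def build_chatml_prompt(messages: list[dict[str, str]]) -> str:
    """Hypothetical helper: render chat messages as a ChatML prompt that opens a reasoning block."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    # Open the assistant turn, then the <think> block the model reasons in
    parts.append("<|im_start|>assistant\n<think>\n")
    return "".join(parts)


prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a highly analytical assistant."},
    {"role": "user", "content": "How can the farmer get everything across safely?"},
])
```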
🏆 Model comparison

Model	Size	Reasoning style	Target hardware
This model (27B)	14 GB	Claude 4.6 distilled	Macs with 24 GB+
Qwen3.5-9B	~5 GB	Fast / intuitive	Base 8 GB / 16 GB Macs
Qwen3.5-72B	~42 GB	Deep / exhaustive	64 GB+ Ultra/Max machines
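The comparison maps each model to a memory tier, which can be expressed as a simple lookup. A sketch assuming the minimums from the target-hardware column (8 GB for the 9B model, 24 GB for this model, 64 GB for the 72B model); `pick_model` is hypothetical and ignores context length, which needs extra headroom.

```python
from typing import Optional

# (model, minimum unified memory in GB), largest model first
MODEL_TIERS = [
    ("Qwen3.5-72B", 64),
    ("Qwen3.5-27B (this model)", 24),
    ("Qwen3.5-9B", 8),
]


def pick_model(memory_gb: float) -> Optional[str]:
    """Return the largest model whose memory tier fits, or None if none fits."""
    for name, min_memory_gb in MODEL_TIERS:
        if memory_gb >= min_memory_gb:
            return name
    return None
```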

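🙏 Acknowledgements

Core weights: Alibaba's Qwen team (Qwen 3.5 27B)
Reasoning fine-tune: Jackrong's Claude 4.6 Opus distillation work
Inference engine: Apple's MLX team, for making fast local inference on Apple silicon possible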