DeepSeek-V4-Pro-w4a8-mtp

This is a quantized variant of deepseek-ai/DeepSeek-V4-Pro. It uses W4A8 quantization (4-bit weights, 8-bit activations) and supports Multi-Token Prediction (MTP).

Model Details

Base Model: DeepSeek-V4-Pro
Quantization: W4A8 (4-bit weights, 8-bit activations)
Architecture: deepseek_v4
Library: transformers
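
To make the W4A8 label concrete, here is a minimal pure-Python sketch of a dot product with 4-bit weights and 8-bit activations. It assumes symmetric signed-integer quantization with one scale per vector. This card does not state the actual scheme used for this checkpoint (group size, per-channel scales, integer versus floating-point 8-bit activations), so the helper names `quantize_symmetric` and `w4a8_dot` are hypothetical and the code only illustrates the general idea.

```python
def quantize_symmetric(values, num_bits):
    """Map floats to signed integers in [-qmax, qmax] with a single scale.

    Illustrative assumption: symmetric, per-vector scaling.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def w4a8_dot(weights, activations):
    """Approximate dot(weights, activations) with int4 weights and int8 activations."""
    qw, w_scale = quantize_symmetric(weights, num_bits=4)
    qa, a_scale = quantize_symmetric(activations, num_bits=8)
    # Accumulate in integers, then rescale once back to floating point.
    acc = sum(w * a for w, a in zip(qw, qa))
    return acc * w_scale * a_scale


weights = [0.7, -0.3, 0.1, 0.0]
activations = [1.0, 2.0, -0.5, 0.25]
exact = sum(w * a for w, a in zip(weights, activations))
approx = w4a8_dot(weights, activations)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The two printed values differ by a small quantization error. That error is the trade-off for storing weights in 4 bits and activations in 8 bits.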

Usage

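from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID or local path of this checkpoint.
model_id = "DeepSeek-V4-Pro-w4a8-mtp"

# trust_remote_code=True allows any custom model code in the repository to run.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # spread layers across available devices (requires accelerate)
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)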
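
Once the model and tokenizer are loaded, text generation could look like the sketch below. It assumes the checkpoint works with the standard transformers `generate` API and that the tokenizer returns a batch object with a `.to()` method, neither of which this card confirms. The helper `generate_text` is a hypothetical name introduced for illustration, and the sketch makes no claim about whether MTP is used during decoding.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64):
    """Generate a continuation for `prompt` and return the decoded text.

    Assumes `model` and `tokenizer` follow the usual transformers interfaces.
    """
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # For decoder-only models, the output contains the prompt tokens
    # followed by the newly generated tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call, using the objects loaded above:
# print(generate_text(model, tokenizer, "Explain W4A8 quantization briefly."))
```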

License

This model inherits the MIT license from the original DeepSeek-V4-Pro model.