Open-source models

Frontier-class, but you own the weights.

Llama 4 Maverick · DeepSeek V3.2 · Qwen3 235B · Mistral Large 3 — self-host, fine-tune, run anywhere. No data leaves your infra.

Open-source flagships

Model pricing

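| Provider | Model | Context | Input /M | Output /M |
|---|---|---|---|---|
| Meta | Llama 4 Maverick ★ latest | 1M | $0.15 | $0.60 |
| Meta | Llama 4 Scout | 10M | $0.08 | $0.30 |
| DeepSeek | DeepSeek V3.2 ★ latest | 131K | $0.25 | $0.38 |
| DeepSeek | DeepSeek R1-0528 | 164K | $0.50 | $2.15 |
| DeepSeek | V4-Flash | 1M | $0.14 | $0.28 |
| DeepSeek | V4-Pro | 1M | $0.44 | $0.87 |
| Alibaba | Qwen3 235B ★ latest | 256K | $0.46 | $1.82 |
| Alibaba | Qwen3 32B | 41K | $0.08 | $0.28 |
| Alibaba | Qwen3 30B-A3B | 41K | $0.08 | $0.28 |
| Mistral | Mistral Large 3 ★ latest | 256K | $0.50 | $1.50 |
| Mistral | Mistral Medium 3 | 131K | $0.40 | $2.00 |
| Mistral | Mistral Small 3.2 | 131K | $0.075 | $0.20 |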
USD per million tokens. All models are open-weight: MIT or Apache 2.0, except the Llama 4 models, which ship under the Llama 4 Community License. DeepSeek V4 prices reflect the official API promotion running through May 2026. Sources: OpenRouter · DeepSeek · Qwen · Mistral, May 2026.
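As a rough illustration of how the per-million-token prices translate into a bill, the sketch below multiplies token counts by the listed input and output rates. The `PRICES` dictionary, the `estimate_cost` helper, and the example workload are hypothetical names and numbers introduced here for illustration; only the dollar figures come from the pricing table.

```python
# Prices in USD per million tokens, copied from the pricing table.
# The dictionary layout and function name are illustrative assumptions.
PRICES = {
    "Llama 4 Maverick": (0.15, 0.60),
    "DeepSeek V3.2": (0.25, 0.38),
    "Qwen3 235B": (0.46, 1.82),
    "Mistral Large 3": (0.50, 1.50),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload for one model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.2f}")
```

Because output tokens are priced higher than input tokens for every model listed, the ranking can change with the input/output mix, so it is worth running the estimate on a realistic workload rather than comparing input prices alone.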

Benchmarks

[Bar charts: MMLU-Pro (expert knowledge), GPQA Diamond (graduate-level STEM), MMLU (broad knowledge), and LiveCodeBench (competitive coding) for the four flagship models. The scores are listed under Performance comparison.]

Performance comparison

| Benchmark | Llama 4 Mav | DeepSeek V3.2 | Qwen3 235B | Mistral Lg 3 |
|---|---|---|---|---|
| Broad knowledge (MMLU) | 85.5% | 88.5% | **93.1%** | ~85.5% |
| Expert reasoning (MMLU-Pro) | 80.5% | **85.0%** | 83.0% | 73.1% |
| Graduate-level STEM (GPQA Diamond) | 69.8% | **82.4%** | 77.5% | ~43.9% |
| Agentic coding (SWE-bench Verified) | ~34% | **73.1%** | — | — |
| Competitive coding (LiveCodeBench) | 43.4% | **73.3%†** | 51.8% | — |
| Context window (max tokens) | 1M | 131K | 256K | 256K |

Bold values indicate the highest score per benchmark. †LiveCodeBench score is for DeepSeek R1-0528 (reasoning model); DeepSeek V3.2 has no published LiveCodeBench score. Sources: official model cards (Meta, DeepSeek, Qwen3, Mistral), DeepSeek-V3.2 technical report (arXiv:2512.02556), and CodeSOTA Open LLM Leaderboard (codesota.com, May 2026). Mistral GPQA is a third-party estimate. — = no published score.
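The "highest score per benchmark" rule in the note above can be sketched as a small lookup that skips unpublished scores. The `SCORES` layout and the `best_per_benchmark` helper are hypothetical. The assignment of LiveCodeBench values to columns is an assumption read off the flattened table, and the DeepSeek LiveCodeBench entry is the R1-0528 score, as the note states.

```python
from typing import Optional

# Scores in percent; None marks "no published score".
# Column assignment for LiveCodeBench is an assumption (see lead-in).
SCORES: dict[str, dict[str, Optional[float]]] = {
    "MMLU": {"Llama 4 Maverick": 85.5, "DeepSeek V3.2": 88.5,
             "Qwen3 235B": 93.1, "Mistral Large 3": 85.5},
    "MMLU-Pro": {"Llama 4 Maverick": 80.5, "DeepSeek V3.2": 85.0,
                 "Qwen3 235B": 83.0, "Mistral Large 3": 73.1},
    "GPQA Diamond": {"Llama 4 Maverick": 69.8, "DeepSeek V3.2": 82.4,
                     "Qwen3 235B": 77.5, "Mistral Large 3": 43.9},
    "LiveCodeBench": {"Llama 4 Maverick": 43.4, "DeepSeek V3.2": 73.3,
                      "Qwen3 235B": 51.8, "Mistral Large 3": None},
}


def best_per_benchmark(scores: dict[str, dict[str, Optional[float]]]) -> dict[str, str]:
    """Return the top-scoring model for each benchmark, ignoring missing scores."""
    best = {}
    for benchmark, by_model in scores.items():
        published = {m: s for m, s in by_model.items() if s is not None}
        best[benchmark] = max(published, key=published.get)
    return best


if __name__ == "__main__":
    for benchmark, model in best_per_benchmark(SCORES).items():
        print(f"{benchmark}: {model}")
```

Filtering out `None` before calling `max` avoids comparing a missing score against a number, which would raise a `TypeError` in Python 3.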

Pick the right model

01

Long-context document and code analysis

Use Llama 4 Scout — 10M token context, the longest of any open-weight model. Ingest a full codebase, year-long conversation logs, or a multi-volume document set without chunking. Llama 4 Community License; self-host royalty-free.
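To judge whether a corpus fits in a given context window without chunking, a back-of-the-envelope check like the one below can help. It is only a sketch: the 4-characters-per-token ratio is a common rule of thumb rather than a property of any tokenizer listed here, the context sizes are the rounded figures from the pricing table (for example, 131K taken as 131,000), and `models_that_fit` and `reserve_for_output` are hypothetical names.

```python
# Rounded context windows (in tokens) from the pricing table.
CONTEXT_TOKENS = {
    "Llama 4 Scout": 10_000_000,
    "Llama 4 Maverick": 1_000_000,
    "Qwen3 235B": 256_000,
    "Mistral Large 3": 256_000,
    "DeepSeek V3.2": 131_000,
}

# Assumption: roughly 4 characters per token for English text and code.
CHARS_PER_TOKEN = 4


def estimate_tokens(text_chars: int) -> int:
    """Estimate the token count from a character count (ceiling division)."""
    return -(-text_chars // CHARS_PER_TOKEN)


def models_that_fit(text_chars: int, reserve_for_output: int = 8_000) -> list[str]:
    """List models whose context holds the text plus room for the response."""
    needed = estimate_tokens(text_chars) + reserve_for_output
    return [name for name, limit in CONTEXT_TOKENS.items() if limit >= needed]


if __name__ == "__main__":
    # Hypothetical 20 MB codebase: about 5M estimated tokens.
    print(models_that_fit(20_000_000))
```

For a real deployment, count tokens with the target model's own tokenizer, since the ratio varies considerably across languages and file types.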

02

Deep reasoning, math, and agentic coding

Use DeepSeek: R1-0528 leads competitive coding (73.3% LiveCodeBench) and V3.2 leads agentic coding (73.1% SWE-bench Verified). MIT licensed; deploy on your own GPU cluster for near-frontier reasoning at a fraction of closed-model cost.

03

Multilingual and Asian-language tasks

Use Qwen3 235B — highest MMLU in this lineup (93.1%), trained on 36+ languages with strong coverage of Chinese, Japanese, Korean, and Arabic. Apache 2.0 licensed; fine-tune for language-specific domains without restrictions.

04

EU-regulated and privacy-sensitive workloads

Use Mistral Large 3 — built by a French AI lab, deployable on AWS Paris or Azure EU, GDPR-native with no data leaving European infrastructure. Apache 2.0 licensed; multimodal (text + image) at predictable cost.

Open source vs. frontier

On LIFEOSAI