AI model comparison — quality, price and open source
The leading AI models from the US, Europe and China, compared by quality (market benchmarks), cost in USD per million tokens and open-source status.
Data as of 2026-06-25, gathered by automated research (Artificial Analysis, LMArena, official pricing). Verify the figures before making a decision.
🏆 Quality (SW dev + arena)
| Model | Quality (0-100) | SWE-bench-Pro (%) | LiveCodeBench (%) | Terminal-Bench (%) | GPQA (%) | ARC-AGI-2 (%) | LMArena (Elo) |
|---|---|---|---|---|---|---|---|
| 🇺🇸 Claude Opus 4.8 (Anthropic · USA) | 65.4 | 69.2 | — | 74.6 | 84 | 14 | 1455 |
| 🇺🇸 GPT-5.5 (OpenAI · USA) | 63.5 | 58.6 | — | 82.7 | 85 | 16 | 1445 |
| 🇨🇳 DeepSeek V4-Pro (DeepSeek · China) | 60.5 | 15.56 | 83.3 | 39.6 | 82 | 9 | 1465 |
| 🇺🇸 Gemini 3.1 Pro (Google · USA) | 59.9 | 54.2 | — | 68.5 | 84 | 15 | 1470 |
| 🇨🇳 GLM-5.2 (Zhipu AI · China) | 55.4 | — | 82.8 | 40.5 | 78 | 7 | 1450 |
| 🇺🇸 Grok 4.3 (xAI · USA) | 49.3 | — | 79.4 | — | 84 | 16 | 1445 |
| 🇺🇸 MAI-Thinking-1 (Microsoft · USA) | 48.7 | 52.8 | 87.7 | 46.0 | 84.2 | — | — |
| 🇺🇸 Claude Sonnet 4.6 (Anthropic · USA) | 41.9 | — | — | 59.1 | 80 | 9 | 1430 |
| 🇺🇸 Llama 4 Maverick (Meta · USA) | 40.7 | — | 43.4 | — | 70 | 5 | 1420 |
| 🇨🇳 Qwen3.7-Max (Alibaba · China) | 33.1 | — | — | — | 81 | 7 | 1480 |
| 🇨🇳 Kimi K2.6 (Moonshot AI · China) | 32.8 | — | — | — | 78 | 9 | 1460 |
| 🇪🇺 Mistral Large 3 (25.12) (Mistral AI · Europe) | 32.2 | — | — | — | 72 | 6 | 1410 |
| 🇪🇺 Magistral Small 1.2 (Mistral AI · Europe) | 21.2 | — | 70.88 | — | 70.07 | 4 | — |
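Quality is our own 0-100 index. It weights SWE-bench-Pro and LiveCodeBench (software development), Terminal-Bench (OS control), LMArena (human preference) and GPQA. ARC-AGI-2 (arcprize.org) is not part of the index and is shown for information only: it tracks progress toward AGI, and the very low scores indicate that current models are still far from it. All benchmark values are percentages, except LMArena, which is an Elo rating. A dash marks a missing value.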
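To make the idea of a weighted index concrete, here is a minimal sketch of how such a score could be computed. The page names the components but not their weights, so the values in `WEIGHTS`, the Elo range used for normalization, and the choice to count a missing benchmark as zero are all assumptions made for illustration. The result is not expected to reproduce the Quality column above.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical weights: the source lists the components but not their weights.
WEIGHTS = {
    "swe_bench_pro": 0.30,
    "livecodebench": 0.20,
    "terminal_bench": 0.20,
    "gpqa": 0.15,
    "lmarena": 0.15,
}

# Hypothetical Elo range used to map LMArena ratings onto a 0-100 scale.
ELO_MIN, ELO_MAX = 1000.0, 1500.0


def normalize_elo(elo: float) -> float:
    """Map an Elo rating linearly onto 0-100, clamped to the assumed range."""
    scaled = (elo - ELO_MIN) / (ELO_MAX - ELO_MIN) * 100.0
    return max(0.0, min(100.0, scaled))


def quality_index(scores: dict[str, Optional[float]]) -> float:
    """Weighted sum of the components; a missing score (None) contributes 0.

    ARC-AGI-2 is deliberately absent from WEIGHTS, matching the note that it
    is informational only.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = scores.get(name)
        if value is None:
            continue  # assumption: missing benchmarks are not imputed
        if name == "lmarena":
            value = normalize_elo(value)
        total += weight * value
    return round(total, 1)


# Example input taken from the Claude Opus 4.8 row (LiveCodeBench is missing).
example = {
    "swe_bench_pro": 69.2,
    "livecodebench": None,
    "terminal_bench": 74.6,
    "gpqa": 84.0,
    "lmarena": 1455.0,
}
print(quality_index(example))
```

Counting missing benchmarks as zero penalizes models with sparse coverage. Renormalizing over the available components would be an alternative design, with the opposite bias.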
💵 Economics (USD / 1M tokens)
| Model | Input | Cache | Output |
|---|---|---|---|
| 🇺🇸 Claude Opus 4.8 (Anthropic · USA) | $5.0 | $0.5 | $25.0 |
| 🇺🇸 GPT-5.5 (OpenAI · USA) | $5.0 | $0.5 | $30.0 |
| 🇨🇳 DeepSeek V4-Pro (DeepSeek · China) | $0.28 | $0.03 | $0.87 |
| 🇺🇸 Gemini 3.1 Pro (Google · USA) | $1.25 | $0.31 | $10.0 |
| 🇨🇳 GLM-5.2 (Zhipu AI · China) | $0.6 | $0.11 | $2.2 |
| 🇺🇸 Grok 4.3 (xAI · USA) | $3.0 | $0.75 | $15.0 |
| 🇺🇸 MAI-Thinking-1 (Microsoft · USA) | — | — | — |
| 🇺🇸 Claude Sonnet 4.6 (Anthropic · USA) | $3.0 | $0.3 | $15.0 |
| 🇺🇸 Llama 4 Maverick (Meta · USA) | $0.2 | — | $0.6 |
| 🇨🇳 Qwen3.7-Max (Alibaba · China) | $1.2 | $0.6 | $6.0 |
| 🇨🇳 Kimi K2.6 (Moonshot AI · China) | $0.6 | $0.15 | $2.5 |
| 🇪🇺 Mistral Large 3 (25.12) (Mistral AI · Europe) | $2.0 | — | $6.0 |
| 🇪🇺 Magistral Small 1.2 (Mistral AI · Europe) | $0.5 | — | $1.5 |
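Per-million-token prices are easier to compare once they are turned into the cost of a single request. The sketch below assumes the Cache column is the price of input tokens served from a prompt cache, and that models without a listed cache price bill cached tokens at the normal input rate. Both are assumptions, as are the names `Pricing` and `request_cost`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per 1M tokens, as listed in the table above."""
    input_usd: float
    output_usd: float
    cache_usd: Optional[float] = None  # None when the table shows a dash


def request_cost(
    pricing: Pricing,
    input_tokens: int,
    output_tokens: int,
    cached_tokens: int = 0,
) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens read from the cache.
    Assumption: with no cache price, cached tokens cost the input price.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    cache_price = (
        pricing.cache_usd if pricing.cache_usd is not None else pricing.input_usd
    )
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * pricing.input_usd
        + cached_tokens * cache_price
        + output_tokens * pricing.output_usd
    )
    return total / 1_000_000


# Prices copied from the Claude Opus 4.8 and Llama 4 Maverick rows.
opus = Pricing(input_usd=5.0, output_usd=25.0, cache_usd=0.5)
maverick = Pricing(input_usd=0.2, output_usd=0.6)

for name, pricing in [("Claude Opus 4.8", opus), ("Llama 4 Maverick", maverick)]:
    cost = request_cost(pricing, input_tokens=10_000, output_tokens=2_000,
                        cached_tokens=8_000)
    print(f"{name}: ${cost:.4f}")
```

Output tokens usually dominate the bill for long answers, so the Output column deserves more weight than the Input column when comparing models for generation-heavy workloads.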
🔓 Open source & type
| Model | Open source | License | Type |
|---|---|---|---|
| 🇺🇸 Claude Opus 4.8 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 GPT-5.5 (OpenAI · USA) | No | Proprietary | Proprietary (API only) |
| 🇨🇳 DeepSeek V4-Pro (DeepSeek · China) | Yes | MIT | Open-weight |
| 🇺🇸 Gemini 3.1 Pro (Google · USA) | No | Proprietary | Proprietary (API only) |
| 🇨🇳 GLM-5.2 (Zhipu AI · China) | Yes | MIT | Open-weight |
| 🇺🇸 Grok 4.3 (xAI · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 MAI-Thinking-1 (Microsoft · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Claude Sonnet 4.6 (Anthropic · USA) | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Llama 4 Maverick (Meta · USA) | Yes | Llama 4 Community | Open-weight |
| 🇨🇳 Qwen3.7-Max (Alibaba · China) | No | Proprietary | Proprietary (API only) |
| 🇨🇳 Kimi K2.6 (Moonshot AI · China) | Yes | Modified MIT | Open-weight |
| 🇪🇺 Mistral Large 3 (25.12) (Mistral AI · Europe) | Yes | Mistral Research License (non-commercial) | Open-weight |
| 🇪🇺 Magistral Small 1.2 (Mistral AI · Europe) | Yes | Apache-2.0 | Open-weight |
🖥️ Open source you can self-host
Small and medium models you can run locally. Memory is estimated at 4-bit (Q4) and 8-bit (Q8) quantization. On Apple Silicon, memory is unified, so the same pool serves as both RAM and VRAM.
| Model | Quality | SWE-bench-Pro | LiveCodeBench | GPQA | Params | RAM Q4 | RAM Q8 | GPU (VRAM) | CPU / Mac | License |
|---|---|---|---|---|---|---|---|---|---|---|
| Gemma 3 27B (Google) | 29.6 | — | 29.7 | 24.3 | 27B | 16 GB | 31 GB | ≥16 GB | Limited (GPU or Mac with ≥32 GB preferred) | Gemma |
| Qwen3-32B (Alibaba) | 19.0 | — | 60.6 | 68.4 | 32.8B | 20 GB | 38 GB | ≥24 GB | Limited (GPU or Mac with ≥32 GB preferred) | Apache-2.0 |
| Qwen3-8B (Alibaba) | 18.4 | — | 60.3 | 63.3 | 8.2B | 6 GB | 11 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Apache-2.0 |
| DeepSeek-R1-Distill-Qwen-14B (DeepSeek) | 16.5 | — | 53.1 | 59.1 | 14B | 9 GB | 17 GB | ≥12 GB | Yes (slow on CPU · Mac 16 GB) | MIT |
| Phi-4 (Microsoft) | 5.6 | — | — | 56.1 | 14.7B | 10 GB | 18 GB | ≥12 GB | Yes (slow on CPU · Mac 16 GB) | MIT |
| Mistral Small 3 (Mistral AI) | 4.5 | — | — | 45.3 | 24B | 15 GB | 28 GB | ≥16 GB | Limited (GPU or Mac with ≥32 GB preferred) | Apache-2.0 |
| Llama 3.1 8B (Meta) | 3.0 | — | — | 30.4 | 8B | 6 GB | 10 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Llama 3.1 Community |
| Gemma 3 12B (Google) | 2.5 | — | — | 25.4 | 12B | 8 GB | 15 GB | ≥8 GB | Yes (slow on CPU · Mac 16 GB) | Gemma |
| Gemma 3 4B (Google) | 1.5 | — | — | 15.0 | 4B | 4 GB | 6 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Gemma |
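The page does not state how the Q4 and Q8 memory figures were derived. A common rule of thumb is weight size (parameters times bits per weight, divided by 8) plus some headroom for the KV cache and runtime buffers. The sketch below uses a 20% overhead factor, which is an assumption, so its results only roughly track the table. The function names are hypothetical.

```python
def estimate_memory_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough memory estimate in GB for a quantized model.

    params_billion: parameter count in billions (e.g. 27 for a 27B model).
    bits: bits per weight after quantization (4 for Q4, 8 for Q8).
    overhead: assumed multiplier for KV cache and runtime buffers.
    """
    if params_billion <= 0 or bits <= 0 or overhead < 1.0:
        raise ValueError("params_billion and bits must be positive, overhead >= 1")
    # 1e9 parameters * (bits / 8) bytes each = that many decimal gigabytes.
    weights_gb = params_billion * bits / 8
    return weights_gb * overhead


def fits(params_billion: float, bits: int, available_gb: float) -> bool:
    """True when the estimate fits in the given RAM, VRAM, or unified memory."""
    return estimate_memory_gb(params_billion, bits) <= available_gb


# Parameter counts copied from the table above.
for name, params in [("Gemma 3 27B", 27.0), ("Qwen3-8B", 8.2)]:
    q4 = estimate_memory_gb(params, 4)
    q8 = estimate_memory_gb(params, 8)
    print(f"{name}: Q4 ~{q4:.1f} GB, Q8 ~{q8:.1f} GB, fits in 16 GB at Q4: {fits(params, 4, 16.0)}")
```

Real usage also grows with context length, because the KV cache scales with the number of tokens held in memory, so treat the estimate as a lower bound for long-context workloads.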