AI News

AI model comparison — quality, price and open source

The leading AI models from the US, Europe and China, compared by quality (market benchmarks), cost in USD per million tokens and open-source status.

Data as of 2026-06-25 · automated research (Artificial Analysis, LMArena, official pricing) — verify before deciding.

🏆 Quality (SW dev + arena)

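| Model | Vendor · Region | Quality | SWE-bench-Pro / LiveCodeBench / Terminal-Bench (scores present, in column order) | GPQA | ARC-AGI-2 | LMArena |
| --- | --- | --- | --- | --- | --- | --- |
| 🇺🇸 Claude Opus 4.8 | Anthropic · USA | 65.4 | 69.2, 74.6 | 84 | 14 | 1455 |
| 🇺🇸 GPT-5.5 | OpenAI · USA | 63.5 | 58.6, 82.7 | 85 | 16 | 1445 |
| 🇨🇳 DeepSeek V4-Pro | DeepSeek · China | 60.5 | 15.56, 83.3, 39.68 | 82 | 9 | 1465 |
| 🇺🇸 Gemini 3.1 Pro | Google · USA | 59.9 | 54.2, 68.5 | 84 | 15 | 1470 |
| 🇨🇳 GLM-5.2 | Zhipu AI · China | 55.4 | 82.8, 40.5 | 78 | 7 | 1450 |
| 🇺🇸 Grok 4.3 | xAI · USA | 49.3 | 79.4 | 84 | 16 | 1445 |
| 🇺🇸 MAI-Thinking-1 | Microsoft · USA | 48.7 | 52.8, 87.7, 46.0 | 84.2 | | |
| 🇺🇸 Claude Sonnet 4.6 | Anthropic · USA | 41.9 | 59.1 | 80 | 9 | 1430 |
| 🇺🇸 Llama 4 Maverick | Meta · USA | 40.7 | 43.4 | 70 | 5 | 1420 |
| 🇨🇳 Qwen3.7-Max | Alibaba · China | 33.1 | | 81 | 7 | 1480 |
| 🇨🇳 Kimi K2.6 | Moonshot AI · China | 32.8 | | 78 | 9 | 1460 |
| 🇪🇺 Mistral Large 3 (25.12) | Mistral AI · Europe | 32.2 | | 72 | 6 | 1410 |
| 🇪🇺 Magistral Small 1.2 | Mistral AI · Europe | 21.2 | 70.88 | 70.07 | 4 | |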

Quality is our own 0-100 index, weighting SWE-bench-Pro and LiveCodeBench (software development), Terminal-Bench (OS control), LMArena (human preference) and GPQA. ARC-AGI-2 (arcprize.org) is shown for information only and is not part of the index: it is meant to track progress toward AGI, and the very low scores suggest current models are still far from it. All scores are percentages except LMArena, which is an Elo rating.
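The page does not publish the weights behind the Quality index, so the following is only a rough sketch of how a weighted 0-100 index of this kind could be computed. The weights, the Elo range used to rescale LMArena, and the rule that a missing benchmark contributes zero are all assumptions made for illustration; `WEIGHTS`, `elo_to_percent` and `quality_index` are hypothetical names, and the sketch is not expected to reproduce the Quality column.

```python
# Hypothetical weights: the source does not publish the real ones.
WEIGHTS = {
    "swe_bench_pro": 0.25,
    "livecodebench": 0.25,
    "terminal_bench": 0.20,
    "gpqa": 0.15,
    "lmarena": 0.15,
}

# Hypothetical Elo range used to map LMArena ratings onto 0-100.
ELO_MIN, ELO_MAX = 1000.0, 1500.0


def elo_to_percent(elo: float) -> float:
    """Linearly rescale an Elo rating to 0-100, clamped to that range."""
    scaled = (elo - ELO_MIN) / (ELO_MAX - ELO_MIN) * 100.0
    return max(0.0, min(100.0, scaled))


def quality_index(scores: dict) -> float:
    """Weighted sum of the available scores.

    Assumption: a benchmark that is missing (absent or None) counts as 0,
    so models with fewer reported scores end up with a lower index.
    """
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = scores.get(name)
        if value is None:
            continue
        if name == "lmarena":
            value = elo_to_percent(value)
        total += weight * value
    return total
```

Calling `quality_index({"gpqa": 81, "lmarena": 1480})` would score a model that reports only GPQA and LMArena; under these assumed weights the missing coding benchmarks pull the result down sharply.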

💵 Economics (USD / 1M tokens)

| Model | Vendor · Region | Input | Cache | Output |
| --- | --- | --- | --- | --- |
| 🇺🇸 Claude Opus 4.8 | Anthropic · USA | $5.0 | $0.5 | $25.0 |
| 🇺🇸 GPT-5.5 | OpenAI · USA | $5.0 | $0.5 | $30.0 |
| 🇨🇳 DeepSeek V4-Pro | DeepSeek · China | $0.28 | $0.03 | $0.87 |
| 🇺🇸 Gemini 3.1 Pro | Google · USA | $1.25 | $0.31 | $10.0 |
| 🇨🇳 GLM-5.2 | Zhipu AI · China | $0.6 | $0.11 | $2.2 |
| 🇺🇸 Grok 4.3 | xAI · USA | $3.0 | $0.75 | $15.0 |
| 🇺🇸 MAI-Thinking-1 | Microsoft · USA | | | |
| 🇺🇸 Claude Sonnet 4.6 | Anthropic · USA | $3.0 | $0.3 | $15.0 |
| 🇺🇸 Llama 4 Maverick | Meta · USA | $0.2 | | $0.6 |
| 🇨🇳 Qwen3.7-Max | Alibaba · China | $1.2 | $0.6 | $6.0 |
| 🇨🇳 Kimi K2.6 | Moonshot AI · China | $0.6 | $0.15 | $2.5 |
| 🇪🇺 Mistral Large 3 (25.12) | Mistral AI · Europe | $2.0 | | $6.0 |
| 🇪🇺 Magistral Small 1.2 | Mistral AI · Europe | $0.5 | | $1.5 |
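
To make the per-million-token prices easier to compare, here is a minimal sketch that turns them into an estimated cost for a single request. Two prices are copied from the table above; `PRICES` and `request_cost` are hypothetical names, and the sketch assumes that cached input tokens are billed at the Cache rate instead of the Input rate, which may not match every provider's billing rules.

```python
# USD per 1M tokens, copied from the Economics table.
PRICES = {
    "Claude Opus 4.8": {"input": 5.0, "cache": 0.5, "output": 25.0},
    "DeepSeek V4-Pro": {"input": 0.28, "cache": 0.03, "output": 0.87},
}


def request_cost(model: str, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Assumption: cached_tokens is the portion of input_tokens served from
    the prompt cache and billed at the cache rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    price = PRICES[model]
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * price["input"]
             + cached_tokens * price["cache"]
             + output_tokens * price["output"])
    return total / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        cost = request_cost(name, input_tokens=20_000, output_tokens=2_000)
        print(f"{name}: ${cost:.4f} per request")
```

Multiplying the result by an expected request volume gives a rough monthly budget; treat it as indicative only, since the prices themselves should be verified against official pricing pages.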

🔓 Open source & type

| Model | Vendor · Region | Open source | License | Type |
| --- | --- | --- | --- | --- |
| 🇺🇸 Claude Opus 4.8 | Anthropic · USA | No | Proprietary | Proprietary (API only) |
| 🇺🇸 GPT-5.5 | OpenAI · USA | No | Proprietary | Proprietary (API only) |
| 🇨🇳 DeepSeek V4-Pro | DeepSeek · China | Yes | MIT | Open-weight |
| 🇺🇸 Gemini 3.1 Pro | Google · USA | No | Proprietary | Proprietary (API only) |
| 🇨🇳 GLM-5.2 | Zhipu AI · China | Yes | MIT | Open-weight |
| 🇺🇸 Grok 4.3 | xAI · USA | No | Proprietary | Proprietary (API only) |
| 🇺🇸 MAI-Thinking-1 | Microsoft · USA | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Claude Sonnet 4.6 | Anthropic · USA | No | Proprietary | Proprietary (API only) |
| 🇺🇸 Llama 4 Maverick | Meta · USA | Yes | Llama 4 Community | Open-weight |
| 🇨🇳 Qwen3.7-Max | Alibaba · China | No | Proprietary | Proprietary (API only) |
| 🇨🇳 Kimi K2.6 | Moonshot AI · China | Yes | Modified MIT | Open-weight |
| 🇪🇺 Mistral Large 3 (25.12) | Mistral AI · Europe | Yes | Mistral Research License (non-commercial) | Open-weight |
| 🇪🇺 Magistral Small 1.2 | Mistral AI · Europe | Yes | Apache-2.0 | Open-weight |

🖥️ Open source you can self-host

Small and medium models you can run locally. Memory is estimated for 4-bit (Q4) and 8-bit (Q8) quantization; on Apple Silicon memory is unified, so the RAM figure also serves as VRAM.

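| Model | Vendor | Quality | SWE-bench-Pro / LiveCodeBench / GPQA (scores present, in column order) | Params | RAM Q4 | RAM Q8 | GPU (VRAM) | CPU / Mac | License |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Gemma 3 27B | Google | 29.6 | 29.7, 24.3 | 27B | 16 GB | 31 GB | ≥16 GB | Limited (better on GPU or Mac ≥32 GB) | Gemma |
| Qwen3-32B | Alibaba | 19.0 | 60.6, 68.4 | 32.8B | 20 GB | 38 GB | ≥24 GB | Limited (better on GPU or Mac ≥32 GB) | Apache-2.0 |
| Qwen3-8B | Alibaba | 18.4 | 60.3, 63.3 | 8.2B | 6 GB | 11 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Apache-2.0 |
| DeepSeek-R1-Distill-Qwen-14B | DeepSeek | 16.5 | 53.1, 59.1 | 14B | 9 GB | 17 GB | ≥12 GB | Yes (slow on CPU · Mac 16 GB) | MIT |
| Phi-4 | Microsoft | 5.6 | 56.1 | 14.7B | 10 GB | 18 GB | ≥12 GB | Yes (slow on CPU · Mac 16 GB) | MIT |
| Mistral Small 3 | Mistral AI | 4.5 | 45.3 | 24B | 15 GB | 28 GB | ≥16 GB | Limited (better on GPU or Mac ≥32 GB) | Apache-2.0 |
| Llama 3.1 8B | Meta | 3.0 | 30.4 | 8B | 6 GB | 10 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Llama 3.1 Community |
| Gemma 3 12B | Google | 2.5 | 25.4 | 12B | 8 GB | 15 GB | ≥8 GB | Yes (slow on CPU · Mac 16 GB) | Gemma |
| Gemma 3 4B | Google | 1.5 | 15.0 | 4B | 4 GB | 6 GB | ≥8 GB | Yes (CPU/Mac, smooth) | Gemma |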
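
The page does not say how the Q4 and Q8 memory figures were derived. As a back-of-the-envelope sketch, weight memory can be approximated as parameters times bits per weight, plus some headroom for the KV cache and runtime buffers. The 1.2 overhead factor and the name `estimate_memory_gb` are assumptions made for illustration, so the results will land near the table's figures without matching them exactly.

```python
def estimate_memory_gb(params_billion: float, bits_per_weight: int,
                       overhead: float = 1.2) -> float:
    """Rough memory estimate in GB for a quantized model.

    weights_gb = params (in billions) * bits_per_weight / 8
    The overhead factor (assumed 1.2 here) stands in for KV cache,
    activations and runtime buffers; real usage depends on context
    length and the inference runtime.
    """
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billion and bits_per_weight must be positive")
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb * overhead


if __name__ == "__main__":
    # Parameter counts taken from the table above.
    for name, params in [("Gemma 3 27B", 27.0), ("Qwen3-8B", 8.2)]:
        q4 = estimate_memory_gb(params, 4)
        q8 = estimate_memory_gb(params, 8)
        print(f"{name}: about {q4:.1f} GB at Q4, about {q8:.1f} GB at Q8")
```

For a 27B model this gives roughly 16 GB at 4-bit, in line with the table; small models come out lower than the table's values, which suggests the published figures include a fixed allowance that this sketch does not model.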