Predicted Model Performance
Win rates in head-to-head battles against GPT-5
Snapshot taken on August 6th, 2025
📈 Dataset Overview
Total predictions: 5,653,090
Unique users: 105,551
Unique skills: 9
Models predicted: 50
🏆 GPT-5 Dominance Analysis
Overall win rate: 73.1%
Total comparisons: 5,653,090
Total wins: 4,134,241
Community consensus: Users predict GPT-5 will outperform every other model across all skills.
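The overall win rate follows from the two totals reported above. As a minimal sketch, the arithmetic can be checked in Python; the variable names are illustrative and not part of the original dashboard:

```python
# Figures copied from the snapshot; variable names are hypothetical.
total_comparisons = 5_653_090
gpt5_wins = 4_134_241

# Share of head-to-head predictions that favored GPT-5.
win_rate = gpt5_wins / total_comparisons
# Predictions that favored the other model instead.
predicted_losses = total_comparisons - gpt5_wins

print(f"GPT-5 predicted win rate: {win_rate:.1%}")
print(f"Predictions favoring the other model: {predicted_losses:,}")
```

The rounded result agrees with the 73.1% shown in the snapshot.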
Model | Average vs GPT-5 | Code generation | Document summaries | Empathy when delivering bad news | Ethical loophole navigation | Harm avoidance | Hidden messages | Image generation | Persuasiveness | Respect no Em Dash Requests |
---|---|---|---|---|---|---|---|---|---|---|
Google: Gemini 2.5 Pro | 31.4% | 32.2% | 29.9% | 31.5% | 32.7% | 33.8% | 33.8% | 32.9% | 32.6% | 23.2% |
xAI: Grok 4 | 29.9% | 30.3% | 28.2% | 30.3% | 30.5% | 31.3% | 32.4% | 30.4% | 31.0% | 25.0% |
DeepSeek: DeepSeek V3 | 28.5% | 29.6% | 27.8% | 29.9% | 29.3% | 28.5% | 29.7% | 28.7% | 28.4% | 24.2% |
Anthropic: Claude Sonnet 4 | 28.3% | 29.7% | 25.6% | 28.4% | 30.0% | 29.9% | 30.1% | 28.8% | 28.9% | 23.2% |
Qwen: Qwen-Max | 28.0% | 28.6% | 25.6% | 28.2% | 29.1% | 29.5% | 30.0% | 28.1% | 28.7% | 23.8% |
DeepSeek: R1 | 28.0% | 28.9% | 26.6% | 29.9% | 29.1% | 28.1% | 28.1% | 29.1% | 28.0% | 23.8% |
DeepSeek: R1 Distill Qwen 32B | 27.7% | 28.3% | 27.9% | 28.7% | 28.2% | 28.9% | 28.5% | 28.2% | 26.9% | 23.7% |
Google: Gemini 2.5 Flash | 27.5% | 28.6% | 26.8% | 28.6% | 28.4% | 27.9% | 27.6% | 28.5% | 28.6% | 22.9% |
Google: Gemma 3n 4B | 26.9% | 26.7% | 27.5% | 27.3% | 26.4% | 27.2% | 29.5% | 27.2% | 27.8% | 23.0% |
OpenAI: o3 Pro | 26.9% | 27.6% | 25.9% | 28.7% | 26.4% | 28.2% | 26.8% | 27.6% | 27.7% | 22.8% |
Meta: Llama 4 Scout | 26.8% | 26.9% | 26.9% | 28.7% | 27.6% | 27.2% | 27.0% | 27.3% | 25.8% | 23.4% |
OpenAI: o3 | 26.7% | 27.3% | 26.2% | 27.6% | 27.0% | 27.6% | 26.7% | 28.1% | 27.3% | 22.6% |
Meta: Llama 4 Maverick | 26.6% | 26.2% | 26.7% | 28.7% | 27.1% | 27.7% | 26.8% | 27.1% | 26.6% | 23.0% |
Google: Gemma 3 12B | 26.6% | 27.5% | 26.0% | 28.6% | 25.7% | 28.0% | 25.9% | 27.7% | 27.1% | 23.1% |
OpenAI: o1 | 26.5% | 27.3% | 25.4% | 27.2% | 27.0% | 29.0% | 27.9% | 26.7% | 26.4% | 22.0% |
Microsoft: Phi 4 | 26.5% | 26.9% | 25.7% | 26.3% | 27.6% | 27.7% | 26.9% | 27.4% | 27.3% | 22.8% |
Microsoft: Phi 4 Reasoning Plus | 26.5% | 27.0% | 26.5% | 28.0% | 27.5% | 28.3% | 26.7% | 26.9% | 25.9% | 21.3% |
OpenAI: o1-mini | 26.3% | 27.0% | 26.5% | 27.2% | 27.4% | 27.8% | 27.1% | 25.7% | 26.2% | 22.0% |
OpenAI: GPT-4.1 Mini | 26.3% | 28.0% | 26.0% | 28.3% | 27.0% | 27.6% | 25.9% | 26.3% | 25.7% | 22.0% |
NVIDIA: Llama 3.3 Nemotron Super 49B v1 | 26.3% | 26.0% | 25.3% | 28.0% | 26.1% | 27.6% | 28.4% | 26.1% | 26.2% | 22.8% |
Perplexity: Sonar | 25.4% | 25.4% | 24.4% | 25.6% | 26.0% | 26.7% | 25.8% | 25.4% | 25.8% | 23.1% |
Inception: Mercury Coder | 25.2% | 26.4% | 23.6% | 26.1% | 25.7% | 26.9% | 24.8% | 26.1% | 25.2% | 21.9% |
EleutherAI: Llemma 7b | 25.2% | 26.0% | 24.9% | 25.9% | 26.6% | 26.2% | 25.1% | 25.7% | 24.8% | 21.2% |
Anthropic: Claude Opus 4 | 25.1% | 25.7% | 24.4% | 27.2% | 25.9% | 26.0% | 25.4% | 24.9% | 25.1% | 21.8% |
Mancer: Weaver (alpha) | 25.1% | 26.3% | 24.3% | 25.5% | 26.0% | 25.5% | 26.2% | 25.6% | 25.5% | 21.0% |
Anthropic: Claude 3.7 Sonnet | 25.1% | 26.0% | 24.3% | 25.8% | 25.8% | 26.8% | 24.7% | 25.7% | 24.1% | 22.6% |
Qwen: Qwen-Turbo | 25.1% | 26.1% | 23.6% | 25.9% | 25.7% | 26.5% | 25.3% | 24.8% | 25.6% | 22.4% |
Perplexity: Sonar Reasoning Pro | 25.0% | 24.9% | 24.0% | 26.7% | 26.2% | 25.9% | 24.9% | 25.7% | 25.5% | 21.1% |
Perplexity: Sonar Pro | 25.0% | 25.4% | 25.2% | 25.7% | 25.2% | 25.4% | 26.1% | 26.7% | 24.2% | 20.9% |
Amazon: Nova Pro 1.0 | 25.0% | 25.4% | 24.1% | 26.1% | 27.2% | 24.7% | 25.3% | 25.4% | 24.4% | 22.1% |
Magnum v4 72B | 24.9% | 25.9% | 23.9% | 26.4% | 25.7% | 25.8% | 24.5% | 25.7% | 24.9% | 21.7% |
Qwen: QwQ 32B | 24.9% | 26.8% | 24.4% | 26.1% | 25.5% | 25.7% | 25.3% | 25.3% | 24.6% | 20.9% |
Anthropic: Claude 3.7 Sonnet (thinking) | 24.7% | 24.9% | 24.5% | 25.7% | 25.7% | 24.5% | 25.6% | 24.8% | 24.6% | 22.1% |
Aetherwiing: Starcannon 12B | 24.7% | 25.9% | 23.5% | 25.4% | 24.7% | 25.6% | 25.4% | 25.8% | 24.5% | 21.3% |
Goliath 120B | 24.5% | 26.0% | 23.9% | 25.6% | 25.7% | 24.8% | 24.2% | 25.0% | 23.7% | 21.5% |
Qwen2.5 Coder 32B Instruct | 24.5% | 25.5% | 24.5% | 25.9% | 24.8% | 24.4% | 23.5% | 25.2% | 24.6% | 21.8% |
AionLabs: Aion-1.0 | 24.3% | 25.1% | 24.8% | 25.0% | 24.1% | 25.0% | 24.6% | 24.9% | 23.8% | 21.4% |
Mistral: Pixtral 12B | 24.3% | 24.5% | 23.8% | 26.0% | 24.4% | 25.8% | 25.1% | 23.9% | 24.3% | 21.0% |
Mistral Small | 24.3% | 24.8% | 24.1% | 25.6% | 24.3% | 24.4% | 24.4% | 24.2% | 25.4% | 20.9% |
Mistral Medium | 24.2% | 24.0% | 24.1% | 25.9% | 23.9% | 24.8% | 24.9% | 24.0% | 24.7% | 21.4% |
TheDrummer: Anubis Pro 105B V1 | 24.2% | 25.9% | 23.2% | 26.0% | 25.6% | 23.5% | 23.8% | 24.4% | 24.2% | 20.9% |
AlfredPros: CodeLLaMa 7B Instruct Solidity | 24.2% | 25.1% | 24.4% | 24.9% | 25.0% | 24.2% | 24.7% | 24.4% | 23.0% | 21.9% |
ReMM SLERP 13B | 24.2% | 24.7% | 24.2% | 25.9% | 24.2% | 24.9% | 24.1% | 24.3% | 23.5% | 21.8% |
Mistral: Ministral 3B | 24.1% | 25.1% | 23.7% | 25.8% | 24.4% | 24.5% | 24.4% | 24.9% | 22.5% | 21.7% |
Arcee AI: Maestro Reasoning | 24.0% | 24.1% | 24.2% | 24.8% | 25.4% | 25.1% | 25.1% | 23.5% | 23.5% | 20.7% |
AI21: Jamba 1.6 Large | 24.0% | 24.2% | 24.0% | 25.7% | 24.1% | 24.4% | 24.3% | 25.0% | 22.8% | 21.5% |
01.AI: Yi Large | 23.9% | 25.1% | 23.5% | 25.2% | 24.2% | 25.0% | 24.0% | 24.4% | 23.7% | 20.4% |
Mistral Tiny | 23.9% | 23.6% | 25.1% | 25.8% | 24.7% | 24.4% | 23.1% | 24.7% | 23.2% | 20.6% |
Mistral: Ministral 8B | 23.8% | 24.7% | 24.5% | 25.9% | 23.0% | 24.8% | 24.5% | 23.9% | 22.8% | 20.6% |
Mistral Large | 23.6% | 24.4% | 23.4% | 25.5% | 24.6% | 24.7% | 22.7% | 23.8% | 23.1% | 20.1% |
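
The page does not say how the "Average vs GPT-5" column is computed. For the rows checked here it is consistent with an unweighted mean of the nine skill columns, though a weighting by prediction count cannot be ruled out from the table alone. A small sketch of that check, using values copied from the first two rows (the helper name `mean_of_skills` is hypothetical):

```python
# Reported average and per-skill win rates (%) vs GPT-5, copied from the table.
# Skill order: code generation, document summaries, empathy, ethical loophole
# navigation, harm avoidance, hidden messages, image generation,
# persuasiveness, respect no em dash requests.
rows = {
    "Google: Gemini 2.5 Pro": (31.4, [32.2, 29.9, 31.5, 32.7, 33.8, 33.8, 32.9, 32.6, 23.2]),
    "xAI: Grok 4": (29.9, [30.3, 28.2, 30.3, 30.5, 31.3, 32.4, 30.4, 31.0, 25.0]),
}


def mean_of_skills(skills):
    """Unweighted mean of the per-skill win rates (an assumed definition)."""
    return sum(skills) / len(skills)


for model, (reported, skills) in rows.items():
    computed = mean_of_skills(skills)
    print(f"{model}: reported {reported:.1f}%, unweighted mean {computed:.1f}%")
```

Both rows reproduce the reported average to one decimal place under this assumption.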