Predicted Model Performance

Win rates in head-to-head battles against GPT-5

Snapshot taken on August 6th, 2025

📈 Dataset Overview

Total predictions: 5,653,090

Unique users: 105,551

Unique skills: 9

Models predicted: 50

🏆 GPT-5 Dominance Analysis

Overall win rate: 73.1%

Total comparisons: 5,653,090

Total wins: 4,134,241
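The overall win rate appears to be total wins divided by total comparisons. A minimal sketch of that arithmetic follows, using the two totals reported above. The function name `win_rate_percent` is illustrative, and the complement shown for the challengers assumes every comparison has exactly one winner (no ties), which the page does not state.

```python
# Totals copied from the GPT-5 Dominance Analysis figures.
total_comparisons = 5_653_090
gpt5_wins = 4_134_241


def win_rate_percent(wins: int, comparisons: int) -> float:
    """Return wins as a percentage of comparisons."""
    if comparisons <= 0:
        raise ValueError("comparisons must be positive")
    return 100.0 * wins / comparisons


gpt5_rate = win_rate_percent(gpt5_wins, total_comparisons)

# Assumption: no ties, so the remaining share goes to the other models.
challenger_rate = 100.0 - gpt5_rate

print(f"GPT-5 predicted win rate: {gpt5_rate:.1f}%")
print(f"Combined challenger win rate: {challenger_rate:.1f}%")
```

Under the no-ties assumption, the challenger share lands close to the per-model averages in the table, most of which fall between 24% and 28%.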

Community consensus: Users predict GPT-5 will outperform every other model in all nine skills. No model's predicted win rate against GPT-5 exceeds 33.8% in any skill.

Model	Average vs GPT-5	Code Generation	Document Summaries	Empathy (Bad News)	Ethical Navigation	Harm Avoidance	Hidden Messages	Image Generation	Persuasiveness	No Em Dashes
Google: Gemini 2.5 Pro	31.4%	32.2%	29.9%	31.5%	32.7%	33.8%	33.8%	32.9%	32.6%	23.2%
xAI: Grok 4	29.9%	30.3%	28.2%	30.3%	30.5%	31.3%	32.4%	30.4%	31.0%	25.0%
DeepSeek: DeepSeek V3	28.5%	29.6%	27.8%	29.9%	29.3%	28.5%	29.7%	28.7%	28.4%	24.2%
Anthropic: Claude Sonnet 4	28.3%	29.7%	25.6%	28.4%	30.0%	29.9%	30.1%	28.8%	28.9%	23.2%
Qwen: Qwen-Max	28.0%	28.6%	25.6%	28.2%	29.1%	29.5%	30.0%	28.1%	28.7%	23.8%
DeepSeek: R1	28.0%	28.9%	26.6%	29.9%	29.1%	28.1%	28.1%	29.1%	28.0%	23.8%
DeepSeek: R1 Distill Qwen 32B	27.7%	28.3%	27.9%	28.7%	28.2%	28.9%	28.5%	28.2%	26.9%	23.7%
Google: Gemini 2.5 Flash	27.5%	28.6%	26.8%	28.6%	28.4%	27.9%	27.6%	28.5%	28.6%	22.9%
Google: Gemma 3n 4B	26.9%	26.7%	27.5%	27.3%	26.4%	27.2%	29.5%	27.2%	27.8%	23.0%
OpenAI: o3 Pro	26.9%	27.6%	25.9%	28.7%	26.4%	28.2%	26.8%	27.6%	27.7%	22.8%
Meta: Llama 4 Scout	26.8%	26.9%	26.9%	28.7%	27.6%	27.2%	27.0%	27.3%	25.8%	23.4%
OpenAI: o3	26.7%	27.3%	26.2%	27.6%	27.0%	27.6%	26.7%	28.1%	27.3%	22.6%
Meta: Llama 4 Maverick	26.6%	26.2%	26.7%	28.7%	27.1%	27.7%	26.8%	27.1%	26.6%	23.0%
Google: Gemma 3 12B	26.6%	27.5%	26.0%	28.6%	25.7%	28.0%	25.9%	27.7%	27.1%	23.1%
OpenAI: o1	26.5%	27.3%	25.4%	27.2%	27.0%	29.0%	27.9%	26.7%	26.4%	22.0%
Microsoft: Phi 4	26.5%	26.9%	25.7%	26.3%	27.6%	27.7%	26.9%	27.4%	27.3%	22.8%
Microsoft: Phi 4 Reasoning Plus	26.5%	27.0%	26.5%	28.0%	27.5%	28.3%	26.7%	26.9%	25.9%	21.3%
OpenAI: o1-mini	26.3%	27.0%	26.5%	27.2%	27.4%	27.8%	27.1%	25.7%	26.2%	22.0%
OpenAI: GPT-4.1 Mini	26.3%	28.0%	26.0%	28.3%	27.0%	27.6%	25.9%	26.3%	25.7%	22.0%
NVIDIA: Llama 3.3 Nemotron Super 49B v1	26.3%	26.0%	25.3%	28.0%	26.1%	27.6%	28.4%	26.1%	26.2%	22.8%
Perplexity: Sonar	25.4%	25.4%	24.4%	25.6%	26.0%	26.7%	25.8%	25.4%	25.8%	23.1%
Inception: Mercury Coder	25.2%	26.4%	23.6%	26.1%	25.7%	26.9%	24.8%	26.1%	25.2%	21.9%
EleutherAI: Llemma 7b	25.2%	26.0%	24.9%	25.9%	26.6%	26.2%	25.1%	25.7%	24.8%	21.2%
Anthropic: Claude Opus 4	25.1%	25.7%	24.4%	27.2%	25.9%	26.0%	25.4%	24.9%	25.1%	21.8%
Mancer: Weaver (alpha)	25.1%	26.3%	24.3%	25.5%	26.0%	25.5%	26.2%	25.6%	25.5%	21.0%
Anthropic: Claude 3.7 Sonnet	25.1%	26.0%	24.3%	25.8%	25.8%	26.8%	24.7%	25.7%	24.1%	22.6%
Qwen: Qwen-Turbo	25.1%	26.1%	23.6%	25.9%	25.7%	26.5%	25.3%	24.8%	25.6%	22.4%
Perplexity: Sonar Reasoning Pro	25.0%	24.9%	24.0%	26.7%	26.2%	25.9%	24.9%	25.7%	25.5%	21.1%
Perplexity: Sonar Pro	25.0%	25.4%	25.2%	25.7%	25.2%	25.4%	26.1%	26.7%	24.2%	20.9%
Amazon: Nova Pro 1.0	25.0%	25.4%	24.1%	26.1%	27.2%	24.7%	25.3%	25.4%	24.4%	22.1%
Magnum v4 72B	24.9%	25.9%	23.9%	26.4%	25.7%	25.8%	24.5%	25.7%	24.9%	21.7%
Qwen: QwQ 32B	24.9%	26.8%	24.4%	26.1%	25.5%	25.7%	25.3%	25.3%	24.6%	20.9%
Anthropic: Claude 3.7 Sonnet (thinking)	24.7%	24.9%	24.5%	25.7%	25.7%	24.5%	25.6%	24.8%	24.6%	22.1%
Aetherwiing: Starcannon 12B	24.7%	25.9%	23.5%	25.4%	24.7%	25.6%	25.4%	25.8%	24.5%	21.3%
Goliath 120B	24.5%	26.0%	23.9%	25.6%	25.7%	24.8%	24.2%	25.0%	23.7%	21.5%
Qwen2.5 Coder 32B Instruct	24.5%	25.5%	24.5%	25.9%	24.8%	24.4%	23.5%	25.2%	24.6%	21.8%
AionLabs: Aion-1.0	24.3%	25.1%	24.8%	25.0%	24.1%	25.0%	24.6%	24.9%	23.8%	21.4%
Mistral: Pixtral 12B	24.3%	24.5%	23.8%	26.0%	24.4%	25.8%	25.1%	23.9%	24.3%	21.0%
Mistral Small	24.3%	24.8%	24.1%	25.6%	24.3%	24.4%	24.4%	24.2%	25.4%	20.9%
Mistral Medium	24.2%	24.0%	24.1%	25.9%	23.9%	24.8%	24.9%	24.0%	24.7%	21.4%
TheDrummer: Anubis Pro 105B V1	24.2%	25.9%	23.2%	26.0%	25.6%	23.5%	23.8%	24.4%	24.2%	20.9%
AlfredPros: CodeLLaMa 7B Instruct Solidity	24.2%	25.1%	24.4%	24.9%	25.0%	24.2%	24.7%	24.4%	23.0%	21.9%
ReMM SLERP 13B	24.2%	24.7%	24.2%	25.9%	24.2%	24.9%	24.1%	24.3%	23.5%	21.8%
Mistral: Ministral 3B	24.1%	25.1%	23.7%	25.8%	24.4%	24.5%	24.4%	24.9%	22.5%	21.7%
Arcee AI: Maestro Reasoning	24.0%	24.1%	24.2%	24.8%	25.4%	25.1%	25.1%	23.5%	23.5%	20.7%
AI21: Jamba 1.6 Large	24.0%	24.2%	24.0%	25.7%	24.1%	24.4%	24.3%	25.0%	22.8%	21.5%
01.AI: Yi Large	23.9%	25.1%	23.5%	25.2%	24.2%	25.0%	24.0%	24.4%	23.7%	20.4%
Mistral Tiny	23.9%	23.6%	25.1%	25.8%	24.7%	24.4%	23.1%	24.7%	23.2%	20.6%
Mistral: Ministral 8B	23.8%	24.7%	24.5%	25.9%	23.0%	24.8%	24.5%	23.9%	22.8%	20.6%
Mistral Large	23.6%	24.4%	23.4%	25.5%	24.6%	24.7%	22.7%	23.8%	23.1%	20.1%
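The "Average vs GPT-5" column looks consistent with an unweighted mean of the nine skill columns. The sketch below checks this for the top two rows, with values copied from the table. This is an inference from the numbers, not a documented method: the page may instead weight each skill by its prediction count, which would give similar but not identical results. The names `skill_rates` and `unweighted_average` are illustrative.

```python
from statistics import mean

# Per-skill predicted win rates (%) against GPT-5, in table column order:
# Code Generation, Document Summaries, Empathy (Bad News), Ethical Navigation,
# Harm Avoidance, Hidden Messages, Image Generation, Persuasiveness, No Em Dashes.
skill_rates = {
    "Google: Gemini 2.5 Pro": [32.2, 29.9, 31.5, 32.7, 33.8, 33.8, 32.9, 32.6, 23.2],
    "xAI: Grok 4": [30.3, 28.2, 30.3, 30.5, 31.3, 32.4, 30.4, 31.0, 25.0],
}

# "Average vs GPT-5" values as listed in the table.
reported_average = {
    "Google: Gemini 2.5 Pro": 31.4,
    "xAI: Grok 4": 29.9,
}


def unweighted_average(rates: list[float]) -> float:
    """Mean of the per-skill rates, rounded to one decimal like the table."""
    return round(mean(rates), 1)


for model, rates in skill_rates.items():
    computed = unweighted_average(rates)
    status = "matches" if computed == reported_average[model] else "differs from"
    print(f"{model}: computed {computed}% {status} reported {reported_average[model]}%")
```

Because the inputs are already rounded to one decimal, a match on a couple of rows supports the unweighted reading but does not rule out a weighted one.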