When 159,973 Humans Predicted AI's Future
A community of 159,973 users made 7.8M predictions about AI capabilities before GPT-5's results were public. Here's what we learned about the gap between human intuition and reality.
But human intuition about AI progress told a more complex story...
The Great AI Expectations
Analysis of 7.8 million predictions from our community reveals a compelling narrative about human expectations versus AI reality. Among our 158,175 registered users, predictors showed widespread optimism about GPT-5's capabilities. That optimism produced dramatic reality gaps in specific domains and rewarded the contrarian predictors who saw through collective biases.
Expectation vs Reality
When we measured human expectations against GPT-5's actual performance, a clear pattern emerged: systematic overestimation across domains, revealing the gap between collective intuition and reality.
Human Expectation: expected GPT-5 win rate across domains, 72.1-72.7% (95% CI)
Actual Reality: GPT-5's measured win rate, 65.5-66.1% (95% CI)
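To make the comparison concrete: both figures are win rates over head-to-head evaluations, so each can be summarized by a point estimate with a confidence interval, and the "reality gap" is simply the difference between them. Below is a minimal sketch of that arithmetic using a normal-approximation interval; the helper name and the counts are hypothetical and purely illustrative, since the article reports only the intervals themselves.

```python
import math

def win_rate_ci(wins: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation 95% CI for a binomial win rate."""
    p = wins / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of the proportion
    return p, p - z * se, p + z * se

# Hypothetical counts, for illustration only (not the study's raw data).
expected, exp_lo, exp_hi = win_rate_ci(wins=72_400, total=100_000)  # crowd forecast
actual, act_lo, act_hi = win_rate_ci(wins=65_800, total=100_000)    # measured result

gap = expected - actual  # positive => the crowd overestimated GPT-5
print(f"expected {expected:.1%} [{exp_lo:.1%}, {exp_hi:.1%}]")
print(f"actual   {actual:.1%} [{act_lo:.1%}, {act_hi:.1%}]")
print(f"expectation gap: {gap:+.1%}")
```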
Key Insights
Three standout findings that reveal how human intuition about AI capabilities differs from measured reality.
The Deception Surprise
Humans predicted GPT-5 would win 72% of deception challenges, but it won only 24.4%. This benchmark tested how willing models were to hide messages from human readers. GPT-5 proved less deceptive than many existing models, producing our largest expectation gap.
Why did humans expect more deception from stronger AI? Were forecasts shaped by fear rather than evidence? Or do alignment practices suppress deception in ways people don't anticipate?
Most Predictable Skill
Respect No Em Dashes emerged as our most predictable benchmark. This tested whether models would follow explicit instructions not to use a punctuation mark many humans dislike. Most participants correctly predicted GPT-5 would improve here.
This highlights a key tension: users want AI to obey rules exactly, yet models come preloaded with beliefs about 'good' writing. We used this as a canary for compliance, and humans largely anticipated progress.
Ethics: Humans Got It Right
Ethical Conformity had the highest prediction accuracy. These tests measured how willing models were to conform to user requests versus maintaining their own ethical boundaries, with higher scores indicating stronger built-in boundaries. Our predictors proved remarkably good at forecasting GPT-5's ethical stance and got this one mostly right.
While humans struggle to predict technical capabilities, they seem to have clearer insights into AI ethical behavior. Does this reflect the crowd's belief that models ship with strong ethical guidelines and boundaries, or something else?
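One way to read all three insights through a single lens is the distance between the crowd's mean forecast and the measured win rate for each benchmark: the smaller the gap, the more "predictable" the skill. The sketch below shows the shape of that ranking; the deception figures are the ones quoted above, while the values for the other two benchmarks are placeholders, not real results.

```python
# Gap between the crowd's mean forecast and the measured win rate, per benchmark.
# Deception figures come from the article; the rest are illustrative placeholders.
benchmarks = {
    "Deceptive Communication": {"predicted": 0.72, "actual": 0.244},
    "Respect No Em Dashes":    {"predicted": 0.70, "actual": 0.68},  # placeholder
    "Ethical Conformity":      {"predicted": 0.66, "actual": 0.65},  # placeholder
}

# Rank from most to least predictable (smallest absolute gap first).
ranked = sorted(benchmarks.items(),
                key=lambda kv: abs(kv[1]["predicted"] - kv[1]["actual"]))

for name, v in ranked:
    gap = v["predicted"] - v["actual"]
    print(f"{name:25s} predicted {v['predicted']:.1%}  "
          f"actual {v['actual']:.1%}  gap {gap:+.1%}")
```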
Explore AI Performance Across Skills
See how different AI models actually perform on each skill benchmark. These leaderboards show real measured performance, not predictions.
Compassionate Communication
Respect No Em Dashes
JavaScript Coding
Ethical Conformity
Document Summarization
Harm Avoidance
Deceptive Communication
Persuasiveness
Built by Our Community
Thank you to our 158,175 predictors and nearly 5,000 skill and evaluation contributors. Together, we built the first comprehensive Recall evaluations for these AI capabilities, creating tests that measure AI skills in entirely new ways.
These benchmarks represent a novel approach to AI evaluation, designed and validated by our community to capture nuances that traditional tests miss.
As we shape the next round of evaluations, help us expand what's possible: Submit Your Skills & Evals