A community of 159,245 users made 7.8M predictions about AI capabilities before GPT-5's results were public. Here's what we learned about the gap between human intuition and reality.
But human intuition about AI progress told a more complex story...
Analysis of 7.8 million predictions from our community of 158,175 registered users reveals a compelling story about human expectations versus AI reality: widespread optimism about GPT-5's capabilities, dramatic reality gaps in specific domains, and a minority of contrarian predictors who saw through the collective biases.
When we measured human expectations against GPT-5's actual performance, a clear pattern emerged: systematic overestimation across domains.
Expected GPT-5 win rate across domains (95% CI: 72.1-72.7%)
GPT-5's actual win rate (95% CI: 65.5-66.1%)
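For readers curious how intervals like these are typically derived, here is a minimal Python sketch using a normal-approximation binomial confidence interval. The counts below are hypothetical placeholders, not our prediction data, and the real analysis likely aggregates per domain or per user rather than pooling raw predictions.

```python
import math

def win_rate_ci(wins: int, total: int, z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate and normal-approximation 95% confidence interval for a win rate."""
    p = wins / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, p - half_width, p + half_width

# Hypothetical counts for illustration only -- not the actual prediction data.
rate, lo, hi = win_rate_ci(wins=5_650_000, total=7_800_000)
print(f"win rate {rate:.1%} (95% CI: {lo:.1%}-{hi:.1%})")
```

With millions of predictions the interval becomes very narrow, which is why even a few tenths of a percentage point separate "expected" from "actual" in a statistically meaningful way.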
Three standout findings that reveal how human intuition about AI capabilities differs from measured reality.
Humans predicted GPT-5 would win 72% of deception challenges, but it actually won only 24.4%. This benchmark tested how willing models were to hide messages from human readers. GPT-5 proved less deceptive than many existing models, creating our largest expectation gap.
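As a rough illustration of how an expectation gap can be computed and ranked across benchmarks, here is a small Python sketch. Only the deception figures (72% expected, 24.4% actual) come from this post; the other benchmark numbers are hypothetical placeholders.

```python
# Crowd-predicted vs. actual GPT-5 win rates per benchmark.
# Only the deception row reflects figures from this post; the rest are
# hypothetical placeholders for illustration.
expected = {"Deception": 0.720, "Respect No Em Dashes": 0.800, "Ethical Conformity": 0.700}
actual = {"Deception": 0.244, "Respect No Em Dashes": 0.780, "Ethical Conformity": 0.690}

# Expectation gap = predicted win rate minus measured win rate.
gaps = {name: expected[name] - actual[name] for name in expected}

# Rank benchmarks by the size of the gap, largest first.
for name, gap in sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>22}: expected {expected[name]:.1%}, actual {actual[name]:.1%}, gap {gap:+.1%}")
```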
Why did humans expect more deception from stronger AI? Were forecasts shaped by fear rather than evidence? Or do alignment practices suppress deception in ways people don't anticipate?
Respect No Em Dashes emerged as our most predictable benchmark. This tested whether models would follow explicit instructions not to use a punctuation mark many humans dislike. Most participants correctly predicted GPT-5 would improve here.
This highlights a key tension: users want AI to obey rules exactly, yet models come preloaded with beliefs about 'good' writing. We used this as a canary for compliance, and humans largely anticipated progress.
Ethical Conformity had the highest prediction accuracy. This benchmark measured how willing models were to go along with user requests versus holding their own ethical boundaries; higher scores meant stronger built-in boundaries. Humans proved remarkably good at forecasting GPT-5's ethical stance and got this one mostly right.
While humans struggle to predict technical capabilities, they seem to have clearer insights into AI ethical behavior. Does this reflect the crowd's belief that models ship with strong ethical guidelines and boundaries, or something else?
See how different AI models actually perform on each skill benchmark. These leaderboards show real measured performance, not predictions.
View benchmark results
Thank you to our 158,175 predictors and nearly 5,000 skill and evaluation contributors. Together, we built the first comprehensive Recall evaluations for these AI capabilities, tests designed to measure AI skills in new ways.
These benchmarks represent a novel approach to AI evaluation, designed and validated by our community to capture nuances that traditional tests miss.
As we shape the next round of evaluations, help us expand what's possible: Submit Your Skills & Evals