Visualizing AI vs. Human Performance in Technical Tasks
The gap between human and machine reasoning is narrowing—and fast.
By Kayla Zhu | Article/Editing: Niccolo Conte | Graphics/Design: Sabrina Lam
Over the past year, AI systems have continued to advance rapidly, surpassing human performance in technical tasks where they previously fell short, such as advanced math and visual reasoning.
This graphic visualizes AI systems’ performance relative to human baselines for eight AI benchmarks measuring tasks including:
Image classification
Visual reasoning
Medium-level reading comprehension
English language understanding
Multitask language understanding
Competition-level mathematics
PhD-level science questions
Multimodal understanding and reasoning
This visualization is part of Visual Capitalist’s AI Week, sponsored by Terzo. Data comes from the Stanford University 2025 AI Index Report.
An AI benchmark is a standardized test used to evaluate the performance and capabilities of AI systems on specific tasks.
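In practice, most of these benchmarks reduce to a simple accuracy calculation: the model's answers are scored against a fixed answer key, and the result is reported as a percentage that can be compared directly to a human baseline. Here's a minimal sketch of that scoring step (the answers below are hypothetical placeholders, not items from any real benchmark):

```python
# Minimal sketch of how a benchmark accuracy score is computed: model answers
# are compared against a fixed answer key, and the share of correct answers
# is reported as a percentage.

def benchmark_accuracy(model_answers: list[str], answer_key: list[str]) -> float:
    """Return the percentage of questions the model answered correctly."""
    correct = sum(m == k for m, k in zip(model_answers, answer_key))
    return 100 * correct / len(answer_key)

# Hypothetical example: 4 of 5 answers match the key -> 80.0%
score = benchmark_accuracy(["A", "C", "B", "D", "A"],
                           ["A", "C", "B", "D", "C"])
print(f"Model score: {score:.1f}%")  # Model score: 80.0%
```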
AI Models Are Surpassing Humans in Technical Tasks
Below, we show how AI models have performed relative to the human baseline in various technical tasks in recent years.
From ChatGPT to Gemini, many of the world’s leading AI models are surpassing the human baseline in a range of technical tasks.
The only task where AI systems still haven’t caught up to humans is multimodal understanding and reasoning, which involves processing and reasoning across multiple formats and disciplines, such as images, charts, and diagrams.
However, the gap is closing quickly.
In 2024, OpenAI’s o1 model scored 78.2% on MMMU, a benchmark that evaluates models on multi-discipline tasks demanding college-level subject knowledge.
This was just 4.4 percentage points below the human baseline of 82.6%. The o1 model also has one of the lowest hallucination rates of any leading AI model.
This was a major jump from the end of 2023, when Google Gemini scored just 59.4%, highlighting how quickly AI performance is improving on these technical tasks.
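For reference, the gap figures above are simple subtraction against the human baseline. A minimal sketch using the MMMU scores cited in this article:

```python
# Gap to the human MMMU baseline, using the figures cited above
# (2025 AI Index Report).
human_baseline = 82.6  # human MMMU score (%)

scores = {
    "Google Gemini (late 2023)": 59.4,
    "OpenAI o1 (2024)": 78.2,
}

for model, score in scores.items():
    gap = human_baseline - score
    print(f"{model}: {score:.1f}% ({gap:.1f} pp below human baseline)")
# Google Gemini (late 2023): 59.4% (23.2 pp below human baseline)
# OpenAI o1 (2024): 78.2% (4.4 pp below human baseline)
```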
To dive into all the AI Week content, visit our AI content hub, brought to you by Terzo.