A New AI Performance Test That Even the Best AI Models Can't Pass

Posted on January 24, 2025

A new AI performance test has been created, and it is reportedly so difficult that even the very best AI models cannot achieve a perfect score. The test is called “Humanity’s Last Exam”. It is designed to challenge artificial intelligence models in a very different way than before.

What is Humanity’s Last Exam?

Humanity’s Last Exam was created by the Center for AI Safety (CAIS), a non-profit organisation focused on AI safety, together with Scale AI, a company specialising in AI technology. The test pushes the boundaries of AI performance with over 3,000 questions across different subjects, including the humanities, science, and maths.

The test isn’t just about answering questions; it’s about understanding complex problems. It includes graphs, diagrams, and other visuals to make things even trickier. It’s designed to be a real challenge for AI models, even the most advanced ones.

Can AI Models Ace the Test?

The truth is that AI models are struggling to score 100% on Humanity’s Last Exam. Studies show that even the best AI models on the market have failed to get all the answers right. For instance, back in 2021, earlier AI models could barely score 10 out of 100 on tests focused on mathematics.

This underscores how difficult the test is. It also suggests that AI is still far from human levels of understanding and reasoning.

Global Support for Humanity’s Last Exam

Developing this test took considerable effort. Over 1,000 professors and researchers from more than 50 countries contributed. Their backing shows how important this performance test is for the future of AI development. It could become a key benchmark for evaluating AI abilities and performance.

Source: Scale

What is Humanity’s Last Exam for AI?

Humanity’s Last Exam is a new performance test designed to challenge AI models across various subjects, including math, science, and humanities. It includes over 3,000 questions with visuals to test AI's reasoning abilities.

Why can’t AI models pass Humanity’s Last Exam?

Despite being highly advanced, even the best AI models struggle to score 100% on the test. AI often fails in complex reasoning tasks and cannot fully grasp the nuances of some subjects.

How many questions are in Humanity’s Last Exam?

The test contains more than 3,000 questions across multiple categories, including mathematics, science, and humanities, designed to challenge AI in different areas of knowledge.

Who developed Humanity’s Last Exam?

The test was developed by the Center for AI Safety (CAIS), a nonprofit organization focused on AI safety, and Scale AI, a company specializing in AI technology.

How can I see sample questions from the test?

You can find sample questions from Humanity’s Last Exam online through a provided link, giving you a glimpse of the challenging questions designed for AI models.