
Experts launch global call for tough AI questions in ‘Humanity’s Last Exam’
A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child’s play.
Dubbed “Humanity’s Last Exam,” the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.
The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which “destroyed the most popular reasoning benchmarks,” said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk’s xAI startup.
Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like U.S. history, the other probing models’ ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.
At the time of those papers, AI was giving almost random answers to questions on the exams. “They’re now crushed,” Hendrycks told Reuters.
As one example, the Claude models from the AI lab Anthropic went from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.
These common benchmarks have less meaning as a result.
AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.
Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. “Humanity’s Last Exam” will require abstract reasoning, he said.
Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on “Humanity’s Last Exam” will remain private to make sure AI systems’ answers are not from memorization.
The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.
“We desperately need harder tests for expert-level models to measure the rapid progress of AI,” said Alexandr Wang, Scale’s CEO.
One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.
(Reporting by Jeffrey Dastin in San Francisco and Katie Paul in New York; Editing by Christina Fincher)