
Experts launch global call for tough AI questions in ‘Humanity’s Last Exam’ – Crypto News
A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child’s play.
Dubbed “Humanity’s Last Exam,” the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.
The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which “destroyed the most popular reasoning benchmarks,” said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk’s xAI startup.
Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like U.S. history, the other probing models’ ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.
At the time of those papers, AI was giving almost random answers to questions on the exams. “They’re now crushed,” Hendrycks told Reuters.
As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023 to nearly 89% a year later, according to a prominent capabilities leaderboard.
These common benchmarks have less meaning as a result.
AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.
Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. “Humanity’s Last Exam” will require abstract reasoning, he said.
Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on “Humanity’s Last Exam” will remain private to make sure AI systems’ answers are not from memorization.
The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.
“We desperately need harder tests for expert-level models to measure the rapid progress of AI,” said Alexandr Wang, Scale’s CEO.
One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.
(Reporting by Jeffrey Dastin in San Francisco and Katie Paul in New York; Editing by Christina Fincher)