
Experts launch global call for tough AI questions in ‘Humanity’s Last Exam’
A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which have increasingly handled popular benchmark tests like child’s play.
Dubbed “Humanity’s Last Exam,” the project seeks to determine when expert-level AI has arrived. It aims to stay relevant even as capabilities advance in future years, according to the organizers, a non-profit called the Center for AI Safety (CAIS) and the startup Scale AI.
The call comes days after the maker of ChatGPT previewed a new model, known as OpenAI o1, which “destroyed the most popular reasoning benchmarks,” said Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk’s xAI startup.
Hendrycks co-authored two 2021 papers that proposed tests of AI systems that are now widely used, one quizzing them on undergraduate-level knowledge of topics like U.S. history, the other probing models’ ability to reason through competition-level math. The undergraduate-style test has more downloads from the online AI hub Hugging Face than any such dataset.
At the time of those papers, AI was giving almost random answers to questions on the exams. “They’re now crushed,” Hendrycks told Reuters.
As one example, the Claude models from the AI lab Anthropic have gone from scoring about 77% on the undergraduate-level test in 2023, to nearly 89% a year later, according to a prominent capabilities leaderboard.
These common benchmarks have less meaning as a result.
AI has appeared to score poorly on lesser-used tests involving plan formulation and visual pattern-recognition puzzles, according to Stanford University’s AI Index Report from April. OpenAI o1 scored around 21% on one version of the pattern-recognition ARC-AGI test, for instance, the ARC organizers said on Friday.
Some AI researchers argue that results like this show planning and abstract reasoning to be better measures of intelligence, though Hendrycks said the visual aspect of ARC makes it less suited to assessing language models. “Humanity’s Last Exam” will require abstract reasoning, he said.
Answers from common benchmarks may also have ended up in data used to train AI systems, industry observers have said. Hendrycks said some questions on “Humanity’s Last Exam” will remain private to make sure AI systems’ answers are not from memorization.
The exam will include at least 1,000 crowd-sourced questions, due November 1, that are hard for non-experts to answer. These will undergo peer review, with winning submissions offered co-authorship and prizes of up to $5,000 sponsored by Scale AI.
“We desperately need harder tests for expert-level models to measure the rapid progress of AI,” said Alexandr Wang, Scale’s CEO.
One restriction: the organizers want no questions about weapons, which some say would be too dangerous for AI to study.
(Reporting by Jeffrey Dastin in San Francisco and Katie Paul in New York; Editing by Christina Fincher)