Forget DeepSeek. Large language models are getting cheaper still
As recently as 2022, just building a large language model (LLM) was a feat at the cutting edge of artificial-intelligence (AI) engineering. Three years on, experts are harder to impress. To really stand out in the crowded marketplace, an AI lab needs not just to build a high-quality model, but to build it cheaply.
In December a Chinese firm, DeepSeek, earned itself headlines for cutting the dollar cost of training a frontier model down from $61.6m (the cost of Llama 3.1, an LLM produced by Meta, a technology company) to just $6m. In a preprint posted online in February, researchers at Stanford University and the University of Washington claim to have gone several orders of magnitude better, training their s1 LLM for just $6. Phrased another way, DeepSeek took 2.7m hours of computer time to train; s1 took just under seven hours.
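To put "several orders of magnitude" on a numerical footing, the figures quoted above can be compared on a logarithmic scale. The snippet below is a rough back-of-the-envelope sketch: the variable names are illustrative, and seven hours is used as an upper bound for s1 because the text says "just under seven hours".

```python
import math


def orders_of_magnitude(before: float, after: float) -> float:
    """Return how many powers of ten separate two positive quantities."""
    return math.log10(before / after)


# Figures as quoted in the article.
deepseek_cost_usd = 6_000_000
s1_cost_usd = 6
deepseek_hours = 2_700_000
s1_hours = 7  # assumption: upper bound for "just under seven hours"

cost_gap = orders_of_magnitude(deepseek_cost_usd, s1_cost_usd)
time_gap = orders_of_magnitude(deepseek_hours, s1_hours)

print(f"Cost gap: about {cost_gap:.1f} orders of magnitude")
print(f"Compute-time gap: about {time_gap:.1f} orders of magnitude")
```

On these numbers the cost gap is six orders of magnitude and the compute-time gap a little under six.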
The figures are eye-popping, but the comparison is not exactly like-for-like. Where DeepSeek’s v3 chatbot was trained from scratch—accusations of data theft from OpenAI, an American competitor, and peers notwithstanding—s1 is instead “fine-tuned” on the pre-existing Qwen2.5 LLM, produced by Alibaba, China’s other top-tier AI lab. Before s1’s training began, in other words, the model could already write, ask questions, and produce code.
Piggybacking of this kind can lead to savings, but can’t cut costs down to single digits on its own. To do that, the American team had to break free of the dominant paradigm in AI research, wherein increasing the amount of data and computing power available to train a language model is thought to improve its performance. They instead hypothesised that a smaller amount of data, of high enough quality, could do the job just as well. To test that proposition, they gathered a selection of 59,000 questions covering everything from standardised English tests to graduate-level problems in probability, with the intention of narrowing them down to the most effective training set possible.
To work out how to do that, the questions on their own aren’t enough. Answers are needed, too. So the team asked another AI model, Google’s Gemini, to tackle the questions using what is known as a reasoning approach, in which the model’s “thought process” is shared alongside the answer. That gave them three datasets to use to train s1: 59,000 questions; the accompanying answers; and the “chains of thought” used to connect the two.
They then threw almost all of it away. As s1 was based on Alibaba’s Qwen AI, anything that model could already solve was unnecessary. Anything poorly formatted was also tossed, as was anything that Google’s model had solved without needing to think too hard. If a given problem didn’t add to the overall diversity of the training set, it was out too. The end result was a streamlined 1,000 questions that the researchers proved could train a model just as high-performing as one trained on all 59,000—and for a fraction of the cost.
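The winnowing described above can be pictured as a chain of filters followed by a diversity-aware pick. What follows is a minimal sketch under stated assumptions, not the researchers' actual pipeline: the `Example` fields, the `min_reasoning_tokens` threshold (a stand-in for "solved without needing to think too hard") and the round-robin choice across topics are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Example:
    """One candidate training item (all fields are illustrative)."""
    question: str
    topic: str
    well_formatted: bool
    base_model_solves: bool   # could the base model already answer it?
    reasoning_tokens: int     # length of the teacher model's chain of thought


def select_training_set(pool, target_size, min_reasoning_tokens):
    """Filter for quality and difficulty, then pick across topics for diversity."""
    if target_size <= 0:
        return []

    # Quality and difficulty filters.
    kept = [
        ex for ex in pool
        if ex.well_formatted
        and not ex.base_model_solves
        and ex.reasoning_tokens >= min_reasoning_tokens
    ]

    # Group by topic, hardest (longest reasoning) first within each topic.
    by_topic = defaultdict(list)
    for ex in kept:
        by_topic[ex.topic].append(ex)
    for items in by_topic.values():
        items.sort(key=lambda ex: ex.reasoning_tokens, reverse=True)

    # Round-robin over topics so no single subject dominates the selection.
    selected = []
    for one_per_topic in zip_longest(*by_topic.values()):
        for ex in one_per_topic:
            if ex is not None:
                selected.append(ex)
                if len(selected) == target_size:
                    return selected
    return selected


pool = [
    Example("q1", "probability", True, False, 900),
    Example("q2", "probability", True, False, 1200),
    Example("q3", "english", True, True, 800),      # base model already solves it
    Example("q4", "english", False, False, 1500),   # poorly formatted
    Example("q5", "geometry", True, False, 100),    # teacher barely had to think
    Example("q6", "geometry", True, False, 700),
    Example("q7", "english", True, False, 600),
]

chosen = select_training_set(pool, target_size=3, min_reasoning_tokens=500)
print([ex.question for ex in chosen])
```

In this toy pool, three of the seven candidates fail a filter, and the final pick takes one question from each remaining topic rather than two from the best-represented one.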
Such tricks abound. Like all reasoning models, s1 “thinks” before answering, working through the problem before announcing it has finished and presenting a final answer. But lots of reasoning models give better answers if they’re allowed to think for longer, an approach called “test-time compute”. And so the researchers hit upon the simplest possible approach to get the model to carry on reasoning: when it announces that it has finished thinking, just delete that message and add in the word “Wait” instead.
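The "Wait" trick is simple enough to sketch in a few lines. The code below is an illustration only, assuming a hypothetical end-of-thinking marker (`END_OF_THINKING`) and a hypothetical `generate_until_done` function that returns the model's continuation up to and including that marker; a toy stand-in replaces the real model so the loop can be run as written.

```python
from typing import Callable

END_OF_THINKING = "</think>"  # hypothetical marker; real models use their own


def think_with_budget(
    prompt: str,
    generate_until_done: Callable[[str], str],
    extra_rounds: int,
) -> str:
    """Force extra reasoning by swapping the 'finished' marker for 'Wait'."""
    trace = generate_until_done(prompt)
    for _ in range(extra_rounds):
        # Delete the announcement that thinking has finished...
        if trace.endswith(END_OF_THINKING):
            trace = trace[: -len(END_OF_THINKING)]
        # ...and nudge the model to keep reasoning instead.
        trace += "Wait"
        trace += generate_until_done(prompt + trace)
    return trace


def toy_model(context: str) -> str:
    """Stand-in for a real model: emits one 'attempt' per call, then stops."""
    attempt = context.count("Wait")
    return f" attempt {attempt}. {END_OF_THINKING}"


trace = think_with_budget("Q: what is 2 + 2?", toy_model, extra_rounds=2)
print(trace)
```

Each extra round costs another generation call, which is why longer thinking raises inference costs.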
The tricks also work. Thinking four times as long allows the model to score over 20 percentage points higher on maths tests as well as scientific ones. Being forced to think for 16 times as long takes the model from being unable to earn a single mark on a hard maths exam to getting a score of 60%. Thinking harder is more expensive, of course, and the inference costs increase with each extra “wait”. But with training available so cheaply, the added expense may be worth it.
The researchers say their new model already beats OpenAI’s first effort in the space, September’s o1-preview, on measures of maths ability. The efficiency drive is the new frontier.
© 2025, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com
