
Forget DeepSeek. Large language models are getting cheaper still
As recently as 2022, just building a large language model (LLM) was a feat at the cutting edge of artificial-intelligence (AI) engineering. Three years on, experts are harder to impress. To really stand out in the crowded marketplace, an AI lab needs not just to build a high-quality model, but to build it cheaply.
In December a Chinese firm, DeepSeek, earned itself headlines for cutting the dollar cost of training a frontier model down from $61.6m (the cost of Llama 3.1, an LLM produced by Meta, a technology company) to just $6m. In a preprint posted online in February, researchers at Stanford University and the University of Washington claim to have gone several orders of magnitude better, training their s1 LLM for just $6. Phrased another way, DeepSeek took 2.7m hours of computer time to train; s1 took just under seven hours.
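The size of the gap in those figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; rounding "just under seven hours" up to 7 is an assumption, so the compute-time ratio it prints is a lower bound.

```python
import math

# Figures as quoted in the article.
deepseek_cost_usd = 6_000_000
s1_cost_usd = 6
deepseek_hours = 2_700_000
s1_hours = 7  # assumption: "just under seven hours", rounded up

cost_ratio = deepseek_cost_usd / s1_cost_usd
hours_ratio = deepseek_hours / s1_hours

# log10 of a ratio gives the number of orders of magnitude between two values.
print(f"cost ratio: {cost_ratio:,.0f}x, about {math.log10(cost_ratio):.1f} orders of magnitude")
print(f"compute-time ratio: {hours_ratio:,.0f}x, about {math.log10(hours_ratio):.1f} orders of magnitude")
```

On these numbers the dollar cost falls by six orders of magnitude and the computer time by between five and six.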
The figures are eye-popping, but the comparison is not exactly like-for-like. Where DeepSeek’s v3 chatbot was trained from scratch—accusations of data theft from OpenAI, an American competitor, and peers notwithstanding—s1 is instead “fine-tuned” on the pre-existing Qwen2.5 LLM, produced by Alibaba, China’s other top-tier AI lab. Before s1’s training began, in other words, the model could already write, ask questions, and produce code.
Piggybacking of this kind can lead to savings, but can’t cut costs down to single digits on its own. To do that, the American team had to break free of the dominant paradigm in AI research, wherein more training data and more computing power are thought to yield a better language model. They instead hypothesised that a smaller amount of data, of high enough quality, could do the job just as well. To test that proposition, they gathered a selection of 59,000 questions covering everything from standardised English tests to graduate-level problems in probability, with the intention of narrowing them down to the most effective training set possible.
To work out how to do that, the questions on their own aren’t enough. Answers are needed, too. So the team asked another AI model, Google’s Gemini, to tackle the questions using what is known as a reasoning approach, in which the model’s “thought process” is shared alongside the answer. That gave them three datasets to use to train s1: 59,000 questions; the accompanying answers; and the “chains of thought” used to connect the two.
They then threw almost all of it away. As s1 was based on Alibaba’s Qwen AI, anything that model could already solve was unnecessary. Anything poorly formatted was also tossed, as was anything that Google’s model had solved without needing to think too hard. If a given problem didn’t add to the overall diversity of the training set, it was out too. The end result was a streamlined 1,000 questions that the researchers proved could train a model just as high-performing as one trained on all 59,000—and for a fraction of the cost.
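The winnowing described above can be pictured as a short filtering pipeline. The following is a minimal sketch, not the researchers' code: the `Example` fields, the use of reasoning length as a stand-in for how hard the teacher model had to think, and the round-robin rule for diversity are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    answer: str
    reasoning: str            # the teacher model's chain of thought
    topic: str                # hypothetical subject label
    base_model_solves: bool   # hypothetical flag: the base model already gets it right
    well_formatted: bool      # hypothetical flag from a formatting check


def select_training_set(pool, target_size, min_reasoning_words=50):
    # Drop what the base model already solves, what is badly formatted,
    # and what the teacher solved with very little reasoning.
    kept = [
        ex for ex in pool
        if not ex.base_model_solves
        and ex.well_formatted
        and len(ex.reasoning.split()) >= min_reasoning_words
    ]

    # Group the survivors by topic, longest reasoning first.
    by_topic = defaultdict(list)
    for ex in kept:
        by_topic[ex.topic].append(ex)
    for examples in by_topic.values():
        examples.sort(key=lambda ex: len(ex.reasoning.split()), reverse=True)

    # Favour diversity: take one example per topic in turn until the
    # target size is reached or the pool is exhausted.
    selected = []
    while len(selected) < target_size and any(by_topic.values()):
        for topic in sorted(by_topic):
            if by_topic[topic] and len(selected) < target_size:
                selected.append(by_topic[topic].pop(0))
    return selected
```

Calling `select_training_set(pool, target_size=1000)` on a pool of 59,000 such records would mirror the reduction described in the article, though the thresholds here are placeholders.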
Such tricks abound. Like all reasoning models, s1 “thinks” before answering, working through the problem before announcing it has finished and presenting a final answer. But lots of reasoning models give better answers if they’re allowed to think for longer, an approach called “test-time compute”. And so the researchers hit upon the simplest possible approach to get the model to carry on reasoning: when it announces that it has finished thinking, just delete that message and add in the word “Wait” instead.
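The "Wait" trick is simple enough to sketch in a few lines. In the sketch below, `generate` is a hypothetical stand-in for a call to the model, and `END_OF_THINKING` is a made-up marker for the message announcing that the model has finished thinking; real models use their own delimiters.

```python
from typing import Callable

END_OF_THINKING = "<end_of_thinking>"  # hypothetical marker, for illustration only


def think_longer(prompt: str,
                 generate: Callable[[str], str],
                 extra_rounds: int = 2) -> str:
    """Return a reasoning trace extended by up to `extra_rounds` forced continuations.

    `generate` takes the text so far and returns a continuation that
    normally ends with END_OF_THINKING.
    """
    trace = generate(prompt)
    for _ in range(extra_rounds):
        if not trace.endswith(END_OF_THINKING):
            break  # the model never declared itself finished; nothing to suppress
        # Delete the "finished" marker and nudge the model to keep reasoning.
        trace = trace[: -len(END_OF_THINKING)] + "Wait"
        trace += generate(prompt + trace)
    return trace
```

Each extra round costs another model call, which is why longer thinking raises inference costs.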
The tricks also work. Thinking four times as long allows the model to score over 20 percentage points higher on maths tests as well as scientific ones. Being forced to think for 16 times as long takes the model from being unable to earn a single mark on a hard maths exam to getting a score of 60%. Thinking harder is more expensive, of course, and the inference costs increase with each extra “wait”. But with training available so cheaply, the added expense may be worth it.
The researchers say their new model already beats OpenAI’s first effort in the space, September’s o1-preview, on measures of maths ability. The efficiency drive is the new frontier.
© 2025, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com