Forget DeepSeek. Large language models are getting cheaper still

Published

April 13, 2025

Dripp

As recently as 2022, just building a large language model (LLM) was a feat at the cutting edge of artificial-intelligence (AI) engineering. Three years on, experts are harder to impress. To really stand out in the crowded marketplace, an AI lab needs not just to build a high-quality model, but to build it cheaply.

In December a Chinese firm, DeepSeek, earned itself headlines for cutting the dollar cost of training a frontier model down from $61.6m (the cost of Llama 3.1, an LLM produced by Meta, a technology company) to just $6m. In a preprint posted online in February, researchers at Stanford University and the University of Washington claim to have gone several orders of magnitude better, training their s1 LLM for just $6. Phrased another way, DeepSeek took 2.7m hours of computer time to train; s1 took just under seven hours.
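The scale of those gaps is easy to check from the figures quoted above. The short calculation below is only a back-of-the-envelope sketch: it takes the dollar and hour figures as given, rounds s1's "just under seven hours" up to seven, and uses variable names chosen purely for illustration.

```python
import math

# Figures as quoted in the article (dollars and hours of computer time).
llama_cost = 61_600_000
deepseek_cost = 6_000_000
s1_cost = 6
deepseek_hours = 2_700_000
s1_hours = 7  # assumption: "just under seven hours" rounded up to seven

# How many times cheaper or faster each step is.
llama_to_deepseek = llama_cost / deepseek_cost
cost_ratio = deepseek_cost / s1_cost
hours_ratio = deepseek_hours / s1_hours

print(f"Llama 3.1 to DeepSeek: about {llama_to_deepseek:.1f} times cheaper")
print(f"DeepSeek to s1 (cost): {cost_ratio:,.0f} times, "
      f"or {math.log10(cost_ratio):.1f} orders of magnitude")
print(f"DeepSeek to s1 (time): about {hours_ratio:,.0f} times, "
      f"or {math.log10(hours_ratio):.1f} orders of magnitude")
```

On these numbers the dollar gap is six orders of magnitude and the gap in computer time is a little under that.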

The figures are eye-popping, but the comparison is not exactly like-for-like. Where DeepSeek’s v3 chatbot was trained from scratch—accusations of data theft from OpenAI, an American competitor, and peers notwithstanding—s1 is instead “fine-tuned” on the pre-existing Qwen2.5 LLM, produced by Alibaba, China’s other top-tier AI lab. Before s1’s training began, in other words, the model could already write, ask questions, and produce code.

Piggybacking of this kind can lead to savings, but can’t cut costs down to single digits on its own. To do that, the American team had to break free of the dominant paradigm in AI research, wherein increasing the amount of data and computing power used to train a language model is thought to improve its performance. They instead hypothesised that a smaller amount of data, of high enough quality, could do the job just as well. To test that proposition, they gathered a selection of 59,000 questions covering everything from standardised English tests to graduate-level problems in probability, with the intention of narrowing them down to the most effective training set possible.

To work out how to do that, the questions on their own aren’t enough. Answers are needed, too. So the team asked another AI model, Google’s Gemini, to tackle the questions using what is known as a reasoning approach, in which the model’s “thought process” is shared alongside the answer. That gave them three datasets to use to train s1: 59,000 questions; the accompanying answers; and the “chains of thought” used to connect the two.

They then threw almost all of it away. As s1 was based on Alibaba’s Qwen AI, anything that model could already solve was unnecessary. Anything poorly formatted was also tossed, as was anything that Google’s model had solved without needing to think too hard. If a given problem didn’t add to the overall diversity of the training set, it was out too. The end result was a streamlined set of 1,000 questions that the researchers found could train a model to perform just as well as one trained on all 59,000, and for a fraction of the cost.
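The winnowing described above can be sketched in a few lines of Python. This is a minimal illustration and not the researchers' actual pipeline: the `Example` fields, the use of chain-of-thought length as a stand-in for how hard Google's model had to think, the token threshold and the round-robin pass over topics are all assumptions made for the sake of the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    topic: str
    well_formatted: bool
    base_model_solved: bool  # could the base model already answer it?
    reasoning_tokens: int    # length of the teacher's chain of thought


def filter_examples(pool, min_reasoning_tokens, target_size):
    """Apply the quality, difficulty and diversity filters in turn."""
    # Drop anything badly formatted, already solvable by the base model,
    # or solved by the teacher with very little reasoning (assumed proxy).
    kept = [
        ex for ex in pool
        if ex.well_formatted
        and not ex.base_model_solved
        and ex.reasoning_tokens >= min_reasoning_tokens
    ]

    # Group the survivors by topic, longest reasoning first (an assumption).
    by_topic = defaultdict(list)
    for ex in kept:
        by_topic[ex.topic].append(ex)
    for items in by_topic.values():
        items.sort(key=lambda ex: ex.reasoning_tokens, reverse=True)

    # Take one example per topic in rotation so no topic dominates.
    selected = []
    while len(selected) < target_size and any(by_topic.values()):
        for topic in sorted(by_topic):
            if by_topic[topic] and len(selected) < target_size:
                selected.append(by_topic[topic].pop(0))
    return selected
```

Called with a pool of 59,000 such records and `target_size=1000`, a function of this shape would return a small set spread across topics; the real selection criteria may well differ in detail.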

Such tricks abound. Like all reasoning models, s1 “thinks” before answering, working through the problem before announcing it has finished and presenting a final answer. But lots of reasoning models give better answers if they’re allowed to think for longer, an approach called “test-time compute”. And so the researchers hit upon the simplest possible approach to get the model to carry on reasoning: when it announces that it has finished thinking, just delete that message and add in the word “Wait” instead.
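In code, the "Wait" trick amounts to a short loop. The sketch below assumes a hypothetical `generate` function that takes the text so far and returns the model's continuation, and a hypothetical end-of-thinking marker `</think>`; the marker s1 actually emits is not given here. A toy stand-in plays the part of the model so the example runs on its own.

```python
END_OF_THINKING = "</think>"  # hypothetical marker, assumed for illustration


def think_longer(generate, prompt, extra_rounds):
    """Each time the model says it has finished thinking, swap that
    message for "Wait" and ask it to carry on, up to extra_rounds times."""
    trace = generate(prompt)
    for _ in range(extra_rounds):
        if not trace.endswith(END_OF_THINKING):
            break  # the model stopped for some other reason
        # Delete the "finished" message and append "Wait" instead.
        trace = trace[: -len(END_OF_THINKING)] + "Wait"
        trace += generate(prompt + trace)
    return trace


def make_stub_model():
    """A toy stand-in for a real model: numbers each reasoning step."""
    calls = []

    def stub(text):
        calls.append(text)
        return f" step {len(calls)}." + END_OF_THINKING

    return stub


if __name__ == "__main__":
    print(think_longer(make_stub_model(), "Q: 2+2?", extra_rounds=2))
```

Every extra round is another call to the model, which is why the cost of answering rises with each "Wait".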

The tricks also work. Thinking four times as long allows the model to score over 20 percentage points higher on maths tests as well as scientific ones. Being forced to think for 16 times as long takes the model from being unable to earn a single mark on a hard maths exam to getting a score of 60%. Thinking harder is more expensive, of course, and the inference costs increase with each extra “wait”. But with training available so cheaply, the added expense may be worth it.

The researchers say their new model already beats OpenAI’s first effort in the space, September’s o1-preview, on measures of maths ability. The efficiency drive is the new frontier.
