The bigger-is-better approach to AI is running out of road
But the most consistent result from modern AI research is that, while big is good, bigger is better. Models have therefore been growing at a blistering pace. GPT-4, released in March, is thought to have around 1trn parameters—nearly six times as many as its predecessor. Sam Altman, the boss of OpenAI, the firm behind it, put its development costs at more than $100m. Similar trends exist across the industry. Epoch AI, a research firm, estimated in 2022 that the computing power necessary to train a cutting-edge model was doubling every six to ten months.
This gigantism is becoming a problem. If Epoch AI’s ten-monthly doubling figure is correct, then training costs could exceed a billion dollars by 2026—assuming, that is, models do not run out of data first. An analysis published in October 2022 forecasts that the stock of high-quality text for training may well be exhausted around the same time. And even once the training is complete, actually using the resulting model can be expensive as well. The bigger the model, the more it costs to run. Earlier this year Morgan Stanley, a bank, guessed that, were half of Google’s searches to be handled by a current GPT-style program, it could cost the firm an additional $6bn a year. As the models get bigger, that number will probably rise.
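Back-of-the-envelope, that projection checks out. The sketch below is our own arithmetic, not Epoch AI's model: it takes GPT-4's reported $100m cost as a starting point and applies the slower, ten-monthly doubling rate.

```python
cost_2023 = 100e6        # GPT-4's reported training cost, March 2023
months_elapsed = 36      # roughly March 2023 to early 2026
doubling_period = 10     # Epoch AI's slower estimate, in months

projected = cost_2023 * 2 ** (months_elapsed / doubling_period)
print(f"projected training cost: ${projected / 1e9:.1f}bn")   # ~$1.2bn
```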
Many in the field therefore think the “bigger is better” approach is running out of road. If AI models are to carry on improving—never mind fulfilling the AI-related dreams currently sweeping the tech industry—their creators will need to work out how to get more performance out of fewer resources. As Mr Altman put it in April, reflecting on the history of giant-sized AI: “I think we’re at the end of an era.”
Quantitative tightening
Instead, researchers are beginning to turn their attention to making their models more efficient, rather than simply bigger. One approach is to make trade-offs, cutting the number of parameters but training models with more data. In 2022 researchers at DeepMind, a division of Google, trained Chinchilla, an LLM with 70bn parameters, on a corpus of 1.4trn words. The model outperforms GPT-3, which has 175bn parameters trained on 300bn words. Feeding a smaller LLM more data means it takes longer to train. But the result is a smaller model that is faster and cheaper to use.
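The rule of thumb that emerged from the Chinchilla work is that a model should see roughly 20 training tokens per parameter. A minimal sketch of that arithmetic, using the figures quoted above (the 20:1 ratio is an approximation popularized by the paper, not an exact law):

```python
# Chinchilla "compute-optimal" rule of thumb: train on
# roughly 20 tokens per parameter. Figures are those quoted above.

def tokens_needed(parameters: float, tokens_per_param: float = 20.0) -> float:
    """Training tokens suggested by the Chinchilla heuristic."""
    return parameters * tokens_per_param

for name, params, trained_on in [
    ("GPT-3", 175e9, 300e9),        # many parameters, relatively little data
    ("Chinchilla", 70e9, 1.4e12),   # fewer parameters, far more data
]:
    suggested = tokens_needed(params)
    print(f"{name}: trained on {trained_on / 1e9:,.0f}bn tokens; "
          f"heuristic suggests ~{suggested / 1e9:,.0f}bn")
```

By this yardstick GPT-3 was substantially undertrained for its size, while Chinchilla's 1.4trn words match its 70bn parameters almost exactly.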
Another option is to make the maths fuzzier. Tracking fewer decimal places for each number in the model—rounding them off, in other words—can cut hardware requirements drastically. In March researchers at the Institute of Science and Technology Austria showed that rounding could squash the amount of memory consumed by a model similar to GPT-3, allowing the model to run on one high-end GPU instead of five, and with only “negligible accuracy degradation”.
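A minimal sketch of the rounding idea, using simple 8-bit symmetric quantization (the Austrian work used a more sophisticated scheme, so treat this as an illustration of the principle only):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 weights -> int8 plus a scale."""
    scale = np.abs(weights).max() / 127.0              # map the largest weight to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for use at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)    # one toy weight matrix
q, scale = quantize_int8(w)

print(f"memory: {w.nbytes / 1e6:.0f} MB -> {q.nbytes / 1e6:.0f} MB")   # 4x smaller
print(f"max rounding error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```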
Some users fine-tune general-purpose LLMs to focus on a specific task such as generating legal documents or detecting fake news. That is not as cumbersome as training an LLM in the first place, but can still be costly and slow. Fine-tuning LLaMA, an open-source model with 65bn parameters that was built by Meta, Facebook’s corporate parent, takes multiple GPUs anywhere from several hours to a few days.
Researchers at the University of Washington have invented a more efficient method that allowed them to create a new model, Guanaco, from LLaMA on a single GPU in a day without sacrificing much, if any, performance. Part of the trick was to use a rounding technique similar to the Austrians’. But they also used a technique called “low-rank adaptation”, which involves freezing a model’s existing parameters, then adding a new, smaller set of parameters in between. The fine-tuning is done by altering only those new variables. That shrinks the job enough that even relatively feeble computers such as smartphones might be up to the task. Allowing LLMs to live on a user’s device, rather than in the giant data centers they currently inhabit, could allow for both greater personalization and more privacy.
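In outline, low-rank adaptation looks something like the sketch below. It is a generic illustration of the technique, not the Washington team's code (which also quantizes the frozen weights), and the rank of 8 is an arbitrary choice for the example:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a small, trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the original weights
            p.requires_grad = False
        # Two thin matrices whose product is the update to the big weight matrix.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # Original output plus the cheap low-rank correction.
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")   # well under 1%
```

Because only the thin matrices are updated, the optimizer's memory footprint shrinks by orders of magnitude, which is what puts fine-tuning within reach of a single GPU.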
A team at Google, meanwhile, has come up with a different option for those who can get by with smaller models. This approach focuses on extracting the specific knowledge required from a large, general-purpose model into a smaller, specialized one. The bigger model acts as a teacher, and the smaller one as a student. The researchers ask the teacher to answer questions and show how it comes to its conclusions. Both the answers and the teacher’s reasoning are used to train the student model. The team was able to train a student model with just 770m parameters, which outperformed its 540bn-parameter teacher on a specialized reasoning task.
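This teacher-student setup is a form of what researchers call knowledge distillation. A bare-bones sketch of a standard distillation loss is below; Google's method additionally trains the student on the teacher's written-out reasoning, which this toy version omits:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend the usual label loss with a term pulling the student's
    output distribution toward the teacher's softened distribution."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

# Toy example: a batch of 4 items over a 10-class output.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```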
Rather than focus on what the models are doing, another approach is to change how they are made. A great deal of AI programming is done in a language called Python. It is designed to be easy to use, freeing coders from the need to think about exactly how their programs will behave on the chips that run them. The price of abstracting away such details is slow code. Paying more attention to these implementation details can bring big benefits. This is “a huge part of the game at the moment”, says Thomas Wolf, chief science officer of Hugging Face, an open-source AI company.
Learn to code
In 2022, for instance, researchers at Stanford University published a modified version of the “attention algorithm”, which allows LLMs to learn connections between words and ideas. The idea was to modify the code to take account of what is happening on the chip that is running it, and especially to keep track of when a given piece of information needs to be looked up or stored. Their algorithm was able to speed up the training of GPT-2, an older large language model, threefold. It also let the model respond to longer queries.
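For context, the computation being reorganized looks roughly like the sketch below. The Stanford method (published as “FlashAttention”) produces the same result, but in small tiles that stay in the GPU's fast on-chip memory rather than materializing the full score matrix in slower main memory:

```python
import torch

def attention(q, k, v):
    """Naive scaled dot-product attention. It materializes the full
    n-by-n score matrix in GPU main memory; FlashAttention computes
    the same output in small on-chip tiles, avoiding that traffic."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (n, n) matrix
    weights = torch.softmax(scores, dim=-1)                 # written to slow memory
    return weights @ v

n, d = 1024, 64                      # sequence length, head dimension
q, k, v = (torch.randn(n, d) for _ in range(3))
out = attention(q, k, v)
print(out.shape)                     # torch.Size([1024, 64])
```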
Sleeker code can also come from better tools. Earlier this year, Meta released an updated version of PyTorch, an AI-programming framework. By allowing coders to think more about how computations are arranged on the actual chip, it can double a model’s training speed by adding just one line of code. Modular, a startup founded by former engineers at Apple and Google, last month released a new AI-focused programming language called Mojo, which is based on Python. It also gives coders control over all sorts of fine details that were previously hidden. In some cases, code written in Mojo can run thousands of times faster than the same code in Python.
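The one line in question is PyTorch 2.0's torch.compile, which traces a model and generates fused code tuned to the chip it runs on. A minimal sketch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

# The one added line: PyTorch captures the model's operations and
# compiles them into fused kernels suited to the underlying hardware.
compiled_model = torch.compile(model)

x = torch.randn(32, 1024)
y = compiled_model(x)    # first call triggers compilation; later calls run fast
print(y.shape)
```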
A final option is to improve the chips on which that code runs. GPUs are only accidentally good at running AI software—they were originally designed to process the fancy graphics in modern video games. In particular, says a hardware researcher at Meta, GPUs are imperfectly designed for “inference” work (ie, actually running a model once it has been trained). Some firms are therefore designing their own, more specialized hardware. Google already runs most of its AI projects on its in-house “TPU” chips. Meta, with its MTIAs, and Amazon, with its Inferentia chips, are pursuing a similar path.
That such large performance increases can be extracted from relatively simple changes like rounding numbers or switching programming languages might seem surprising. But it reflects the breakneck speed with which LLMs have been developed. For many years they were research projects, and simply getting them to work well was more important than making them elegant. Only recently have they graduated to commercial, mass-market products. Most experts think there remains plenty of room for improvement. As Chris Manning, a computer scientist at Stanford University, put it: “There’s absolutely no reason to believe…that this is the ultimate neural architecture, and we will never find anything better.”
© 2023, The Economist Newspaper Limited. All rights reserved. From The Economist, published under license. The original content can be found on www.economist.com