The bigger-is-better approach to AI is running out of road
But the most consistent result from modern AI research is that, while big is good, bigger is better. Models have therefore been growing at a blistering pace. GPT-4, released in March, is thought to have around 1trn parameters—nearly six times as many as its predecessor. Sam Altman, the boss of OpenAI, the firm behind it, put its development costs at more than $100m. Similar trends exist across the industry. Epoch AI, a research firm, estimated in 2022 that the computing power necessary to train a cutting-edge model was doubling every six to ten months.
This gigantism is becoming a problem. If Epoch AI’s ten-monthly doubling figure is correct, then training costs could exceed a billion dollars by 2026—assuming, that is, models do not run out of data first. An analysis published in October 2022 forecasts that the stock of high-quality text for training may well be exhausted around the same time. And even once the training is complete, actually using the resulting model can be expensive as well. The bigger the model, the more it costs to run. Earlier this year Morgan Stanley, a bank, guessed that, were half of Google’s searches to be handled by a current GPT-style program, it could cost the firm an additional $6bn a year. As the models get bigger, that number will probably rise.
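Back-of-the-envelope, that projection checks out. The sketch below is our own arithmetic, not Epoch AI's model: it takes GPT-4's reported $100m cost as a starting point and applies the slower, ten-monthly doubling rate.

```python
cost_2023 = 100e6        # GPT-4's reported training cost, March 2023
months_elapsed = 36      # roughly March 2023 to early 2026
doubling_period = 10     # Epoch AI's slower estimate, in months

projected = cost_2023 * 2 ** (months_elapsed / doubling_period)
print(f"projected training cost: ${projected / 1e9:.1f}bn")   # ~$1.2bn
```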
Many in the field therefore think the “bigger is better” approach is running out of road. If AI models are to carry on improving—never mind fulfilling the AI-related dreams currently sweeping the tech industry—their creators will need to work out how to get more performance out of fewer resources. As Mr Altman put it in April, reflecting on the history of giant-sized AI: “I think we’re at the end of an era.”
Quantitative tightening
Instead, researchers are beginning to turn their attention to making their models more efficient, rather than simply bigger. One approach is to make trade-offs, cutting the number of parameters but training models with more data. In 2022 researchers at DeepMind, a division of Google, trained Chinchilla, an LLM with 70bn parameters, on a corpus of 1.4trn words. The model outperforms GPT-3, which has 175bn parameters trained on 300bn words. Feeding a smaller LLM more data means it takes longer to train. But the result is a smaller model that is faster and cheaper to use.
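The rule of thumb that emerged from the Chinchilla work is that a model should see roughly 20 training tokens per parameter. A minimal sketch of that arithmetic, using the figures quoted above (the 20:1 ratio is an approximation popularized by the paper, not an exact law):

```python
# Chinchilla "compute-optimal" rule of thumb: train on
# roughly 20 tokens per parameter. Figures are those quoted above.

def tokens_needed(parameters: float, tokens_per_param: float = 20.0) -> float:
    """Training tokens suggested by the Chinchilla heuristic."""
    return parameters * tokens_per_param

for name, params, trained_on in [
    ("GPT-3", 175e9, 300e9),        # many parameters, relatively little data
    ("Chinchilla", 70e9, 1.4e12),   # fewer parameters, far more data
]:
    suggested = tokens_needed(params)
    print(f"{name}: trained on {trained_on / 1e9:,.0f}bn tokens; "
          f"heuristic suggests ~{suggested / 1e9:,.0f}bn")
```

By this yardstick GPT-3 was substantially undertrained for its size, while Chinchilla's 1.4trn words match its 70bn parameters almost exactly.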
Another option is to make the maths fuzzier. Tracking fewer decimal places for each number in the model—rounding them off, in other words—can cut hardware requirements drastically. In March researchers at the Institute of Science and Technology Austria showed that rounding could squash the amount of memory consumed by a model similar to GPT-3, allowing the model to run on one high-end GPU instead of five, and with only “negligible accuracy degradation”.
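A minimal sketch of the rounding idea, using simple 8-bit symmetric quantization (the Austrian work used a more sophisticated scheme, so treat this as an illustration of the principle only):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 weights -> int8 plus a scale."""
    scale = np.abs(weights).max() / 127.0              # map the largest weight to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights for use at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)    # one toy weight matrix
q, scale = quantize_int8(w)

print(f"memory: {w.nbytes / 1e6:.0f} MB -> {q.nbytes / 1e6:.0f} MB")   # 4x smaller
print(f"max rounding error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```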
Some users fine-tune general-purpose LLMs to focus on a specific task such as generating legal documents or detecting fake news. That is not as cumbersome as training an LLM in the first place, but can still be costly and slow. Fine-tuning LLaMA, an open-source model with 65bn parameters that was built by Meta, Facebook’s corporate parent, takes multiple GPUs anywhere from several hours to a few days.
Researchers at the University of Washington have invented a more efficient method that allowed them to create a new model, Guanaco, from LLaMA on a single GPU in a day without sacrificing much, if any, performance. Part of the trick was to use a rounding technique similar to the Austrians’. But they also used a technique called “low-rank adaptation”, which involves freezing a model’s existing parameters, then adding a new, smaller set of parameters in between. The fine-tuning is done by altering only those new variables. That shrinks the job enough that even relatively feeble computers such as smartphones might be up to the task. Allowing LLMs to live on a user’s device, rather than in the giant data centers they currently inhabit, could allow for both greater personalization and more privacy.
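In outline, low-rank adaptation looks something like the sketch below. It is a generic illustration of the technique, not the Washington team's code (which also quantizes the frozen weights), and the rank of 8 is an arbitrary choice for the example:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer with a small, trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # freeze the original weights
            p.requires_grad = False
        # Two thin matrices whose product is the update to the big weight matrix.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        # Original output plus the cheap low-rank correction.
        return self.base(x) + x @ self.A.T @ self.B.T

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")   # well under 1%
```

Because only the thin matrices are updated, the optimizer's memory footprint shrinks by orders of magnitude, which is what puts fine-tuning within reach of a single GPU.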
A team at Google, meanwhile, has come up with a different option for those who can get by with smaller models. This approach focuses on extracting the specific knowledge required from a large, general-purpose model into a smaller, specialized one. The bigger model acts as a teacher, and the smaller one as a student. The researchers ask the teacher to answer questions and show how it comes to its conclusions. Both the answers and the teacher’s reasoning are used to train the student model. The team was able to train a student model with just 770m parameters, which outperformed its 540bn-parameter teacher on a specialized reasoning task.
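This teacher-student setup is a form of what researchers call knowledge distillation. A bare-bones sketch of a standard distillation loss is below; Google's method additionally trains the student on the teacher's written-out reasoning, which this toy version omits:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend the usual label loss with a term pulling the student's
    output distribution toward the teacher's softened distribution."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1 - alpha) * hard

# Toy example: a batch of 4 items over a 10-class output.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student, teacher, labels))
```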
Rather than focus on what the models are doing, another approach is to change how they are made. A great deal of AI programming is done in a language called Python. It is designed to be easy to use, freeing coders from the need to think about exactly how their programs will behave on the chips that run them. The price of abstracting away such details is slow code. Paying more attention to these implementation details can bring big benefits. This is “a huge part of the game at the moment”, says Thomas Wolf, chief science officer of Hugging Face, an open-source AI company.
Learn to code
In 2022, for instance, researchers at Stanford University published a modified version of the “attention algorithm”, which allows LLMs to learn connections between words and ideas. The idea was to modify the code to take account of what is happening on the chip that is running it, and especially to keep track of when a given piece of information needs to be looked up or stored. Their algorithm was able to speed up the training of GPT-2, an older large language model, threefold. It also let the model respond to longer queries.
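For context, the computation being reorganized looks roughly like the sketch below. The Stanford method (published as “FlashAttention”) produces the same result, but in small tiles that stay in the GPU's fast on-chip memory rather than materializing the full score matrix in slower main memory:

```python
import torch

def attention(q, k, v):
    """Naive scaled dot-product attention. It materializes the full
    n-by-n score matrix in GPU main memory; FlashAttention computes
    the same output in small on-chip tiles, avoiding that traffic."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (n, n) matrix
    weights = torch.softmax(scores, dim=-1)                 # written to slow memory
    return weights @ v

n, d = 1024, 64                      # sequence length, head dimension
q, k, v = (torch.randn(n, d) for _ in range(3))
out = attention(q, k, v)
print(out.shape)                     # torch.Size([1024, 64])
```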
Sleeker code can also come from better tools. Earlier this year, Meta released an updated version of PyTorch, an AI-programming framework. By allowing coders to think more about how computations are arranged on the actual chip, it can double a model’s training speed by adding just one line of code. Modular, a startup founded by former engineers at Apple and Google, last month released a new AI-focused programming language called Mojo, which is based on Python. It also gives coders control over all sorts of fine details that were previously hidden. In some cases, code written in Mojo can run thousands of times faster than the same code in Python.
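The one line in question is PyTorch 2.0's torch.compile, which traces a model and generates fused code tuned to the chip it runs on. A minimal sketch:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))

# The one added line: PyTorch captures the model's operations and
# compiles them into fused kernels suited to the underlying hardware.
compiled_model = torch.compile(model)

x = torch.randn(32, 1024)
y = compiled_model(x)    # first call triggers compilation; later calls run fast
print(y.shape)
```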
A final option is to improve the chips on which that code runs. GPUs are only accidentally good at running AI software—they were originally designed to process the fancy graphics in modern video games. In particular, says a hardware researcher at Meta, GPUs are imperfectly designed for “inference” work (ie, actually running a model once it has been trained). Some firms are therefore designing their own, more specialized hardware. Google already runs most of its AI projects on its in-house “TPU” chips. Meta, with its MTIAs, and Amazon, with its Inferentia chips, are pursuing a similar path.
That such large performance increases can be extracted from relatively simple changes like rounding numbers or switching programming languages might seem surprising. But it reflects the breakneck speed with which LLMs have been developed. For many years they were research projects, and simply getting them to work well was more important than making them elegant. Only recently have they graduated to commercial, mass-market products. Most experts think there remains plenty of room for improvement. As Chris Manning, a computer scientist at Stanford University, put it: “There’s absolutely no reason to believe…that this is the ultimate neural architecture, and we will never find anything better.”
© 2023, The Economist Newspaper Limited. All rights reserved. From The Economist, published under license. The original content can be found on www.economist.com