Large language models are getting bigger and better

Published April 19, 2024

Dripp

The appetite for new models has only accelerated. In March Anthropic launched Claude 3, which bested the previous top models from OpenAI and Google on various leaderboards. On April 9th OpenAI reclaimed the crown (on some measures) by tweaking its model. On April 18th Meta released Llama 3, which early results suggest is the most capable open model to date. OpenAI is likely to make a splash sometime this year when it releases GPT-5, which may have capabilities beyond any current large language model (LLM). If the rumours are to be believed, the next generation of models will be even more remarkable: able to perform multi-step tasks, for instance, rather than merely responding to prompts, or to analyse complex questions carefully instead of blurting out the first algorithmically available answer.

For those who believe that this is the usual tech hype, consider this: investors are deadly serious about backing the next generation of models. GPT-5 and other next-gen models are expected to cost billions of dollars to train. OpenAI is also reportedly partnering with Microsoft, a tech giant, to build a new $100bn data centre. Based on the numbers alone, it seems as though the future will hold limitless exponential growth. This chimes with a view, shared by many AI researchers, known as the “scaling hypothesis”: that the architecture of current LLMs is on the path to unlocking phenomenal progress. All that is needed to exceed human abilities, according to the hypothesis, is more data and more powerful computer chips.

Look closer at the technical frontier, however, and some daunting hurdles become evident.

Beauty’s not enough

Data may well present the most immediate bottleneck. Epoch AI, a research outfit, estimates the well of high-quality textual data on the public internet will run dry by 2026. This has left researchers scrambling for ideas. Some labs are turning to the private web, buying data from brokers and news websites. Others are turning to the internet’s vast quantities of audio and visual data, which could be used to train ever-bigger models for decades. Video can be particularly useful in teaching AI models about the physics of the world around them. If a model can observe a ball flying through the air, it might more easily work out the mathematical equation that describes the projectile’s motion. Leading models like GPT-4 and Gemini are now “multimodal”, capable of dealing with various types of data.

When data can no longer be found, it can be made. Companies like Scale AI and Surge AI have built large networks of people to generate and annotate data, including PhD researchers solving problems in maths or biology. One executive at a leading AI startup estimates this is costing AI labs hundreds of millions of dollars per year. A cheaper approach involves generating “synthetic data” in which one LLM makes billions of pages of text to train a second model. But that method can run into trouble: models trained this way can lose past knowledge and generate uncreative responses. A more fruitful way to train AI models on synthetic data is to have them learn through collaboration or competition. Researchers call this “self-play”. In 2017 Google DeepMind, the search giant’s AI lab, developed a model called AlphaGo that, after training against itself, beat the human world champion in the game of Go. Google and other firms now use similar techniques on their latest LLMs.
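To make the idea of self-play concrete, here is a minimal sketch in Python. It is a toy, not how AlphaGo or any LLM is trained: it assumes a made-up game (players take turns removing one to three stones, and whoever takes the last stone wins) and uses a simple lookup table in place of a neural network. Both sides read from and write to the same table, so every game the program plays against itself refines its own estimates. The names `train_by_self_play` and `best_move`, and all the settings, are hypothetical.

```python
import random

MAX_TAKE = 3  # assumed toy rules: each turn removes 1, 2 or 3 stones


def legal_moves(stones):
    return [a for a in range(1, MAX_TAKE + 1) if a <= stones]


def train_by_self_play(max_stones=10, episodes=20000, alpha=0.5, epsilon=0.3, seed=0):
    """Both 'players' share one value table, so the program learns by playing itself."""
    rng = random.Random(seed)
    # q[(stones, take)] estimates how good taking `take` stones is for the player to move
    q = {(s, a): 0.0 for s in range(1, max_stones + 1) for a in legal_moves(s)}
    for _ in range(episodes):
        stones = rng.randint(1, max_stones)
        while stones > 0:
            moves = legal_moves(stones)
            if rng.random() < epsilon:
                take = rng.choice(moves)  # explore an untried line of play
            else:
                take = max(moves, key=lambda a: q[(stones, a)])  # play the current best move
            remaining = stones - take
            if remaining == 0:
                target = 1.0  # taking the last stone wins
            else:
                # zero-sum game: my value is minus the opponent's best value
                target = -max(q[(remaining, a)] for a in legal_moves(remaining))
            q[(stones, take)] += alpha * (target - q[(stones, take)])
            stones = remaining  # the same table now moves for the other side
    return q


def best_move(q, stones):
    return max(legal_moves(stones), key=lambda a: q[(stones, a)])


q = train_by_self_play()
print([best_move(q, s) for s in (5, 6, 7)])
```

With the default settings this prints `[1, 2, 3]`: with no human examples, the table has found the known winning strategy for this game, which is to leave the opponent a multiple of four stones. The clear-cut win condition is what makes this work, which is exactly what most real-world problems lack.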

Extending ideas like self-play to new domains is a hot topic of research. But most real-world problems, from running a business to being a good doctor, are more complex than a game, without clear-cut winning moves. This is why, in such complex domains, models still need training data from people who can tell good responses from bad ones. This in turn slows things down.

More silicon, but make it fashion

Better hardware is another route to more powerful models. Graphics-processing units (GPUs), originally designed for video-gaming, have become the go-to chip for most AI programmers thanks to their ability to run intensive calculations in parallel. One way to unlock new capabilities may lie in using chips designed specifically for AI models. Cerebras, a chipmaker based in Silicon Valley, released a product in March containing 50 times as many transistors as the largest GPU. Model-building is usually hampered by data needing to be continuously loaded on and off the GPUs as the model is trained. Cerebras’s giant chip, by contrast, has memory built in.

New models that can take advantage of these advances will be more reliable and better at handling tricky requests from users. One way this may happen is through larger “context windows”, the amount of text, image or video that a user can feed into a model when making requests. Enlarging context windows to allow users to upload additional relevant information also seems to be an effective way of curbing hallucination, the tendency of AI models to confidently answer questions with made-up information.

But while some model-makers race for more resources, others see signs that the scaling hypothesis is running into trouble. Physical constraints—insufficient memory, say, or rising energy costs—place practical limitations on bigger model designs. More worrying, it is not clear that expanding context windows will be enough for continued progress. Yann LeCun, a star AI boffin now at Meta, is one of many who believe the limitations in the current AI models cannot be fixed with more of the same.

Some scientists are therefore turning to a long-standing source of inspiration in the field of AI—the human brain. The average adult can reason and plan far better than the best LLMs, despite using less power and much less data. “AI needs better learning algorithms, and we know they’re possible because your brain has them,” says Pedro Domingos, a computer scientist at the University of Washington.

One problem, he says, is the algorithm by which LLMs learn, called backpropagation. All LLMs are neural networks arranged in layers, which receive inputs and transform them to predict outputs. When the LLM is in its learning phase, it compares its predictions against the version of reality available in its training data. If these diverge, the algorithm makes small tweaks to each layer of the network to improve future predictions. That makes learning computationally intensive and incremental.
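The learning loop described above can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions: a tiny network (one input, a hidden layer of four tanh units, one output) fitted to five made-up points on the curve y = x². Real LLMs have billions of parameters and rely on libraries that compute gradients automatically. On each pass the sketch compares predictions with the training targets, propagates the error backwards through both layers with the chain rule, and nudges every weight slightly against its gradient. The function name `train` and all hyperparameters are illustrative choices.

```python
import math
import random


def train(xs, ys, hidden=4, lr=0.05, epochs=500, seed=0):
    """Fit a one-hidden-layer tanh network by backpropagation.

    Returns the mean squared error measured at the start of each epoch.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden weights
    b1 = [0.0] * hidden                               # hidden-layer biases
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output weights
    b2 = 0.0                                          # output bias
    n = len(xs)
    losses = []
    for _ in range(epochs):
        gw1, gb1, gw2, gb2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        total = 0.0
        for x, y in zip(xs, ys):
            # Forward pass: each layer transforms its input.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = pred - y  # divergence between prediction and training data
            total += err * err
            # Backward pass: chain rule from the output back to the first layer.
            dpred = 2 * err  # derivative of err**2 with respect to pred
            gb2 += dpred
            for j in range(hidden):
                gw2[j] += dpred * h[j]
                dz = dpred * w2[j] * (1 - h[j] ** 2)  # tanh'(z) = 1 - tanh(z)**2
                gw1[j] += dz * x
                gb1[j] += dz
        losses.append(total / n)
        # A small tweak to every layer, in the direction that reduces the error.
        for j in range(hidden):
            w1[j] -= lr * gw1[j] / n
            b1[j] -= lr * gb1[j] / n
            w2[j] -= lr * gw2[j] / n
        b2 -= lr * gb2 / n
    return losses


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [x * x for x in xs]
losses = train(xs, ys)
print(f"loss before: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

Even on five data points the network needs hundreds of passes, each touching every weight, to make steady progress; at the scale of an LLM that incremental process is what consumes so much computing power.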

The neural networks in today’s LLMs are also inefficiently structured. Since 2017 most AI models have used a type of neural-network architecture known as a transformer (the “T” in GPT), which allowed them to establish relationships between bits of data that are far apart within a data set. Previous approaches struggled to make such long-range connections. If a transformer-based model were asked to write the lyrics to a song, for example, it could, in its coda, riff on lines from many verses earlier, whereas a more primitive model would have forgotten all about the start by the time it had got to the end of the song. Transformers can also be run on many processors at once, significantly reducing the time it takes to train them.
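The mechanism that lets a transformer link distant pieces of data is called attention. Below is a minimal, pure-Python sketch of scaled dot-product self-attention. It assumes toy two-dimensional token vectors and, for simplicity, uses each vector directly as its own query, key and value; real transformers learn separate projection matrices for these and run many attention heads in parallel. Every token scores its similarity to every other token, so the last token in a sequence can draw on the first as easily as on its neighbour. The function names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a whole sequence at once.

    Returns the output vectors and the n-by-n matrix of attention weights.
    """
    d = len(queries[0])
    outputs, weights = [], []
    for q in queries:
        # Score this token against every token in the sequence, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # The output is a weighted average of all the value vectors.
        outputs.append([sum(w[j] * values[j][c] for j in range(len(values)))
                        for c in range(len(values[0]))])
    return outputs, weights


# Toy sequence: the last token resembles the first (like a chorus echoing an opening line).
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
print([round(w, 3) for w in weights[-1]])
```

In the printed row the last token gives its largest weights to itself and to the very first token, three positions back. Because all the scores are computed independently, the work can also be spread across many processors.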

Albert Gu, a computer scientist at Carnegie Mellon University, nevertheless thinks the transformers’ time may soon be up. Scaling up their context windows is highly computationally inefficient: as the input doubles, the amount of computation required to process it quadruples. Alongside Tri Dao of Princeton University, Dr Gu has come up with an alternative architecture called Mamba. If, by analogy, a transformer reads all of a book’s pages at once, Mamba reads them sequentially, updating its worldview as it progresses. This is not only more efficient, but also more closely approximates the way human comprehension works.
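The doubling-to-quadrupling arithmetic, and the contrast with reading sequentially, can be illustrated with a short sketch. The first function counts the token-to-token scores that full attention computes for an input of n tokens, which is n². The second is a deliberately simplified linear recurrence that carries a fixed-size state through the sequence one step at a time, so its work grows only linearly with n. It is a stand-in, not Mamba itself: Mamba builds on state-space models whose update parameters depend on the input, which this sketch omits, and the coefficients `a` and `b` here are arbitrary assumptions.

```python
def attention_scores_needed(n):
    """Full self-attention compares every token with every token: n * n scores."""
    return n * n


def sequential_scan(xs, a=0.9, b=0.1):
    """Read inputs one at a time, updating a single running state.

    A toy linear recurrence: state_t = a * state_(t-1) + b * x_t.
    """
    state = 0.0
    history = []
    for x in xs:
        state = a * state + b * x  # the running "worldview" is revised after each input
        history.append(state)
    return history


for n in (1_000, 2_000, 4_000):
    steps = len(sequential_scan([1.0] * n))
    print(f"n={n}: attention scores={attention_scores_needed(n):,}, scan steps={steps:,}")
```

Doubling n from 1,000 to 2,000 raises the attention count from 1,000,000 to 4,000,000 scores, while the scan goes from 1,000 to 2,000 steps.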

LLMs also need help getting better at reasoning and planning. Andrej Karpathy, a researcher formerly at OpenAI, explained in a recent talk that current LLMs are only capable of “system 1” thinking. In humans, this is the automatic mode of thought involved in snap decisions. In contrast, “system 2” thinking is slower, more conscious and involves iteration. For AI systems, that may require algorithms capable of something called search—an ability to outline and examine many different courses of action before selecting the best one. This would be similar in spirit to how game-playing AI models can choose the best moves after exploring several options.
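The difference between a snap decision and search can be shown with a toy planning problem. The sketch below assumes a small, made-up map of routes with travel costs. A greedy planner, loosely analogous to “system 1”, takes whichever next step looks cheapest right now; a search-based planner, loosely analogous to “system 2”, examines every route before committing to the best one. Exhaustive enumeration only works for tiny problems (game-playing systems use more selective methods such as tree search), and none of this describes how any particular lab implements search in LLMs. The names `ROUTES`, `greedy_plan` and `search_plan` are hypothetical.

```python
# Hypothetical map: ROUTES[node][neighbour] = cost of that step.
ROUTES = {"A": {"B": 1, "C": 4}, "B": {"D": 10}, "C": {"D": 1}, "D": {}}


def greedy_plan(graph, start, goal):
    """'System 1': always take the cheapest next step, never look ahead."""
    path, cost, node = [start], 0, start
    while node != goal:
        options = {n: c for n, c in graph[node].items() if n not in path}
        if not options:
            return None  # the snap decisions led to a dead end
        node, step = min(options.items(), key=lambda item: item[1])
        path.append(node)
        cost += step
    return path, cost


def search_plan(graph, start, goal):
    """'System 2': explore every route, then pick the cheapest."""
    best = None

    def explore(node, path, cost):
        nonlocal best
        if node == goal:
            if best is None or cost < best[1]:
                best = (path, cost)
            return
        for nxt, step in graph[node].items():
            if nxt not in path:  # skip routes that loop back on themselves
                explore(nxt, path + [nxt], cost + step)

    explore(start, [start], 0)
    return best


print(greedy_plan(ROUTES, "A", "D"))  # (['A', 'B', 'D'], 11)
print(search_plan(ROUTES, "A", "D"))  # (['A', 'C', 'D'], 5)
```

The greedy planner grabs the cheap first step and gets stuck with an expensive second one; the search pays the cost of considering every option and finds a route less than half as costly.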

Advanced planning via search is the focus of much current effort. Meta’s Dr LeCun, for example, is trying to program the ability to reason and make predictions directly into an AI system. In 2022 he proposed a framework called “Joint Embedding Predictive Architecture” (JEPA), which is trained to predict larger chunks of text or images in a single step than current generative-AI models. That lets it focus on global features of a data set. When analysing animal images, for example, a JEPA-based model may more quickly focus on size, shape and colour rather than individual patches of fur. The hope is that by abstracting things out JEPA learns more efficiently than generative models, which get distracted by irrelevant details.

Experiments with approaches like Mamba or JEPA remain the exception. Until data and computing power become insurmountable hurdles, transformer-based models will stay in favour. But as engineers push them into ever more complex applications, human expertise will remain essential in the labelling of data. This could mean slower progress than before. For a new generation of AI models to stun the world as ChatGPT did in 2022, fundamental breakthroughs may be needed.

From The Economist, published under licence. The original content can be found on www.economist.com