How to train your large language model

Published May 13, 2024

Dripp

It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
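The guess-then-adjust loop can be illustrated at toy scale. The sketch below is not how any production LLM is built: it assumes a six-word text and a model that looks only at the previous word, and the function names (`softmax`, `average_loss`, `train`) are invented for illustration. What it shares with real pretraining is the objective: predict each next word, measure how surprised the model was, and nudge the scores to be less surprised next time.

```python
import math

def softmax(row):
    """Turn a dict of scores into a dict of probabilities."""
    top = max(row.values())
    exps = {word: math.exp(score - top) for word, score in row.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def average_loss(logits, tokens):
    """Mean surprise (negative log-probability) of each actual next word."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(-math.log(softmax(logits[prev])[nxt]) for prev, nxt in pairs) / len(pairs)

def train(tokens, epochs=200, lr=0.5):
    """Guess each next word, then nudge the scores towards the right answer."""
    vocab = sorted(set(tokens))
    logits = {prev: {word: 0.0 for word in vocab} for prev in vocab}
    for _ in range(epochs):
        for prev, nxt in zip(tokens, tokens[1:]):
            probs = softmax(logits[prev])
            for word in vocab:
                # Gradient of the cross-entropy loss: predicted minus actual.
                grad = probs[word] - (1.0 if word == nxt else 0.0)
                logits[prev][word] -= lr * grad
    return logits

# Hypothetical miniature "mountain of text".
tokens = "the cat sat on the mat".split()
untrained = {p: {w: 0.0 for w in set(tokens)} for p in set(tokens)}
loss_before = average_loss(untrained, tokens)
trained = train(tokens)
loss_after = average_loss(trained, tokens)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

After training, the model's surprise at the text falls, which is all the statistical “understanding” amounts to here. A real LLM replaces the lookup table with a neural network and the six words with a large slice of the internet.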

But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.

One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.

RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, and the exercise is repeated many thousands of times. Second, the resulting data set is used to train a second LLM to stand in, in effect, for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, a machine-learning technique called reinforcement learning uses those scores to tweak the knobs and levers of the original LLM, reinforcing the behaviours that earn it a reward.
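The reward-model step can be sketched in a few lines. The code below is an illustration under loud assumptions, not OpenAI's implementation: the reward model here is a simple weighted sum rather than a second LLM, each response is reduced to two made-up numeric features, and the names `reward`, `pairwise_loss` and `train_reward_model` are hypothetical. The loss is the standard pairwise one for preference data: it shrinks as the preferred response outscores the rejected one.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    """Score a response; a stand-in for the second LLM."""
    return sum(w * f for w, f in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """Small when the human-preferred response scores higher."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train_reward_model(comparisons, dim, epochs=100, lr=0.1):
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient descent on -log(sigmoid(margin)).
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical features per response: [helpfulness, rudeness].
# Each pair is (response the volunteer chose, response they rejected).
comparisons = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.9]),
]
weights = train_reward_model(comparisons, dim=2)
for chosen, rejected in comparisons:
    print(reward(weights, chosen) > reward(weights, rejected))
```

The third step, reinforcement learning against these scores, is the part Dr Rafailov calls painful and is deliberately left out of this sketch.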

This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.

It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.

This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model that would award it top marks. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) Because each LLM thus conceals an implicit reward model, the researchers could adjust that reward model directly by adjusting the LLM itself. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
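The article gives no formula, but the loss published in the DPO paper is compact enough to sketch. In the code below the inputs are assumed to be log-probabilities that the model being trained (the "policy") and a frozen copy of the starting model (the "reference") assign to the preferred and rejected responses; `beta` is a tuning knob whose default here is an arbitrary choice, and the function name `dpo_loss` is illustrative. The implicit reward of a response is how much more likely the policy makes it than the reference does.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss for one human preference comparison."""
    # Implicit rewards: log-probability gained relative to the reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Equivalent to -log(sigmoid(margin)).
    return math.log1p(math.exp(-margin))

# Hypothetical log-probabilities for one prompt with two responses.
unchanged = dpo_loss(-10.0, -12.0, -10.0, -12.0)   # policy identical to reference
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)     # policy now favours the chosen response
worsened = dpo_loss(-12.0, -10.0, -10.0, -12.0)    # policy now favours the rejected response
print(unchanged, improved, worsened)
```

No separate reward model appears anywhere: the same pairwise comparison that would have trained one is applied to the LLM's own probabilities, which is the sense in which the LLM learns directly from the data.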

According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.

Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.

From The Economist, published under licence. The original content can be found on www.economist.com

Published: 13 May 2024, 07:00 PM IST
