How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
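To make the guess-and-adjust loop concrete, here is a minimal sketch in Python. It is an illustration rather than how any real LLM is built: the three-word vocabulary, the starting scores and the learning rate are all assumptions. The "model" here is only a list of scores, one per candidate next word, and each adjustment nudges those scores so that the word that actually appeared becomes more probable.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_loss(scores, target_index):
    """Cross-entropy: small when the true next word gets high probability."""
    return -math.log(softmax(scores)[target_index])

def adjust(scores, target_index, learning_rate=0.5):
    """One small gradient-descent step on the scores.

    For cross-entropy, the gradient with respect to each score is
    (predicted probability - 1) for the true word and
    (predicted probability) for every other word.
    """
    probs = softmax(scores)
    return [
        s - learning_rate * (p - (1.0 if i == target_index else 0.0))
        for i, (s, p) in enumerate(zip(scores, probs))
    ]

# Hypothetical toy setup: three candidate words, no initial preference.
vocab = ["cat", "sat", "mat"]
scores = [0.0, 0.0, 0.0]
target = vocab.index("sat")  # the word that actually came next

loss_before = next_word_loss(scores, target)
scores = adjust(scores, target)
loss_after = next_word_loss(scores, target)
```

After a single adjustment `loss_after` is lower than `loss_before`. Pretraining repeats this kind of step across vast quantities of text, with billions of adjustable numbers rather than three.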
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, a judgment that is repeated many thousands of times over. Second, this data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, the reward model is used to train the original LLM: a machine-learning technique called reinforcement learning tweaks the knobs and levers of the original LLM to help reinforce the behaviours that earn it a reward.
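How a reward model learns from pairwise choices can be sketched with a simple loss function. The article does not spell out the formula, so the form below is an assumption: it is the pairwise (Bradley-Terry style) loss commonly described for this step. The function names and example scores are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def preference_loss(reward_chosen, reward_rejected):
    """Loss for one human comparison.

    It is small when the reward model scores the response the human
    picked well above the one the human passed over, and large when
    the ordering is reversed.
    """
    return -log_sigmoid(reward_chosen - reward_rejected)

# Hypothetical scores a reward model might assign to two responses.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
undecided = preference_loss(reward_chosen=1.0, reward_rejected=1.0)
```

Minimising this loss over many thousands of comparisons pushes the reward model's scores to mirror the volunteers' choices. Only the gap between the two scores matters, not their absolute values.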
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model that would give it flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) This observation that each LLM conceals an implicit reward model allowed the researchers to tinker with this model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
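The idea of learning directly from the preference data can be sketched as a single loss function. The formula below follows the one given in the published DPO paper, not anything stated in this article: the frozen "reference" copy of the LLM and the strength setting `beta` come from that paper, while the function names and example numbers are hypothetical. The inputs are log-probabilities that each model assigns to the preferred and the rejected response.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one human comparison.

    The implicit reward of a response is beta times how much more
    likely the LLM being trained (the policy) makes it than the frozen
    reference copy does. The loss falls as the implicit reward of the
    chosen response rises above that of the rejected one.
    """
    implicit_reward_chosen = beta * (policy_chosen - ref_chosen)
    implicit_reward_rejected = beta * (policy_rejected - ref_rejected)
    return -log_sigmoid(implicit_reward_chosen - implicit_reward_rejected)

# Before any training the policy is identical to the reference copy.
untrained = dpo_loss(-5.0, -5.0, ref_chosen=-5.0, ref_rejected=-5.0)

# After some training the policy favours the response humans preferred.
improved = dpo_loss(-4.0, -6.0, ref_chosen=-5.0, ref_rejected=-5.0)
```

No separate reward model appears anywhere in this calculation. The comparison data adjusts the LLM itself, which is the sense in which the middleman is removed.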
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST
