

How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
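The guess-then-adjust loop described above can be illustrated with a toy sketch. This is not how a real LLM is built: it assumes a tiny "bigram" model that sees only the previous word, a six-word made-up corpus, and a hypothetical `train_bigram` helper. It does show the same principle of predicting the next word and nudging the scores after each guess.

```python
import math

def softmax(logits):
    # Turn raw scores into probabilities that sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(tokens, vocab_size, steps=200, lr=0.5):
    # logits[prev][nxt] scores how likely token `nxt` is to follow `prev`.
    # Assumption: a bigram table stands in for a real neural network.
    logits = [[0.0] * vocab_size for _ in range(vocab_size)]
    pairs = list(zip(tokens, tokens[1:]))
    losses = []
    for _ in range(steps):
        total = 0.0
        for prev, nxt in pairs:
            probs = softmax(logits[prev])
            total += -math.log(probs[nxt])  # penalty for a poor guess
            # Small adjustment: gradient of the loss is probs - one_hot(nxt).
            for j in range(vocab_size):
                grad = probs[j] - (1.0 if j == nxt else 0.0)
                logits[prev][j] -= lr * grad
        losses.append(total / len(pairs))
    return logits, losses

# Toy corpus, invented for illustration only.
text = "the cat sat on the mat".split()
vocab = sorted(set(text))
ids = [vocab.index(w) for w in text]
logits, losses = train_bigram(ids, len(vocab))

def most_likely_next(word):
    row = logits[vocab.index(word)]
    return vocab[max(range(len(vocab)), key=lambda j: row[j])]
```

After training, the average penalty in `losses` is lower than at the start, and `most_likely_next("sat")` returns the word that followed "sat" in the corpus.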
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, a process that is repeated many thousands of times. Second, the resulting data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, the reward model is used to train the original LLM: a machine-learning technique called reinforcement learning tweaks the knobs and levers of the original LLM to reinforce the behaviours that earn it a reward.
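The reward-model step can be sketched in a few lines. The example below is a simplification under stated assumptions: the reward model is a plain weighted sum rather than a second LLM, each response is reduced to two invented features (on-topic, polite), and the pairwise loss is the standard logistic one commonly used for preference data. The names `train_reward_model` and `reward` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Assumption: a linear score stands in for a full reward LLM.
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(comparisons, n_features, steps=500, lr=0.1):
    # Each comparison is (features of the preferred response,
    #                     features of the rejected response).
    weights = [0.0] * n_features
    for _ in range(steps):
        for chosen, rejected in comparisons:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient step on -log(sigmoid(margin)): push the preferred
            # response's score above the rejected one's.
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Invented features: [is_on_topic, is_polite]. Made-up human choices.
comparisons = [
    ([1.0, 1.0], [0.0, 1.0]),  # on-topic preferred over off-topic
    ([1.0, 0.0], [0.0, 0.0]),  # on-topic preferred over off-topic
    ([1.0, 1.0], [1.0, 0.0]),  # polite preferred over rude
]
weights = train_reward_model(comparisons, n_features=2)
```

The learned `weights` then score any new response; in full RLHF these scores would feed the reinforcement-learning step, which is not shown here.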
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model under which it would pass with flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) This observation that each LLM conceals an implicit reward model allowed the researchers to tinker with this model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
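For readers who want to see the trick in symbols, the loss for a single human comparison can be sketched as below. It follows the form given in the DPO paper as I understand it: the implicit reward of a response is how much more likely the model being trained makes it than a frozen reference copy, scaled by a constant `beta`. The function name, argument names and the default `beta=0.1` are assumptions for illustration; the inputs are log-probabilities that a real system would obtain from the two models.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Implicit reward: log-probability gain over the frozen reference model.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    return math.log1p(math.exp(-margin))

# Before any training the model equals its reference, so the margin is zero.
start = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Raising the preferred response's likelihood and lowering the rejected
# one's reduces the loss, with no separate reward model involved.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

Minimising this quantity over the comparison data adjusts the LLM directly, which is the sense in which the middleman is removed.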
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST