

How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
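To make the "guess, then adjust" loop concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not how any production LLM is built: the three-word vocabulary, the learning rate and the helper names `softmax` and `next_word_step` are all invented for illustration, and the "model" is just three adjustable scores rather than a neural network.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_step(logits, target, lr=0.5):
    """One small adjustment after a single next-word guess.

    The loss is the cross-entropy -log p(target). Its gradient with
    respect to the scores is (probabilities - one_hot(target)).
    """
    probs = softmax(logits)
    loss = -math.log(probs[target])
    new_logits = [
        x - lr * (p - (1.0 if i == target else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, loss

# Hypothetical toy setting: after "the cat", the true next word is "sat".
vocab = ["cat", "sat", "mat"]
logits = [0.0, 0.0, 0.0]          # the model starts with no preference
target = vocab.index("sat")

losses = []
for _ in range(20):
    logits, loss = next_word_step(logits, target)
    losses.append(loss)
```

Each pass nudges the score of the correct word up and the others down, so the loss shrinks step by step. A real model does the same thing across billions of adjustable numbers and trillions of words.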
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, a comparison that is repeated many thousands of times. Second, the resulting data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, a machine-learning technique called reinforcement learning tweaks the knobs and levers of the original LLM to reinforce the behaviours that earn it a reward from that model.
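The article does not say how the reward model is scored against the human choices. A common choice in the research literature is a logistic (Bradley-Terry) loss on the gap between the two scores, and the sketch below assumes that form. The function name `preference_loss` and the example scores are hypothetical.

```python
import math

def preference_loss(score_chosen, score_rejected):
    """Pairwise loss: -log sigmoid(score_chosen - score_rejected).

    It is small when the reward model scores the human-preferred
    response well above the rejected one, and large when it gets the
    order wrong. Written in a numerically stable form.
    """
    margin = score_chosen - score_rejected
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical scores from a reward model for one labelled comparison.
agrees_with_human = preference_loss(score_chosen=2.0, score_rejected=0.0)
disagrees_with_human = preference_loss(score_chosen=0.0, score_rejected=2.0)
undecided = preference_loss(score_chosen=1.0, score_rejected=1.0)
```

Averaging this loss over many thousands of labelled comparisons, and adjusting the reward model to shrink it, is what teaches the model to rank responses the way the volunteers did.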
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model that would pass it with flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) This observation that each LLM conceals an implicit reward model allowed the researchers to tinker with that implicit model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
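In code, the trick comes down to a single loss computed straight from the preference data. The sketch below follows the loss as formulated in the DPO paper, in which the implicit reward of a response is how much more likely the LLM being trained makes it than a frozen reference copy of the same model, scaled by a constant `beta`. The article does not discuss the reference copy or `beta`, and the function names and the log-probability values here are made up for illustration.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (prompt, chosen, rejected) comparison.

    Each argument is the total log-probability a model assigns to a
    response. No separate reward model appears anywhere.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -log_sigmoid(chosen_reward - rejected_reward)

# Before training, the LLM matches its reference copy exactly.
start = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# After some training, it favours the response the human chose.
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

The loss falls as the LLM shifts probability towards the responses humans preferred and away from the ones they rejected, relative to where it started. That is how the LLM learns from the comparisons without a second model in between.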
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST