

How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
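The guess-then-adjust loop described above can be sketched in a few lines of Python. This is a toy illustration, not how any production model is trained: the three-word vocabulary, the learning rate and the idea of adjusting the scores directly (rather than the weights of a neural network) are all assumptions made for brevity.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits, target):
    """Cross-entropy: the negative log-probability given to the true next word."""
    return -math.log(softmax(logits)[target])

def sgd_step(logits, target, lr=0.5):
    """One small adjustment. The gradient of the loss with respect to
    each score is (predicted probability - 1 if correct word else 0)."""
    probs = softmax(logits)
    return [
        x - lr * (p - (1.0 if i == target else 0.0))
        for i, (x, p) in enumerate(zip(logits, probs))
    ]

# Hypothetical toy vocabulary; the true next word is "sat".
vocab = ["cat", "sat", "mat"]
logits = [0.0, 0.0, 0.0]          # the model starts with no preference
target = vocab.index("sat")

before = next_token_loss(logits, target)
logits = sgd_step(logits, target)
after = next_token_loss(logits, target)
print(f"loss before: {before:.3f}, loss after one step: {after:.3f}")
```

After a single step the loss falls, because the score for the correct word has been nudged up and the others nudged down. Pretraining repeats this over vast amounts of text.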
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, a judgment that is repeated many thousands of times over. Second, the resulting data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, a machine-learning technique called reinforcement learning uses those scores to tweak the knobs and levers of the original LLM, reinforcing the behaviours that earn it a reward.
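The second step, turning pairwise human choices into a scoring function, can be sketched as follows. The article does not specify the loss used; this sketch assumes the pairwise logistic (Bradley-Terry style) loss commonly used for reward models. It also swaps the second LLM for a tiny linear model over two made-up features, so the feature names and data are purely hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    """Score a response; here a linear model stands in for the second LLM."""
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(r_chosen, r_rejected):
    """Low when the human-preferred response scores higher than the other."""
    return -math.log(sigmoid(r_chosen - r_rejected))

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Gradient descent on the pairwise loss over (chosen, rejected) pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of -log(sigmoid(margin)) w.r.t. the weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected).
            scale = 1.0 - sigmoid(margin)
            weights = [
                w + lr * scale * (c - r)
                for w, c, r in zip(weights, chosen, rejected)
            ]
    return weights

# Hypothetical features per response: [answers_the_question, repeats_the_prompt]
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),   # volunteers preferred the first of each pair
    ([1.0, 0.0], [1.0, 1.0]),
]
weights = train_reward_model(pairs, n_features=2)
print("learned weights:", [round(w, 2) for w in weights])
```

The learned weights reward answering the question and penalise repeating the prompt. In full RLHF a score of this kind is what the reinforcement-learning step then tries to maximise.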
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model that would give it flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) This observation that each LLM conceals an implicit reward model allowed the researchers to tinker with this model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
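A minimal sketch of what learning directly from the data can look like is given below. It follows the loss function as presented in the DPO paper, to the best of this editor's understanding: the implicit reward of a response is how much more probable the model being trained makes it compared with a frozen reference copy, scaled by a constant beta. The log-probabilities and the beta value here are made-up numbers for illustration, not outputs of a real model.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Loss for one human preference pair, with no separate reward model.

    Each argument is the log-probability that a model assigns to a whole
    response: 'policy' is the LLM being trained, 'ref' a frozen copy of it.
    """
    # Implicit rewards: how far the policy has moved away from the reference.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    # Same logistic form as a reward-model loss, applied to implicit rewards.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy identical to the reference: no preference has been learned yet.
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 3))

# Policy now favours the response the human chose: the loss is lower.
print(round(dpo_loss(-4.0, -6.0, -5.0, -5.0), 3))
```

Minimising this loss over many preference pairs raises the probability of preferred responses relative to rejected ones, which is the sense in which the reward model and the reinforcement-learning step drop out.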
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST