

How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
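The guess-then-adjust loop described above can be shown at toy scale. What follows is a minimal sketch, not how any production LLM is built: it assumes a "model" that is merely a table of scores for each pair of adjacent words, and a six-word training text. The names `train_bigram` and `next_word_loss`, the learning rate and the number of passes are all illustrative assumptions.

```python
import math

def softmax(logits):
    # Turn raw scores into probabilities that sum to one.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_loss(logits, prev, nxt):
    # Penalty for the guess: small when the true next word was given high probability.
    return -math.log(softmax(logits[prev])[nxt])

def train_bigram(tokens, vocab_size, lr=0.5, epochs=50):
    # logits[prev][nxt] is one adjustable score per (previous word, next word) pair.
    logits = [[0.0] * vocab_size for _ in range(vocab_size)]
    for _ in range(epochs):
        for prev, nxt in zip(tokens, tokens[1:]):
            probs = softmax(logits[prev])
            # Small adjustment: the gradient of the loss with respect to the
            # scores is (predicted probabilities - one-hot true word).
            for j in range(vocab_size):
                target = 1.0 if j == nxt else 0.0
                logits[prev][j] -= lr * (probs[j] - target)
    return logits

# Toy corpus (an assumption for illustration only).
words = "the cat sat on the mat".split()
vocab = sorted(set(words))
ids = [vocab.index(w) for w in words]

untrained = [[0.0] * len(vocab) for _ in vocab]
before = next_word_loss(untrained, ids[1], ids[2])  # guessing "sat" after "cat"
model = train_bigram(ids, len(vocab))
after = next_word_loss(model, ids[1], ids[2])
print(f"loss before: {before:.3f}, after: {after:.3f}")
```

A real LLM replaces the table with a neural network that reads the whole preceding context, but the training signal is the same: the penalty shrinks as the true next word receives more probability.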
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, a comparison that is repeated many thousands of times. Next, the resulting data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Finally, the reward model is used to train the original LLM: a machine-learning technique called reinforcement learning tweaks the knobs and levers of the original LLM to reinforce the behaviours that earn it a reward.
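The second step, fitting a reward model to human choices, can be sketched in a few lines. This is an illustration under loud assumptions: the article's reward model is a second LLM, whereas the stand-in here is a simple weighted sum over two made-up features of a response. The pairwise logistic loss used is a common choice for preference data; the article does not say which loss OpenAI used. The names `train_reward_model` and `preference_loss` and the example pairs are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(weights, features):
    # Stand-in reward model: a weighted sum of response features.
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, chosen, rejected):
    # Small when the human-preferred response scores higher than the rejected one.
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient descent on -log(sigmoid(margin)).
            scale = 1.0 - sigmoid(margin)
            for i in range(n_features):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Hypothetical features per response: [answers the question, repeats the prompt].
# Each pair is (response the human chose, response the human rejected).
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 0.0]),
    ([1.0, 1.0], [0.0, 1.0]),
]
weights = train_reward_model(pairs, n_features=2)
print("score for a helpful answer:", round(reward(weights, [1.0, 0.0]), 2))
print("score for parroting the prompt:", round(reward(weights, [0.0, 1.0]), 2))
```

Once trained, such a scorer can rate responses no human has seen, which is what lets it stand in for the volunteers during the reinforcement-learning step.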
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model that would give it flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) This observation that each LLM conceals an implicit reward model allowed the researchers to tinker with this model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
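In code, the idea of learning directly from the data reduces to a single loss function. The sketch below follows the DPO loss as published by the researchers, in which the implicit reward of a response is a scaling factor times the gap between the log-probability the LLM being trained assigns to it and the log-probability a frozen reference copy assigns. The numeric log-probabilities and the value of `beta` are made-up illustrations, and the function names are assumptions.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def implicit_reward(policy_logp, reference_logp, beta):
    # The reward model hidden inside the LLM: how much more likely the LLM
    # makes a response than the frozen reference copy does.
    return beta * (policy_logp - reference_logp)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Arguments are log-probabilities of the whole response under each model.
    margin = (implicit_reward(policy_chosen, ref_chosen, beta)
              - implicit_reward(policy_rejected, ref_rejected, beta))
    return -log_sigmoid(margin)

# Illustrative numbers only. At the start the LLM equals its reference copy.
untrained = dpo_loss(policy_chosen=-12.0, policy_rejected=-10.0,
                     ref_chosen=-12.0, ref_rejected=-10.0)
# After some training the preferred response has become more likely and the
# rejected one less likely, relative to the reference.
improved = dpo_loss(policy_chosen=-9.0, policy_rejected=-14.0,
                    ref_chosen=-12.0, ref_rejected=-10.0)
print(f"loss untrained: {untrained:.3f}, improved: {improved:.3f}")
```

No separate reward model and no reinforcement-learning loop appear anywhere: minimising this loss over the human preference pairs with ordinary gradient descent is the whole procedure.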
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST