

How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text, and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
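To make the "guess, then adjust" loop concrete, here is a minimal sketch in plain Python. It is an illustration under loud assumptions, not how any production model is built: the three-word vocabulary, the starting scores and the learning rate are all hypothetical, and a real LLM adjusts billions of internal weights rather than nudging the scores directly.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_loss(scores, target):
    """Cross-entropy: the negative log-probability given to the true next word."""
    return -math.log(softmax(scores)[target])

def small_adjustment(scores, target, lr=0.5):
    """One gradient step. The gradient of the loss w.r.t. each score is
    (predicted probability - 1) for the true word and (predicted probability)
    for every other word."""
    probs = softmax(scores)
    return [s - lr * (p - (1.0 if i == target else 0.0))
            for i, (s, p) in enumerate(zip(scores, probs))]

# Hypothetical toy setup: the model must guess the word after "the cat".
vocab = ["the", "cat", "sat"]
scores = [0.0, 0.0, 0.0]            # no preference yet: each word gets 1/3
target = vocab.index("sat")

loss_before = next_word_loss(scores, target)   # log(3), about 1.099
scores = small_adjustment(scores, target)
loss_after = next_word_loss(scores, target)    # smaller: "sat" is now likelier
```

Repeated over mountains of text, many such small steps are what give the model its statistical feel for language.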
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, introduced this technique in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, and the exercise is repeated many thousands of times over. Second, the resulting data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, a machine-learning technique called reinforcement learning tweaks the knobs and levers of the original LLM to reinforce the behaviours that earn it a reward from that model.
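The article does not spell out how a reward model learns from pairs of responses. One commonly used formulation, assumed here purely for illustration, is a pairwise loss that is small when the response the human preferred receives the higher score. The scores below are hypothetical numbers standing in for the output of a real reward model.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def preference_loss(score_chosen, score_rejected):
    """Pairwise loss for one human comparison.

    Low when the reward model scores the human-preferred response
    above the rejected one, high when it gets the order wrong.
    """
    return -log_sigmoid(score_chosen - score_rejected)

# Hypothetical reward-model scores for two replies to the same prompt.
agrees_with_human = preference_loss(score_chosen=2.0, score_rejected=0.0)
undecided = preference_loss(score_chosen=1.0, score_rejected=1.0)   # log(2)
disagrees_with_human = preference_loss(score_chosen=0.0, score_rejected=2.0)
```

Minimising this loss over many thousands of comparisons pushes the reward model towards ranking responses the way the volunteers did.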
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model that would give it flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would best fit.) This observation that each LLM conceals an implicit reward model allowed the researchers to tinker with this model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
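The trick can be sketched in a few lines. The loss below follows the form given in the published DPO paper rather than anything stated in this article: the implicit reward of a response is taken to be how much more likely the model being trained makes it than a frozen reference copy of the same model, scaled by a strength parameter called `beta`. The log-probabilities and the value of `beta` are hypothetical.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def implicit_reward(logp_policy, logp_reference, beta=0.1):
    """Reward the LLM 'conceals': its log-probability gain over the reference."""
    return beta * (logp_policy - logp_reference)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one human comparison, with no separate reward model.

    Each argument is the total log-probability a model assigns to a response.
    """
    margin = (implicit_reward(policy_chosen, ref_chosen, beta)
              - implicit_reward(policy_rejected, ref_rejected, beta))
    return -log_sigmoid(margin)

# Hypothetical numbers. At the start the trained model equals the reference,
# so both implicit rewards are zero and the loss is log(2).
loss_start = dpo_loss(policy_chosen=-10.0, policy_rejected=-12.0,
                      ref_chosen=-10.0, ref_rejected=-12.0)

# After some training the preferred response has become more likely.
loss_later = dpo_loss(policy_chosen=-8.0, policy_rejected=-12.0,
                      ref_chosen=-10.0, ref_rejected=-12.0)
```

The loss has the same shape as one used to train a reward model, but the scores now come from the LLM itself, which is why the separate reward model and the reinforcement-learning step can both be dropped.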
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST