How to train your large language model
It is no secret that building a large language model (LLM) requires vast amounts of data. In conventional training, an LLM is fed mountains of text and encouraged to guess each word before it appears. With each prediction, the LLM makes small adjustments to improve its chances of guessing right. The end result is something that has a certain statistical “understanding” of what is proper language and what isn’t.
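The guess-and-adjust loop can be illustrated at toy scale. What follows is a minimal sketch, not how production LLMs are built. It assumes a hypothetical bigram model that predicts each word only from the word before it, keeps one row of scores per preceding word, and learns with cross-entropy loss and plain gradient descent. The function names `train_bigram` and `avg_loss` are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_bigram(tokens, vocab, epochs=200, lr=0.5):
    """Toy next-word predictor: one row of scores per preceding word."""
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    scores = [[0.0] * n for _ in range(n)]  # untrained: every guess equally likely
    for _ in range(epochs):
        for prev, nxt in zip(tokens, tokens[1:]):
            row = scores[idx[prev]]  # the row is updated in place
            probs = softmax(row)     # the model's guess for the next word
            target = idx[nxt]
            # Gradient of the loss -log p(target) with respect to score j
            # is probs[j] - 1 for the target word and probs[j] otherwise.
            for j in range(n):
                row[j] -= lr * (probs[j] - (1.0 if j == target else 0.0))
    return scores, idx


def avg_loss(scores, idx, tokens):
    """Average cross-entropy: lower means the true next word was guessed better."""
    losses = [-math.log(softmax(scores[idx[prev]])[idx[nxt]])
              for prev, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)


text = "the cat sat on the mat".split()
vocab = sorted(set(text))
scores, idx = train_bigram(text, vocab)
probs = softmax(scores[idx["sat"]])
print(vocab[probs.index(max(probs))])  # prints "on"
```

Before training every one of the five words is equally likely, so the average loss equals log(5); the small adjustments drive it down, although after "the" the model can do no better than split its bet between "cat" and "mat".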
But an LLM that has only undergone this so-called “pretraining” is not yet particularly useful. When asked for a joke to cheer your correspondent up, for instance, the pretrained model GPT-2 just repeated the question back three times. When asked who the American president was, it responded: “The answer is no. The president is not the president.” Clearly, teaching an LLM to do what humans want requires something more.
One way to align such models with users’ expectations is through reinforcement learning from human feedback (RLHF). OpenAI, an American startup, described how it used this technique to fine-tune its models in a preprint published in March 2022. It was a major ingredient in its recipe for ChatGPT, which was released eight months later.
RLHF normally involves three steps. First, human volunteers are asked to choose which of two potential LLM responses better fits a given prompt, and this is repeated many thousands of times over. Second, the resulting data set is used to train a second LLM to, in effect, stand in for the human being. This so-called reward model is designed to assign higher scores to responses a human would like, and lower scores to everything else. Third, the reward model is used to train the original LLM: a machine-learning technique called reinforcement learning tweaks the knobs and levers of the original LLM to help reinforce the behaviours that earn it a reward.
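The reward-model step can be sketched in miniature. This is a hedged illustration, not OpenAI’s code. It assumes the reward model is trained with the pairwise loss commonly described in published RLHF work, -log σ(r(chosen) - r(rejected)), which pushes the score of the response a volunteer preferred above the one they rejected. To keep it self-contained, the second LLM is replaced by a hypothetical linear scorer over three made-up features of a response (does it answer the question, does it merely repeat the prompt, how long is it); `train_reward_model` and the feature layout are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Hypothetical reward model: a weighted sum of response features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, pairs):
    """Average of -log sigmoid(r(chosen) - r(rejected)) over all labelled pairs."""
    return sum(-math.log(sigmoid(reward(weights, c) - reward(weights, r)))
               for c, r in pairs) / len(pairs)


def train_reward_model(pairs, n_features, epochs=500, lr=0.1):
    """Fit weights so that human-preferred responses get the higher score."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Derivative of -log sigmoid(margin) with respect to margin.
            g = -(1.0 - sigmoid(margin))
            for k in range(n_features):
                w[k] -= lr * g * (chosen[k] - rejected[k])
    return w


# Features: [answers the question, repeats the prompt, relative length].
# Each pair is (response the volunteer chose, response they rejected).
pairs = [
    ([1.0, 0.0, 0.3], [0.0, 1.0, 0.5]),
    ([1.0, 0.0, 0.6], [0.0, 1.0, 0.2]),
    ([1.0, 0.0, 0.4], [1.0, 1.0, 0.4]),
]
weights = train_reward_model(pairs, n_features=3)
print(reward(weights, [1.0, 0.0, 0.5]) > reward(weights, [0.0, 1.0, 0.5]))  # prints True
```

In a real pipeline the scorer would itself be a large neural network reading both prompt and response, and reinforcement learning would then nudge the original LLM toward responses it scores highly.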
This way of doing RLHF is quite involved—using two separate LLMs takes time and money, and the algorithm used for reinforcement learning is, to quote Rafael Rafailov at Stanford University, “quite painful”. This has meant that, outside of OpenAI, Google and their rivals, nobody has really exploited its full potential.
It now turns out that the same results can be achieved for a fraction of the effort. Dr Rafailov and his colleagues, including Archit Sharma and Eric Mitchell, presented this alternative in December 2023 at NeurIPS, an AI conference. Their method, Direct Preference Optimisation (DPO), relies on a satisfying mathematical trick.
This trick hinges on the observation that for every reward model there is a specific theoretical LLM that would get full marks, and every LLM likewise has a theoretical reward model under which it would pass with flying colours. (Just as, more prosaically, every pair of trousers has a theoretical person on whom they would sit perfectly, and every person has a theoretical pair of trousers that would fit them best.) Because each LLM conceals such an implicit reward model, the researchers could tinker with that model directly. In the old regime, the LLM learned from the reward model, which learned from the data. Now, the LLM can learn directly from the data.
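The trick can be made concrete with the loss function given in the DPO paper by Dr Rafailov and colleagues. For a prompt with a preferred and a rejected response, the LLM’s implicit reward for a response is β times the log of the ratio between the probability the LLM being trained assigns it and the probability a frozen reference model (typically the LLM training started from) assigns it. The loss is -log σ of the gap between the two implicit rewards, so no separate reward model is needed. The code is a toy under stated assumptions: the "LLM" is a hypothetical distribution over three canned responses to a single prompt, the reference scores and preference pairs are made up, and the names `dpo_loss` and `train_dpo` are illustrative.

```python
import math


def softplus(x):
    """log(1 + e^x), computed without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta):
    """DPO loss for one pair, from log-probabilities under policy and reference."""
    # Implicit reward of a response: beta * (log pi(y|x) - log pi_ref(y|x)).
    z = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    return softplus(-z)  # equals -log sigmoid(z)


def train_dpo(ref_logits, pairs, beta=0.5, epochs=300, lr=0.1):
    """Toy policy: one score per canned response, starting as a copy of the reference."""
    logits = list(ref_logits)
    ref_lp = log_softmax(ref_logits)
    for _ in range(epochs):
        for w, l in pairs:  # indices of the chosen and rejected responses
            lp = log_softmax(logits)
            z = beta * ((lp[w] - ref_lp[w]) - (lp[l] - ref_lp[l]))
            # dLoss/dz = -(1 - sigmoid(z)) = -exp(-softplus(z));
            # dz/dlogit_k = beta * ([k == w] - [k == l]).
            coeff = -math.exp(-softplus(z)) * beta
            logits[w] -= lr * coeff  # raise the chosen response
            logits[l] += lr * coeff  # lower the rejected response
    return logits


responses = ["a joke", "the question repeated", "nonsense"]
ref_logits = [0.0, 1.0, 0.5]  # hypothetical: the reference favours repeating the question
pairs = [(0, 1), (0, 2)]      # volunteers preferred the joke to both alternatives
policy = train_dpo(ref_logits, pairs)
probs = [math.exp(x) for x in log_softmax(policy)]
print(responses[probs.index(max(probs))])  # prints "a joke"
```

Compared with RLHF, there is no separately trained reward model and no reinforcement-learning loop, only ordinary gradient descent on the preference data. The parameter β controls how far the trained LLM is allowed to drift from the reference model.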
According to the authors, removing the middleman makes DPO between three and six times more efficient than RLHF, and capable of better performance at tasks such as text summarisation. Its ease of use is already allowing smaller companies to tackle the problem of alignment, says Dr Sharma. A year ago only a few world-leading models, such as Google’s Gemini and OpenAI’s GPT-4, could afford to use RLHF. But as of March 12th eight out of the ten highest-ranked LLMs on an industry leaderboard used DPO. Mistral, the French startup seeking to rival OpenAI, uses it. Meta, a social-media giant, has integrated it into a home-grown LLM.
Further improvements are sure to come. For one thing, the consensus view is that the big AI labs have made improvements to their proprietary algorithms since they stopped publishing details in 2022. But the problem of getting an LLM to do what a human would want and expect is far from done and dusted. After all, even other humans occasionally struggle.
© 2024, The Economist Newspaper Ltd. All rights reserved.
From The Economist, published under licence. The original content can be found on www.economist.com
Published: 13 May 2024, 07:00 PM IST
