

Researchers are figuring out how large language models work
LLMs are built using a technique called deep learning, in which a network of billions of neurons, simulated in software and modelled on the structure of the human brain, is exposed to trillions of examples of something to discover inherent patterns. Trained on text strings, LLMs can hold conversations, generate text in a variety of styles, write software code, translate between languages and more besides.
Models are essentially grown, rather than designed, says Josh Batson, a researcher at Anthropic, an AI startup. Because LLMs are not explicitly programmed, nobody is entirely sure why they have such extraordinary abilities. Nor do they know why LLMs sometimes misbehave, or give wrong or made-up answers, known as “hallucinations”. LLMs really are black boxes. This is worrying, given that they and other deep-learning systems are starting to be used for all kinds of things, from offering customer support to preparing document summaries to writing software code.
It would be helpful to be able to poke around inside an LLM to see what is going on, just as it is possible, given the right tools, to do with a car engine or a microprocessor. Being able to understand a model’s inner workings in bottom-up, forensic detail is called “mechanistic interpretability”. But it is a daunting task for networks with billions of internal neurons. That has not stopped people trying, including Dr Batson and his colleagues. In a paper published in May, they explained how they have gained new insight into the workings of one of Anthropic’s LLMs.
One might think individual neurons inside an LLM would correspond to specific words. Unfortunately, things are not that simple. Instead, individual words or concepts are associated with the activation of complex patterns of neurons, and individual neurons may be activated by many different words or concepts. This problem was pointed out in earlier work by researchers at Anthropic, published in 2022. They proposed—and subsequently tried—various workarounds, achieving good results on very small language models in 2023 with a so-called “sparse autoencoder”. In their latest results they have scaled up this approach to work with Claude 3 Sonnet, a full-sized LLM.
A sparse autoencoder is, essentially, a second, smaller neural network that is trained on the activity of an LLM, looking for distinct patterns in activity when “sparse” (ie, very small) groups of its neurons fire together. Once many such patterns, known as features, have been identified, the researchers can determine which words trigger which features. The Anthropic team found individual features that corresponded to specific cities, people, animals and chemical elements, as well as higher-level concepts such as transport infrastructure, famous female tennis players, or the notion of secrecy. They performed this exercise three times, identifying 1m, 4m and, on the last go, 34m features within the Sonnet LLM.
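To make the idea concrete, here is a minimal sketch of the two halves of a sparse autoencoder and the quantity it is trained to minimise. It is not Anthropic's implementation: the ReLU encoder, the L1 sparsity penalty, the function names and the toy weights are all assumptions chosen for illustration, and the training loop is left out.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def encode(activation, enc_weights, enc_bias):
    """Turn one LLM activation vector into non-negative feature activations."""
    return [
        relu(sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]

def decode(features, dec_directions, dec_bias):
    """Rebuild the activation as a weighted sum of feature directions."""
    out = list(dec_bias)
    for strength, direction in zip(features, dec_directions):
        for i, d in enumerate(direction):
            out[i] += strength * d
    return out

def sae_loss(activation, enc_weights, enc_bias, dec_directions, dec_bias, l1_coeff=0.01):
    """Reconstruction error plus an L1 penalty that rewards few active features."""
    features = encode(activation, enc_weights, enc_bias)
    recon = decode(features, dec_directions, dec_bias)
    mse = sum((a - r) ** 2 for a, r in zip(activation, recon))
    sparsity = sum(features)  # features are non-negative, so this is the L1 norm
    return mse + l1_coeff * sparsity, features

# Toy setup (made up): a 2-dimensional "activation" and 4 candidate features.
toy_directions = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
toy_loss, toy_features = sae_loss(
    [1.0, -2.0], toy_directions, [0.0] * 4, toy_directions, [0.0, 0.0]
)
```

In this toy case only two of the four features fire, and together they reconstruct the input exactly, so the loss consists solely of the sparsity penalty.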
The result is a sort of mind-map of the LLM, showing a small fraction of the concepts it has learned about from its training data. Places in the San Francisco Bay Area that are close geographically are also “close” to each other in the concept space, as are related concepts, such as diseases or emotions. “This is exciting because we have a partial conceptual map, a hazy one, of what’s happening,” says Dr Batson. “And that’s the starting point—we can enrich that map and branch out from there.”
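One way to picture "closeness" in concept space is to compare the directions associated with two features. The sketch below assumes cosine similarity as the measure; the three-dimensional vectors are made up, and "Alcatraz" is an illustrative label rather than a feature named in the article.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_features(name, directions, k=2):
    """Names of the k features whose directions are most similar to `name`."""
    target = directions[name]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in directions.items()
        if other != name
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Made-up feature directions, for illustration only.
directions = {
    "Golden Gate Bridge": [1.0, 0.1, 0.0],
    "Alcatraz": [0.9, 0.2, 0.1],
    "sadness": [0.0, 0.1, 1.0],
}
closest = nearest_features("Golden Gate Bridge", directions, k=1)
```

With these toy numbers the nearest neighbour of the bridge is the other Bay Area landmark, not the emotion.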
Focus the mind
As well as seeing parts of the LLM light up, as it were, in response to specific concepts, it is also possible to change its behaviour by manipulating individual features. Anthropic tested this idea by “spiking” (ie, turning up) a feature associated with the Golden Gate Bridge. The result was a version of Claude that was obsessed with the bridge, and mentioned it at any opportunity. When asked how to spend $10, for example, it suggested paying the toll and driving over the bridge; when asked to write a love story, it made up one about a lovelorn car that could not wait to cross it.
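A rough sketch of what "spiking" a feature could look like, assuming each feature corresponds to a direction in the model's activation space and that steering amounts to nudging the activation along that direction. The function name, the vectors and the strength values are hypothetical; the article does not describe Anthropic's exact procedure.

```python
def steer(activation, feature_direction, strength):
    """Return a copy of the activation pushed along one feature's direction."""
    if len(activation) != len(feature_direction):
        raise ValueError("activation and feature direction must have the same length")
    return [a + strength * d for a, d in zip(activation, feature_direction)]

# Toy 3-dimensional example; real activations have thousands of dimensions.
activation = [0.2, -0.1, 0.5]
bridge_direction = [0.0, 1.0, 0.0]  # hypothetical "Golden Gate Bridge" feature

spiked = steer(activation, bridge_direction, strength=10.0)       # turn the feature up
suppressed = steer(activation, bridge_direction, strength=-10.0)  # turn it down
```

A positive strength amplifies the concept and a negative one dampens it, which is the same mechanism that could in principle be used to discourage a topic.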
That may sound silly, but the same principle could be used to discourage the model from talking about particular topics, such as bioweapons production. “AI safety is a major goal here,” says Dr Batson. It can also be applied to behaviours. By tuning specific features, models could be made more or less sycophantic, empathetic or deceptive. Might a feature emerge that corresponds to the tendency to hallucinate? “We didn’t find a smoking gun,” says Dr Batson. Whether hallucinations have an identifiable mechanism or signature is, he says, a “million-dollar question”. And it is one addressed, by another group of researchers, in a new paper in Nature.
Sebastian Farquhar and colleagues at the University of Oxford used a measure called “semantic entropy” to assess whether a statement from an LLM is likely to be a hallucination or not. Their technique is quite straightforward: essentially, an LLM is given the same prompt several times, and its answers are then clustered by “semantic similarity” (ie, according to their meaning). The researchers’ hunch was that the “entropy” of these answers—in other words, the degree of inconsistency—corresponds to the LLM’s uncertainty, and thus the likelihood of hallucination. If all its answers are essentially variations on a theme, they are probably not hallucinations (though they may still be incorrect).
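The procedure can be sketched in a few lines. This is a simplified illustration, not the Oxford group's code: `same_meaning` is a placeholder for whatever judges two answers to be equivalent, and the toy version below merely compares the last word of each answer. The sample answers are invented.

```python
import math

def cluster_by_meaning(answers, same_meaning):
    """Group answers so that each cluster holds answers judged equivalent."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters

def semantic_entropy(answers, same_meaning):
    """Entropy over meaning clusters: 0 when all answers agree, higher when they scatter."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    return sum(-(len(c) / total) * math.log(len(c) / total) for c in clusters)

def toy_same_meaning(a, b):
    # Hypothetical stand-in for a real semantic-equivalence check.
    def last_word(s):
        return s.lower().strip(" .").split()[-1]
    return last_word(a) == last_word(b)

consistent = ["Fado is the national music of Portugal.", "It comes from Portugal", "Portugal"]
scattered = ["function alpha", "function beta", "function gamma", "function delta"]

low = semantic_entropy(consistent, toy_same_meaning)
high = semantic_entropy(scattered, toy_same_meaning)
```

Differently worded answers that mean the same thing fall into one cluster and give zero entropy, whereas four mutually inconsistent answers give the maximum for four samples, the natural log of 4.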
In one example, the Oxford group asked an LLM which country is associated with fado music, and it consistently replied that fado is the national music of Portugal, which is correct, and not a hallucination. But when asked about the function of a protein called StarD10, the model gave several wildly different answers, which suggests hallucination. (The researchers prefer the term “confabulation”, a subset of hallucinations they define as “arbitrary and incorrect generations”.) Overall, this approach was able to distinguish between accurate statements and hallucinations 79% of the time, ten percentage points better than previous methods. This work is complementary, in many ways, to Anthropic’s.
Others have also been lifting the lid on LLMs: the “superalignment” team at OpenAI, maker of GPT-4 and ChatGPT, released its own paper on sparse autoencoders in June, though the team has now been dissolved after several researchers left the firm. But the OpenAI paper contained some innovative ideas, says Dr Batson. “We are really happy to see groups all over, working to understand models better,” he says. “We want everybody doing it.”
© 2024, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com