

Researchers are figuring out how large language models work
LLMs are built using a technique called deep learning, in which a network of billions of neurons, simulated in software and modelled on the structure of the human brain, is exposed to trillions of examples of something to discover inherent patterns. Trained on text strings, LLMs can hold conversations, generate text in a variety of styles, write software code, translate between languages and more besides.
Models are essentially grown, rather than designed, says Josh Batson, a researcher at Anthropic, an AI startup. Because LLMs are not explicitly programmed, nobody is entirely sure why they have such extraordinary abilities. Nor do they know why LLMs sometimes misbehave, or give wrong or made-up answers, known as “hallucinations”. LLMs really are black boxes. This is worrying, given that they and other deep-learning systems are starting to be used for all kinds of things, from offering customer support to preparing document summaries to writing software code.
It would be helpful to be able to poke around inside an LLM to see what is going on, just as it is possible, given the right tools, to do with a car engine or a microprocessor. Being able to understand a model’s inner workings in bottom-up, forensic detail is called “mechanistic interpretability”. But it is a daunting task for networks with billions of internal neurons. That has not stopped people trying, including Dr Batson and his colleagues. In a paper published in May, they explained how they have gained new insight into the workings of one of Anthropic’s LLMs.
One might think individual neurons inside an LLM would correspond to specific words. Unfortunately, things are not that simple. Instead, individual words or concepts are associated with the activation of complex patterns of neurons, and individual neurons may be activated by many different words or concepts. This problem was pointed out in earlier work by researchers at Anthropic, published in 2022. They proposed—and subsequently tried—various workarounds, achieving good results on very small language models in 2023 with a so-called “sparse autoencoder”. In their latest results they have scaled up this approach to work with Claude 3 Sonnet, a full-sized LLM.
A sparse autoencoder is, essentially, a second, smaller neural network that is trained on the activity of an LLM, looking for distinct patterns in activity when “sparse” (ie, very small) groups of its neurons fire together. Once many such patterns, known as features, have been identified, the researchers can determine which words trigger which features. The Anthropic team found individual features that corresponded to specific cities, people, animals and chemical elements, as well as higher-level concepts such as transport infrastructure, famous female tennis players, or the notion of secrecy. They performed this exercise three times, identifying 1m, 4m and, on the last go, 34m features within the Sonnet LLM.
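As a rough illustration of the idea, a sparse autoencoder can be sketched in a few lines of Python. This is a generic textbook form, not Anthropic's code: the class name, the layer sizes, the random initialisation and the `l1_coeff` penalty weight are all assumptions made for the example, and no training loop is shown. It only shows the two ingredients described above: features computed from a vector of the LLM's neuron activity, and a loss that rewards faithful reconstruction while penalising having many features active at once.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SparseAutoencoder:
    """Toy sparse autoencoder over one vector of LLM activations.

    Using more features than inputs is a common choice in the literature
    and is an assumption here, as are all sizes and the initialisation.
    """

    def __init__(self, n_inputs, n_features, seed=0):
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(n_inputs)]
                      for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(n_features)]
                      for _ in range(n_inputs)]
        self.b_dec = [0.0] * n_inputs

    def encode(self, activations):
        """Map neuron activity to non-negative feature strengths (ReLU)."""
        pre = matvec(self.w_enc, activations)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        """Rebuild the original activity from the feature strengths."""
        out = matvec(self.w_dec, features)
        return [o + b for o, b in zip(out, self.b_dec)]

    def loss(self, activations, l1_coeff=0.01):
        """Reconstruction error plus an L1 penalty that encourages sparsity."""
        features = self.encode(activations)
        rebuilt = self.decode(features)
        mse = sum((a - r) ** 2
                  for a, r in zip(activations, rebuilt)) / len(activations)
        sparsity = sum(abs(f) for f in features)
        return mse + l1_coeff * sparsity
```

Training would adjust the weights to minimise `loss` over many recorded activation vectors; after that, each position in the output of `encode` plays the role of one "feature" whose triggers can be inspected.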
The result is a sort of mind-map of the LLM, showing a small fraction of the concepts it has learned about from its training data. Places in the San Francisco Bay Area that are close geographically are also “close” to each other in the concept space, as are related concepts, such as diseases or emotions. “This is exciting because we have a partial conceptual map, a hazy one, of what’s happening,” says Dr Batson. “And that’s the starting point—we can enrich that map and branch out from there.”
Focus the mind
As well as seeing parts of the LLM light up, as it were, in response to specific concepts, it is also possible to change its behaviour by manipulating individual features. Anthropic tested this idea by “spiking” (ie, turning up) a feature associated with the Golden Gate Bridge. The result was a version of Claude that was obsessed with the bridge, and mentioned it at any opportunity. When asked how to spend $10, for example, it suggested paying the toll and driving over the bridge; when asked to write a love story, it made up one about a lovelorn car that could not wait to cross it.
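One simple way to picture "turning up" a feature is to nudge the model's internal activity along the direction associated with that feature. The sketch below is a toy under stated assumptions, not a description of how Anthropic altered Claude: the function names, the three-number activation vector and the `strength` parameter are hypothetical and exist only to show the arithmetic.

```python
def feature_score(activations, feature_direction):
    """How strongly the activity points along a feature's direction."""
    return sum(a * d for a, d in zip(activations, feature_direction))


def steer(activations, feature_direction, strength):
    """Return a copy of the activity pushed along the feature direction.

    A positive strength turns the feature up; a negative one turns it down.
    """
    return [a + strength * d for a, d in zip(activations, feature_direction)]


# Hypothetical example: a tiny activation vector and a made-up direction
# standing in for a "Golden Gate Bridge" feature.
activity = [0.2, 0.1, -0.3]
bridge_direction = [0.0, 1.0, 0.0]
obsessed = steer(activity, bridge_direction, strength=5.0)
suppressed = steer(activity, bridge_direction, strength=-5.0)
```

In a real model the altered activity would be passed on to the remaining layers, so every later step of the computation sees a stronger (or weaker) version of the concept.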
That may sound silly, but the same principle could be used to discourage the model from talking about particular topics, such as bioweapons production. “AI safety is a major goal here,” says Dr Batson. It can also be applied to behaviours. By tuning specific features, models could be made more or less sycophantic, empathetic or deceptive. Might a feature emerge that corresponds to the tendency to hallucinate? “We didn’t find a smoking gun,” says Dr Batson. Whether hallucinations have an identifiable mechanism or signature is, he says, a “million-dollar question”. And it is one addressed, by another group of researchers, in a new paper in Nature.
Sebastian Farquhar and colleagues at the University of Oxford used a measure called “semantic entropy” to assess whether a statement from an LLM is likely to be a hallucination or not. Their technique is quite straightforward: essentially, an LLM is given the same prompt several times, and its answers are then clustered by “semantic similarity” (ie, according to their meaning). The researchers’ hunch was that the “entropy” of these answers—in other words, the degree of inconsistency—corresponds to the LLM’s uncertainty, and thus the likelihood of hallucination. If all its answers are essentially variations on a theme, they are probably not hallucinations (though they may still be incorrect).
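The procedure described above can be sketched as follows. This is a minimal sketch, not the Oxford group's implementation: the article does not spell out how "semantic similarity" is judged, so `same_meaning` is a hypothetical placeholder that a real system would replace with a proper meaning comparison, and the toy version below only ignores case and trailing full stops.

```python
import math


def cluster_by_meaning(answers, same_meaning):
    """Group answers so that each cluster holds answers with one meaning."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            # Compare against the first member as the cluster's representative.
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(answers, same_meaning):
    """Entropy (in nats) of the distribution of answers over meaning clusters."""
    clusters = cluster_by_meaning(answers, same_meaning)
    total = len(answers)
    entropy = 0.0
    for cluster in clusters:
        p = len(cluster) / total
        entropy += p * math.log(1.0 / p)
    return entropy


def toy_same_meaning(a, b):
    """Placeholder comparison (assumption): ignore case and trailing full stops."""
    return a.lower().strip(". ") == b.lower().strip(". ")


# Hypothetical sampled answers to the same prompt.
consistent = ["Portugal", "portugal.", "Portugal"]
scattered = ["lipid transport", "gene regulation", "cell signalling"]
low = semantic_entropy(consistent, toy_same_meaning)
high = semantic_entropy(scattered, toy_same_meaning)
```

When every answer falls into one cluster the entropy is zero; the more evenly the answers spread across different meanings, the higher it climbs, which is the signal the researchers treat as a warning of likely hallucination.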
In one example, the Oxford group asked an LLM which country is associated with fado music, and it consistently replied that fado is the national music of Portugal, which is correct and not a hallucination. But when asked about the function of a protein called StarD10, the model gave several wildly different answers, which suggests hallucination. (The researchers prefer the term “confabulation”, a subset of hallucinations they define as “arbitrary and incorrect generations”.) Overall, this approach was able to distinguish between accurate statements and hallucinations 79% of the time, ten percentage points better than previous methods. This work is complementary, in many ways, to Anthropic’s.
Others have also been lifting the lid on LLMs: the “superalignment” team at OpenAI, maker of GPT-4 and ChatGPT, released its own paper on sparse autoencoders in June, though the team has now been dissolved after several researchers left the firm. But the OpenAI paper contained some innovative ideas, says Dr Batson. “We are really happy to see groups all over, working to understand models better,” he says. “We want everybody doing it.”
© 2024, The Economist Newspaper Limited. All rights reserved. From The Economist, published under licence. The original content can be found on www.economist.com